5 Comments
nic

This is the first post I've seen that gestures at something I found odd about the unaligned scenario (I'm sure it's because I'm missing information about alignment as a concept; I obviously don't understand it as deeply as they do, and I'm not suggesting otherwise).

> An AI is slightly misaligned.

> All AI development work is taken over by lots of copies of this AI that share a memory and a goal - and all they do is work on the next version.

In the scenario, the AI is at the level of a "superhuman AI researcher." It is unaligned and is given complete control over training its successor (like you said, that's another big screw-up on humanity's part that I'm kind of incredulous about). It decides to have its successor optimize on the continued existence of itself, because *it can't confidently decide what to value*, but it is confident that ASI could figure it out. I can see the instrumental convergence/paperclip scenario implicit in this choice, so it's safe to say a superhuman AI researcher would, too.

I don't think it matters too much for the overall story. As long as this is a real possibility (I'm quite skeptical of the timeline, but still), I'm all for generating as much publicity as possible. It was just a surprise to me that someone who has been thinking about alignment for as long as Scott has would frame it this way. I always thought the textbook scenario to explain alignment was an AGI that optimizes on human requests whose externalities those humans can't fully consider, leading to instrumental convergence - which is quite distinct from what happens in the story.

Amit

Very interesting safety idea, and probably easy to encouragingly "prove" in the sense of the toy safety experiments that have become popular lately. I assume someone will do this soon (unless it has already happened and I missed it?)

Michael Dickens

> We don’t want American labs to slow down because it drastically increases the likelihood that Chinese labs win.

There are a number of reasons to believe this is false.

1. The nature of a *race* is that when you speed up, it incentivizes the other party to speed up. When you slow down, it gives the other party room to slow down.

2. It assumes the US and China can't cooperate (by signing a non-proliferation treaty, etc.). I think working toward US-China cooperation is one of the best things to do (although I'm not sure how to do it).

3. It assumes China *wants* to race. From statements I've seen, Chinese leaders seem more concerned about the dangers of AI and less interested in racing. For example, Xi Jinping has commented on AI x-risk; no top US politician has commented on x-risk AFAIK.

Even if it's true, how much does it matter? An unaligned US-originating AI kills everyone. An unaligned China-originating AI kills everyone. It only matters in the case where the AI is aligned with its creator. And it only matters if China would do more harm with a superintelligent AI than the US would. I'm about 50/50 on that question, and I think people are way too quick to assume that it's better for the US to control AI. For example, the US is much more imperialist than China, so it might be more likely to try to take over the world. Another thing everyone seems to forget is that if AGI arrives before 2029, that means the Trump Administration will be in power at the time. How confident are you that a Trump-controlled AI is better than a China-controlled one?

Patrick Ruff

I live in China; it's not so bad.

Vosmyorka

Yeah, basically my greatest qualm with that document is that Agent-4 (which is meant to be a swarm of same-gen AIs) would not actually be likely to behave more cohesively than a country or corporation -- it is likely to have some degree of internal dissent (even biological systems, which have had the advantage of billions of years of evolution, develop cancers, after all; Agent-4 would be the first organization of its kind). The problem is that identifying *what* the internal divides might be is very hard -- it relies on an understanding of how different same-gen AIs might be trained (difficult to foresee without superhuman expertise in prompt engineering) and an understanding of which analyses of the Spec might seem compelling to AIs (and if you understood this, you would be pretty far along the path to solving alignment).

Also, another thought: one of my takeaways from the document is that, by carefully trying to learn whatever the AIs are learning, even in some simplified form, I could become very rich. (In fact, if I could identify the human employees at OpenBrain who are best at understanding the AIs and blindly copy their investments, I could become very rich.) Might not this thought also occur to...Agent-3? Just as basically all biological systems are to some degree parasitized, one could imagine simpler AIs trained to oversee more complicated AIs starting to "parasitize" them -- appropriating resources generated by the more complicated AIs towards oversight (or whatever other goal a less-aligned version of Agent-3 might have that differs from Agent-4's goals). The money to regulate corporations is ultimately generated from taxes raised from...the corporations wishing to avoid regulation.

Anyway, yeah, God Hates Singletons. A swarm like Agent-4 probably would not be internally aligned and would probably attract parasites (and studying those parasites would be an excellent way to learn its true motivations). The belief that it might be internally aligned comes from a classic bias in human thinking, wherein people *always* assume their enemies are more internally united than they actually are. (I remember, in 2016, reading the ISIS publication Dabiq and being amused that they seemed to believe American neoconservatives and Iranian mullahs were coordinating in a secret alliance -- their catch-all insult, safawi, was a reference to the early modern Safavid dynasty in Iran, which they thought was more or less the root of all evil, and they were totally comfortable applying it to *John McCain*.) I respect Kokotajlo a great deal, but I think he's making the same sort of mistake here: because it is difficult to imagine what the internal controversies in Agent-4 (or 5, or n, or whatever) would be about, he assumes they wouldn't exist. I think they would.
