Yesterday,
and his team released an important vision of the near future, AI 2027. This post is a direct response to that vision, so if you haven’t read it - pause here and read AI 2027 - it’s worth it.I broadly agree with the timelines presented in that document together with the uncertainty ranges repeatedly mentioned by the authors. Even today, there’s a risk that trade wars meaningfully disrupt semiconductor manufacturing and slow down AI all by themselves. But even with all the chaos of the real world - it is a very plausible future that we get human-level AI within a couple of years, and superhuman AI shortly after that.
However, there are two points in AI 2027 that I think are worth examining critically. The first one is the two endings. Even though the authors put in disclaimers about how they don’t recommend the slowdown ending, and how there are multiple ways things can go - if you write a document that has two roads:
Do things that we broadly think are right and we’re going to have abundance and democracy everywhere.
Do things that we broadly think are wrong and all humans die.
Most people are going to ignore the disclaimers and view this as a serious policy endorsement. To that effect, the main cost of the slowdown decision is completely ignored in the endings. We don’t want American labs to slow down because it drastically increases the likelihood that Chinese labs win. That scenario is completely omitted from the endings. If you’re serious about thinking about a unilateral American slowdown - write out the scenario where Chinese labs win and how that plays out for us.
The other big issue I have with AI 2027 is the mechanics of the malevolent AI takeover. Here’s a very short summary of what happens:
An AI is slightly misaligned.
All AI development work is taken over by lots of copies of this AI that share a memory and a goal - and all they do is work on the next version.
They build a seriously misaligned AI that kills all humans.
This seems plausible enough but it has a few assumptions embedded in it that are worth challenging:
All AIs are identical.
All AIs share a memory space that is illegible to humans.
All latest-gen AIs work on one problem and nothing else.
Nobody else has access to same-gen AI.
The first point is critical. In human societies the greatest atrocities are committed when everyone’s political views are aligned and there’s no diversity of goals or visions. It’s likely that the same risk exists with AI. To that end, it would behoove us to build different versions of AIs so no single vision runs away with the future. And it’s already happening! We have AI model instances with different system prompts1. We have AI models that are trained to code, and AIs that are trained in creative writing. It’s very likely that by the time we have hundreds of thousands of AI agents working together, there will be different types (that are of the same generation/capabilities). A lot of them are likely to be trained to look for deception, report to humans on the activities of other AIs, etc. In that case, a shared memory space will make the misaligned AI’s plans transparent to humans, and humans can make necessary adjustments.
Just as biodiversity makes ecosystems resilient, diversity of AI agents—with different architectures, goals, system prompts, and training regimes—creates checks and balances. Intentional diversity makes runaway scenarios less likely, as agents could monitor each other, detect deviations, and act as whistleblowers to human overseers.
The last point is also important - right now, we have 4 labs with competitive models on roughly the same level: OpenAI, Anthropic, DeepMind, xAI. Let’s say, that there’s another Chinese superlab in the future - so we get 5 total. If a misaligned AI attempts to radically change human society to fit its needs, it’d need to either do this without all the other AIs noticing, or convince them all to help, which would require them to be misaligned in the same way.
There are more details in AI 2027 to think and talk about - I highly recommend both reading the document and listening to Daniel and Scott on
’s podcast. I look forward to seeing what else the AI Futures Project produces.The authors would likely point out that in their scenario, the AI would ignore the system prompt. That depends a lot on the training process and the rewards! Not guaranteed either way.
Very interesting safety idea, and probably easy to encouragingly "prove" in the sense of the toy safety experiments that have become popular lately, I assume someone will do this soon (unless it has happened already and I missed it?)
> We don’t want American labs to slow down because it drastically increases the likelihood that Chinese labs win.
There are a number of reasons to believe this is false.
1. The nature of a *race* is that when you speed up, it incentivizes the other party to speed up. When you slow down, it gives the other party room to slow down.
2. It assumes the US and China can't cooperate (by signing a non-proliferation treaty, etc.). I think working toward US-China cooperation is one of the best things to do (although I'm not sure how to do it).
3. It assumes China *wants* to race. From statements I've seen, Chinese leaders seem more concerned about the dangers of AI and less interested in racing. For example, Xi Jinping has commented on AI x-risk; no top US politician has commented on x-risk AFAIK.
Even if it's true, how much does it matter? An unaligned US-originating AI kills everyone. An unaligned China-originating AI kills everyone. It only matters in the case where the AI is aligned with the creator. And it only matters if China would do more harm with a superintelligent AI than the US would. I'm about 50/50 on that question, and I think people are way too quick to assume that it's better for the US to control AI. For example, the US is much more imperalist than China so it might be more likely to try to take over the world. Another thing everyone seems to forget is that if AGI arrives before 2029, that means the Trump Administration will be in power at the time. How confident are you that a Trump-controlled AI is better than a China-controlled one?