ChatGPT captured the public imagination as no other AI product has to date.
On top of that, Microsoft is now investing another $10B into OpenAI, further concentrating public attention on the company.
Meanwhile, the first serious banger of an AI paper of 2023 has gone largely unnoticed. DeepMind (owned by Google)1 has trained an agent that can learn to perform a hitherto unseen task in just 1-2 tries. Moreover, the technology they use has the potential to be used to create a foundation model for general-purpose robots2. Here is a video showing in real time how the agent learns to solve new problems:
That got me thinking about the relative positions of Google and OpenAI in the race to AGI. Let’s take a look at where the two companies stand in their R&D progress, with the caveat that we can only discuss published work; we have no idea how internal unpublished work compares between the two groups.
Language
The recent upswing in language-related AI got started in 2017 when a largely Google-affiliated team published Attention Is All You Need, a paper that introduced the transformer architecture that powers virtually all of the work in the field today. The upswing REALLY got into high gear when OpenAI showed the world that if you make a transformer model BIG ENOUGH, it gets a lot of emergent functionality, aka can just do things you ask it to do. That was in 2020; the transformer model was GPT-3. Let’s look at the impressive language models trained since then3.
LaMDA (Google, 2021-2022). A model comparable in size to GPT-3 but trained on dialogue. Famously convinced a Google engineer that it was sentient. Memes aside, best conversational AI for its time.
Codex (OpenAI, July 2021). A GPT model fine-tuned to write code.
GLaM (Google, December 2021). A model interesting because of its architecture (a sparse mixture-of-experts); beats GPT-3 on 80% of zero-shot and 90% of one-shot tasks.
Gopher (Google, December 2021). A model that’s significantly bigger than GPT-3 that halved the gap between GPT-3 and human-level performance in MMLU tasks.
WebGPT (OpenAI, December 2021). A model that uses web search to augment its results. Most of the time answers questions better than redditors.
InstructGPT (OpenAI, January-March 2022). A version of GPT-3 that is much better at following instructions.
Chinchilla (Google, March 2022). The importance of this paper is that its team figured out how to optimally split a compute budget between model size and training data when training large language models. Chinchilla used the same compute budget as Gopher, but significantly outperformed its rodent cousin.
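For a flavor of the result, the Chinchilla finding is often summarized with two rules of thumb (rounded folk numbers, not the paper’s exact fitted coefficients): training compute for a transformer is roughly C ≈ 6·N·D FLOPs for N parameters and D tokens, and the compute-optimal point has roughly D ≈ 20·N. A minimal sketch under those assumptions:

```python
import math

def chinchilla_optimal(compute_flops, tokens_per_param=20.0):
    """Approximate compute-optimal model/data split.

    Uses the common approximations C ~= 6 * N * D (training FLOPs)
    and D ~= 20 * N (the rough Chinchilla optimum), so:
    C = 6 * N * (20 * N)  =>  N = sqrt(C / 120).
    """
    n_params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Roughly Gopher's training budget (~5.8e23 FLOPs):
n, d = chinchilla_optimal(5.8e23)
print(f"params ~{n / 1e9:.0f}B, tokens ~{d / 1e12:.1f}T")
# → params ~70B, tokens ~1.4T
```

Which lands close to Chinchilla’s actual 70B parameters trained on 1.4T tokens, versus Gopher’s 280B parameters on only 300B tokens at the same budget.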
PaLM (Google, April 2022). The most powerful base large language model to date. Not only is it huge (three times the size of GPT-3), but it also uses the Chain of Thought technique, which helps it perform better on tasks that require multi-step reasoning, like math problems. It also has a version finetuned on code, which is likely SOTA for code generation.
Minerva (Google, June 2022). A model specifically designed for math/science problems, which significantly improved on SOTA in those domains.
FIM (OpenAI, July 2022). A paper that shows that you can add a fill-in-the-middle capability to Large Language Models without compromising on left-to-right prediction accuracy.
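The core trick is just a data transformation: a fraction of training documents gets split and reordered so a plain left-to-right model learns to infill. A minimal sketch (the sentinel strings below are placeholders, not OpenAI’s actual special tokens):

```python
import random

# Placeholder sentinels; the paper adds dedicated special tokens to the
# vocabulary, whose exact form is an implementation detail.
PRE, SUF, MID = "<PRE>", "<SUF>", "<MID>"

def to_fim(document, rng=random):
    """Rearrange a training document for fill-in-the-middle.

    Split the text at two random points into (prefix, middle, suffix),
    then emit prefix + suffix + middle, so an ordinary left-to-right
    model learns to generate the middle conditioned on both sides.
    """
    i, j = sorted(rng.sample(range(len(document) + 1), 2))
    prefix, middle, suffix = document[:i], document[i:j], document[j:]
    return PRE + prefix + SUF + suffix + MID + middle
```

Mixing such reordered documents into the pretraining data is what the paper shows costs essentially nothing in ordinary left-to-right performance.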
Sparrow (Google, September 2022). A chatbot that can use Google search to find information to add to its responses, making them more accurate.
U-PaLM (Google, October 2022). Shows how you can use a technique pioneered by Google in a prior paper to create a model with PaLM-like capabilities at around half the cost.
Flan-PaLM (Google, October 2022). Shows that finetuning a model on a variety of tasks and specifically on chain-of-thought tasks greatly increases its performance.
ChatGPT (OpenAI, November 2022). Chatbot that people can actually talk to.
Med-PaLM (Google, December 2022). PaLM, finetuned on medical data. Approaches human-level performance on various medical tests.
Winner: It’s very close, and at different times both companies have held the lead, but I’d say overall Google has published a larger number of impressive papers, and, as far as the latest published work goes, appears to be ahead of OpenAI.
Image Generation
In image generation, there are three works of note. DALL-E 2 (OpenAI) is widely available; you can try it today. Here is an astronaut on a horse from the announcement:
Imagen was Google’s response. Supposedly, it outperforms DALL-E 2. Here’s a raccoon on a skateboard:
Parti is another Google model that can understand language better, so it’s better at combining multiple objects.
Winner: Again, it’s quite close, but Google’s models can spell, and OpenAI’s cannot. Google.
Games
OpenAI has only one significant accomplishment in games - OpenAI Five. It beat the world champions in (a somewhat modified version of) Dota 2, one of the most competitive video games on the planet4. It's a major achievement.
Google5 has a similar achievement under its belt - it beat some world-class players at Starcraft II, a similarly competitive video game. On top of that, Google teams have:
Defeated the world champion at Go.
Created a model that has world-class performance at chess, go, shogi, and Atari games without even knowing the rules.
Achieved an impressive reduction in the amount of training needed to reach human-level performance in Atari games.
Winner: Google, by a large margin.
Generalist AI
This category refers to models that can do many things at once. So, not just play Go, but play Go AND drive a car. A lot of people think that this category is absolutely crucial to developing AGI. In this category, OpenAI hasn’t even published an important paper since 2018-2019 (depending on how you define important). You can view their latest blog post on emergent tool use here, but really, it’s been years of silence on this front from OpenAI.
Google, meanwhile, has published a series of ever-more impressive papers with some REALLY strong results 😼.
Gato7 is an agent that can play Atari, caption images, chat, stack blocks with a real robot arm, and much more.
RT-1 is a robotics model that can take instructions in natural language and execute them in the real world8.
Dreamer v3 (the same model that solved Minecraft) is also state-of-the-art in proprioceptive control and a number of other tasks.
Ada (mentioned at the beginning of this blog post) can learn how to complete unseen tasks in 1-2 tries.
And these are only the highlights of what came out of Google in 2022 + January 2023. It has a history of impressive publications in this field.
Winner: Google, by a landslide.
Notable Mention in AI
There are a lot of other AI achievements coming out of both OpenAI and Google: speech transcription, music generation, etc. But only one deserves a special mention: AlphaFold. Google has solved protein folding, which has been a major scientific challenge for all of humanity. This is probably the greatest AI achievement to date.
Apart from where the two companies are in their research, there are other factors at play that can determine how quickly the companies progress. Let’s take a look at those as well.
Funding
Prior to the most recent round of funding, OpenAI had raised around $2B cumulatively over 7 years. DeepMind’s budget in 2021 alone was $1.84B. Considering that it is not the only AI effort at Google, it’s safe to assume that Google spent more on AI in 2021 alone than OpenAI had over its entire history. Now, however, OpenAI has pocketed $10B from Microsoft. I’m assuming that they will ramp up spending pretty quickly going forward. That being said, Google’s total R&D spending in 2021 was over $30B, so it has room to ramp up as well.
Winner: Google historically, unclear going forward.
Headcount
I don’t really view headcount as something that can be taken at face value, but it is a number we can look at. OpenAI currently has 375 employees. DeepMind employed around 1,000 people in 2020; I hear that it’s now over 1,500, but I have no reliable source to cite. Again, DeepMind is not the only AI lab at Google.
Winner: Google.
Compute Availability
Google and Microsoft both have world-class cloud offerings. Both OpenAI and Google’s internal teams have priority access to them.
Winner: tie.
Compute Cost
There are two inputs to the cost of compute: electricity and silicon. I’m assuming that Microsoft and Google have similar electricity costs, but silicon is likely to be slightly cheaper for Google. Microsoft uses Nvidia chips, while Google has its own TPU chips. In general, TPU chips anecdotally appear to be cheaper for cluster-scale training, and that’s looking at costs for third parties. OpenAI has to use GPUs, so not only do they use a less efficient architecture, but there is also a hefty embedded Nvidia margin in OpenAI costs. Google’s internal costs for TPUs are likely significantly lower, though Google probably doesn’t allocate the TPU development cost to its AI initiatives.
Winner: likely Google.
Data
We have recently become more aware of how data is a major limiting factor in AI progress. I don’t think there is a single organization in the world that has access to more data of all kinds than Google. It powers the world’s biggest search engine and the most popular phone OS. It has a wide range of products with their own massive datasets (YouTube, Google Scholar, etc.) There is no way anyone in the world can compete with that. By contrast, OpenAI mostly has to rely on publicly available data, augmenting that with what it gathers through its products, chief among which is ChatGPT.
Winner: Google, and it’s not even remotely close.
Focus
OpenAI’s sole focus is on getting to AGI first. Google has no such singular focus. DeepMind likely does, but it still has to divert some attention to interfacing with its corporate parent. OpenAI leadership has to spend time on its deals with Microsoft and other investors as well, but that’s likely a much smaller share of their time.
Winner: OpenAI.
Looking at all of the above, my take is that the only serious advantage OpenAI has on paper is its focus. That is not to be underestimated! There is a long history of focused startups beating entrenched MegaCos in Silicon Valley. See, for example, The Innovator’s Dilemma. That being said, there are also precedents for large companies being nimble, and reacting to market changes with great pivots. The best example that comes to mind is Intel abandoning its DRAM business to focus on the comparatively tiny (at that point) market of microprocessors. But that requires strong leadership that clearly sees the situation for what it is.
What would it look like if Google suddenly realized that it has no future, save for AI? Well, you’d probably see its CEO declare a “code red” emergency and get approval from the controlling shareholders for a massive strategy pivot. And those are exactly the things that we are seeing happen.
Now, from the outside, right now, it’s impossible to tell whether this is a “code red” that Google leadership just talks about for a while, or whether it forces a serious pivot towards Google being a get-to-AGI-first-at-all-costs company. What would that look like? Cutting off9 all products that don’t either make money or help the company advance toward the AGI goal. Massively redirecting its R&D spend towards AI. Integrating their existing SOTA AI technologies into their products.
If we see signs of the latter happening, I’d bet on Google to win the AGI race. They have massive advantages in terms of starting position, scale, resources, and data.
If not… We will see. Even with all of the disadvantages of being a huge company, Google has been able to stay in the lead so far. I know OpenAI has been all the hype lately because they release their work publicly, but don’t write Google off quite yet.
1. I will stop making the distinction here. Later on, read Google to mean Google+DeepMind. Theoretically, DeepMind has some independence in governance, but I will just view them as a 100% owned subsidiary.
2. Or virtual agents.
3. Note that here and later I’m only looking at Google and OpenAI papers. Other labs have done impressive work, but none of them IMO are quite on the same level as Google and OpenAI.
4. I happened to attend that game, it was awesome.
5. I kind of feel bad writing Google here, because it was all DeepMind, but I’m sticking to my guns.
6. OK, there’s no “solving” Minecraft, but getting to collecting diamonds without human training data has been a longstanding challenge in the field.
7. Gato means cat.
8. If a Google kitchen can be viewed as a part of the “real world”.
9. Either discontinuing completely or just not letting them take up a lot of money and management attention.
If Google *can’t* effectively leverage their advantages, I’ll hate them forever; it would just be inexcusable big company nonsense!!! I don’t have a dog in the fight, but Christ would it cement their reputation as having just totally lost it at the strategic level.
I see no evidence we are near AGI nor that Google is leading the quest towards it. I consider the assumption that LLMs make AGI inevitable and close mostly a myth of the current hype cycle.