
80,000 Hours Podcast

What the hell happened with AGI timelines in 2025?

10 Feb 2026

Transcription

Chapter 1: What triggered the initial optimism about AGI timelines in early 2025?

0.031 - 3.054 Rob Wiblin

Let's talk timelines to artificial general intelligence.

4.215 - 15.468 Unknown

In late 2024 and early 2025, after OpenAI put out the first ever set of reasoning models, that is, o1 and o3, short timelines to transformative artificial general intelligence swept the AI world.

16.288 - 19.131 Rob Wiblin

But then, in the second half of 2025, a strange thing happened.

19.652 - 29.042 Unknown

Sentiment swung all the way back in the other direction, with people's forecasts for when AI might really shake things up blowing out further than they had been even before reasoning models came along in the first place.

Chapter 2: What caused AGI timelines to lengthen again later in 2025?

30.068 - 41.985 Unknown

What the hell happened? Was it just swings in vibes and mood? Confusion? Just a series of fundamentally unexpected research results? I've been trying to make sense of it myself, and here is the best explanation I've come up with.

47.163 - 55.231 Rob Wiblin

The reasoning models like o1 and o3 seemed to have a big impact inside the AI companies as well as in the public outside of them.

Chapter 3: How did reasoning models impact expectations for AGI development?

55.491 - 69.925 Rob Wiblin

Sam Altman declared in January last year, "We are now confident we know how to build AGI." And Demis over at DeepMind, who's normally more circumspect, said he thought AGI was probably three to five years away. And as is often the case, Dario from Anthropic had the most colorful turn of phrase.

70.065 - 75.55 Rob Wiblin

He announced that a "country of geniuses in a datacenter" was quite likely to arrive in the next two to three years.

Chapter 4: What technical factors contributed to the shift in AGI timelines?

75.53 - 95.108 Rob Wiblin

We also saw huge levels of popular coverage of the AGI scenario known as AI 2027, in which AI research and development is fully automated in 2027. And that then leads in the story to a powerful recursive self-improvement loop and a so-called intelligence explosion. I think the massive coverage and engagement with that story definitely shifted the vibes as well.

95.088 - 109.549 Rob Wiblin

Even 80,000 Hours, where I work, made a video about the AI 2027 story that got a stupid number of views. I'm not going to say exactly how many because I'm sure it will have gone up a whole bunch by the time we post this. To an extent, all of this hype ran a touch ahead of what people actually believed.

110.19 - 126.652 Rob Wiblin

When they put out the AI 2027 scenario, the writers, who are absolutely as bullish about AI as you can reasonably get, still thought that in reality we wouldn't get a superhuman coder until a year and a half after it happens in their story. But people really did get a lot more excited, and then changed their minds back again.

126.792 - 129.896 Rob Wiblin

So let's take a tour of some of the technical factors that drove that.

Chapter 5: What are the reasons behind the growing pessimism regarding AGI timelines?

130.457 - 148.618 Rob Wiblin

I think it's no mystery why lots of people, absolutely including me, got super excited about reasoning models when they arrived. It suddenly felt like they could do so many things that the previous generation of AI had failed horribly at. But what made the shine wear off as time went on?

149.155 - 169.02 Rob Wiblin

Well, the hope among people inside the industry and among AI enthusiasts had been that reinforcement learning on domains that are easily checkable, things like mathematics and coding, would generalize to other, messier domains where it's a lot harder to check whether someone has actually done the right thing or gotten the right answer.

169 - 191.505 Rob Wiblin

People were kind of primed to expect that this might work, because fine-tuning models to follow instructions and be helpful to users really had generalized shockingly well across almost all the kinds of different things that users tend to ask AI models for. But as 2025 wore on, it became apparent that the same thing wasn't really happening here with reasoning generalization.

192.126 - 207.664 Rob Wiblin

The reasoning models that had been optimized to reason were a lot better at math and logic and coding, but they weren't suddenly able to extrapolate from that to go and book you a flight, say, or go away and organize an event that actually works.

Chapter 6: How did the automation of AI research influence AGI predictions?

207.644 - 228.353 Rob Wiblin

A senior staff member at an AI company recently told me that for them, this overall experience actually updated them towards longer timelines to artificial general intelligence. Because until then, that possible generalization from easily checkable to non-checkable domains had been one plausible path to really rapid, perhaps unexpectedly rapid, capabilities gains.

228.333 - 241.325 Rob Wiblin

And now he saw that that was basically ruled out. A lot of people think that this autonomy issue is changing right now with the arrival of Anthropic's Claude Opus 4.5 and Claude Code and as of January 2026, Claude Cowork.

242.065 - 258.34 Rob Wiblin

But even if that does pan out as much as people are hoping and some people are expecting, it'll be because Anthropic trained its models on these autonomy tasks specifically, not because of magical generalization from reasoning tasks that we had originally hoped would get us those kinds of capabilities sort of for free.

258.32 - 273.063 Rob Wiblin

So there are two big ways that reasoning models got so much better at solving a particular kind of problem. One was actually being better at reasoning, you know, basically being smarter and more logical and able to maintain a thread for a decent amount of time.

273.885 - 285.323 Rob Wiblin

And the other was that they were given much longer to think through each question they were asked, where previous models had mostly not been given thinking time at all. They'd more or less just had to blurt out the first thing that popped into their head when they were asked a question.

286.332 - 295.745 Rob Wiblin

Early on with reasoning models, it was kind of hard to tell how much work was being done by the first thing, actually being smarter, versus the second thing, so-called inference scaling.

Chapter 7: What are the implications of the evolving AI capabilities on future predictions?

296.646 - 314.478 Rob Wiblin

And that question mattered enormously because it's really expensive to have LLMs think for a lot longer every single time you want them to answer something. The thing is, we could get this big proportional increase in thinking time in 2024 and 2025 because we'd basically been starting from zero.

315.319 - 333.197 Rob Wiblin

But while it was affordable to go from giving models zero minutes of thinking time up to one minute of thinking time, there actually aren't enough computer chips in the world to go on and give models 10 minutes or 100 minutes to think in order to give slightly better answers, at least not for normal situations.

333.177 - 351.044 Rob Wiblin

And that fact would make the gains from increasing thinking time more of a one-off, not a trend that could carry on from 2024 to 2025 into 2026 and 2027. And indeed, it does appear that more than two thirds of the improved performance of reasoning models came from giving them more time to think.

351.725 - 369.554 Rob Wiblin

And unfortunately, that's not a trick we can pull off again, not until we go away and actually manufacture more and faster computer chips, which does happen, but doesn't happen at anywhere near the rate that we were able to scale up thinking time before. And this realization meant that analysts came to expect slower improvements in AI capabilities going forward.

369.774 - 381.547 Rob Wiblin

Past guest of the show, Toby Ord, has done the best public analysis of this that I'm aware of. It's a little difficult to do because a lot of the numbers are confidential inside the companies. But he figured things out from what he could find, and we'll link to some of those articles in the show notes.

381.567 - 400.664 Rob Wiblin

Now, you might also recall this very influential graph showing that AIs can successfully complete longer and longer software engineering tasks, which came out of the organization METR. Now, this is absolutely true, but a big part of that is being driven by the models being given access to way more thinking time to try and complete these tasks than they'd ever been given before.

401.545 - 420.812 Rob Wiblin

So the exponential increases we saw in the length of tasks that can be completed came with commensurately exponential increases in cost. So much so that in some cases, these AI agents maybe cost about the same amount to run for an hour as it would actually cost you to hire a human software engineer, hundreds of dollars, in fact.

420.792 - 440.32 Rob Wiblin

And at that price, it wouldn't be anywhere near economically rational to go ahead and scale up their thinking time another tenfold, up to, you know, thousands or tens of thousands of dollars, not in the immediate future anyway. And so that is a reason to think that progress in 2026 and 2027 won't necessarily come at the same pace that we saw in 2025.
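To make that arithmetic concrete, here's a toy cost model. All numbers here are illustrative assumptions for the sake of the sketch, not figures from the episode: suppose each doubling of achievable task length requires roughly ten times the thinking tokens, with cost proportional to tokens.

```python
# Toy model of inference-scaling economics (illustrative numbers only).
# Assumption: each doubling of achievable task horizon needs ~10x more
# thinking tokens, and cost scales linearly with tokens.

BASE_COST = 0.50      # assumed cost ($) of a short, 15-minute-horizon task
COST_MULTIPLIER = 10  # assumed cost growth per doubling of task horizon

def cost_for_horizon(doublings: int) -> float:
    """Cost to attempt a task whose length has doubled `doublings` times."""
    return BASE_COST * (COST_MULTIPLIER ** doublings)

for d in range(5):
    horizon_minutes = 15 * 2 ** d
    print(f"{horizon_minutes:4d}-minute task: ~${cost_for_horizon(d):,.2f}")
```

Under these assumed numbers, the horizon grows linearly in doublings while cost explodes exponentially: by the fourth doubling a task costs thousands of dollars to attempt, which is the shape of the "not economically rational to scale up another tenfold" argument above.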

440.3 - 458.207 Rob Wiblin

So it stopped looking like we could scale up thinking time like we had been doing before. But maybe we could instead scale up reasoning training, which is more like a fixed cost rather than a cost that you have to pay every single time you have a question. Unfortunately, that turned out not to be working as well as people had hoped in early 2025 either.

Chapter 8: How do societal perceptions of AI differ from industry expectations?

533.452 - 549.855 Rob Wiblin

What part of the reasoning that it went through was the part where it went wrong and went off in the wrong direction? Or what part was the section where it made the breakthrough that led it to get the right answer? And it gets no guidance on that kind of thing. Someone memorably described all this as an AI trying to suck intelligence through a tiny straw.

549.835 - 568.296 Rob Wiblin

And the bottom line is that we got a big boost in capabilities by taking this kind of reinforcement learning in confirmable domains from nothing to where it is now. But we just don't have the computer chips around in the world to scale up reinforcement learning another thousand fold in order to get another similarly large leap in performance. Let's recap.

569.217 - 589.208 Rob Wiblin

Yes, we did get a big performance boost by scaling thinking time, and we got a lot of value out of scaling reinforcement learning in maths and coding and so on. But as these other facts that I've been talking about became apparent through 2025, people stopped believing that progress would remain as fast in 2026 as it had been up until then. And that led to a wave of pessimism.

590.302 - 606.506 Rob Wiblin

So I've been trying to explain why I think people's opinions shifted. But if I'm honest with you, I don't think that I really am that confident that the line of reasoning is right. And that's because the AI companies always scale up one thing to improve performance until it hits diminishing returns. But then they find something else to scale up.

606.926 - 622.646 Rob Wiblin

They famously massively scaled up the compute that went into training models to predict the next word. And that worked great. But then it started to hit diminishing returns as they had scaled it up 10, 100, 1,000 fold. But then the bigger focus became scaling up reinforcement learning from human feedback to make the models act more like helpful assistants that actually did things for you.

623.126 - 639.101 Rob Wiblin

And that worked great. It made a big difference until it also started to hit diminishing returns. So then they naturally moved on to throwing compute at inference scaling and reinforcement learning for reasoning, as I was just describing. Each individual thing, as they scale it up 10, 100, 1,000, 10,000-fold, it always kind of peters out.

639.581 - 656.402 Rob Wiblin

But so long as there's always something else to move on to, then the big picture trend will remain one of steady improvement like we've seen before. I'm sure their research teams are feverishly working to figure out other ways to efficiently convert computer chips into smarter and more useful and more capable AI models. That's the entire job of their research.

657.163 - 676.451 Rob Wiblin

Will they succeed again for, you know, the fourth or fifth time? Or have they kind of run out of tricks for now where we might have a couple of years to wait? That is the fundamental unknown that everyone both inside and outside the companies is more or less forced to repeatedly speculate about until they either do or they don't. So those were some technical updates, new technical results.

677.112 - 695.908 Rob Wiblin

But there are a lot of worries that people have always had that also became more salient in the second half of 2025. Here are a couple that stand out to me. First, there was an ever-growing gap over that time between what AI models seemed like they could do in demos and how much they were actually upending most workplaces and the world around us and our personal lives.
