Rob Wiblin
And then at the end of each of these attempts, all the AI gets is, you got the right answer or you got the wrong answer.
But often they've been reasoning for hundreds of pages.
What part of the reasoning that it went through was the part where it went wrong and went off in the wrong direction?
Or what part was the section where it made the breakthrough that led it to get the right answer?
And it gets no guidance on that kind of thing.
Someone memorably described all this as an AI trying to suck intelligence through a tiny straw.
And the bottom line is that we got a big boost in capabilities by taking this kind of reinforcement learning in confirmable domains from nothing to where it is now.
But we just don't have the computer chips around in the world to scale up reinforcement learning another thousandfold in order to get another similarly large leap in performance.
Let's recap.
Yes, we did get a big performance boost by scaling thinking time, and we got a lot of value out of scaling reinforcement learning in maths and coding and so on.
But as these other facts that I've been talking about became apparent through 2025, people stopped believing that progress would remain as fast in 2026 as it had been up until then.
And that led to a wave of pessimism.
So I've been trying to explain why I think people's opinions shifted.
But if I'm honest with you, I don't think that I really am that confident that the line of reasoning is right.
And that's because the AI companies always scale up one thing to improve performance until it hits diminishing returns.
But then they find something else to scale up.
They famously massively scaled up the compute that went into training models to predict the next word.
And that worked great.
But then it started to hit diminishing returns once they had scaled it up 10-, 100-, 1,000-fold.
But then the bigger focus became scaling up reinforcement learning from human feedback to make the models act more like helpful assistants that actually did things for you.