Rob Wiblin
It's a little difficult to do because a lot of the numbers are confidential inside the companies.
But he figured things out from what he could find, and we'll link to some of those articles in the show notes.
Now, you might also recall this very influential graph from the organization METR showing that AIs can successfully complete longer and longer software engineering tasks.
Now, this is absolutely true, but a big part of that is being driven by the models being given access to way more thinking time to try and complete these tasks than they'd ever been given before.
So the exponential increases we saw in the length of tasks that can be completed have come with commensurately exponential increases in cost.
So much so that in some cases, these AI agents may cost about as much to run for an hour as it would cost to hire a human software engineer for that hour: hundreds of dollars, in fact.
And at that price, it wouldn't be anywhere near economically rational to go ahead and scale up their thinking time another tenfold, up to, you know, thousands or tens of thousands of dollars, not in the immediate future anyway.
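To make the arithmetic concrete, here's a minimal back-of-the-envelope sketch. The $300-per-task starting figure is an illustrative assumption, not a published price; the point is just that if cost scales roughly linearly with thinking time, every tenfold scale-up is a tenfold bigger bill.

```python
# Back-of-the-envelope sketch: if an agent's cost per task scales roughly
# linearly with thinking time, each 10x scale-up multiplies the bill by 10.
# All dollar figures are illustrative assumptions, not published prices.

cost_per_task = 300.0  # assumed: an agent already costs hundreds of $ per long task

for step in range(4):  # today, then three successive tenfold scale-ups
    scale = 10 ** step
    print(f"{scale:>5}x thinking time -> ~${cost_per_task * scale:>9,.0f} per task")

# Under these assumptions:
#     1x thinking time -> ~$      300 per task
#    10x thinking time -> ~$    3,000 per task
#   100x thinking time -> ~$   30,000 per task
#  1000x thinking time -> ~$  300,000 per task
```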
And so that is a reason to think that progress in 2026 and 2027 won't necessarily come at the same pace that we saw in 2025.
So it stopped looking like we could scale up thinking time like we had been doing before.
But maybe we could instead scale up reasoning training, which is more like a fixed cost rather than a cost that you have to pay every single time you have a question.
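To see why a fixed cost is so much more attractive, here's a toy amortization sketch. Both dollar figures are illustrative assumptions, not real numbers: a one-off training bill gets spread across every future query, while extra thinking time is paid again on each one.

```python
# Toy amortization sketch: a fixed training cost is paid once and spread
# over all queries, while extra per-query thinking is paid every time.
# All numbers are illustrative assumptions, not real figures.

training_cost = 50_000_000.0  # assumed one-off reasoning-training bill
extra_thinking = 0.50         # assumed extra $ per query from longer thinking

for queries in [1_000_000, 100_000_000, 10_000_000_000]:
    amortized = training_cost / queries  # training cost attributed to each query
    print(f"{queries:>14,} queries: training adds ${amortized:.4f}/query, "
          f"extra thinking adds ${extra_thinking:.2f}/query every time")
```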
Unfortunately, that turned out not to work as well as people had hoped in early 2025 either.
It's hard to pin down exactly, because the details of AI model training are so commercially sensitive that companies don't publish the numbers you might want to have.
But Toby Ord estimates that the compute efficiency of this kind of reinforcement learning might be literally one millionth as high as it was back in the predict-the-next-word era, or pre-training era.
That is a huge penalty, as you can imagine.
The trouble with the method is that reinforcement learning works by having the AI generate lots and lots of attempts to solve a really difficult coding or maths or philosophy problem.
And whenever it does manage to get the right answer, we say, great, yes, do more of whatever you were doing that time around.
This clearly does work.
The models did get much better in these domains, but it requires a lot of computation to squeeze out a relatively modest amount of education or learning.
The reason is that, unlike with training models on accumulated human knowledge scraped from books or internet posts or GitHub or whatever, here it's the AIs themselves that have to make the content they're attempting to learn from, which includes making vast numbers of garbage failed attempts to solve problems that ultimately go nowhere.
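As a rough sketch of that loop, with stand-in stubs rather than any lab's actual pipeline: generate many attempts per problem, keep only the ones a verifier accepts, and reinforce those. Most of the compute goes into attempts that get thrown away.

```python
import random

# Minimal sketch of reinforcement learning against a verifier, with stand-in
# stubs for the model and the checker; not any lab's actual pipeline.

def sample_attempt(problem):
    """Stand-in for the model generating one attempt at a hard problem."""
    return random.randint(0, 99)  # a guessed 'answer'

def verify(problem, attempt):
    """Stand-in verifier: checks the attempt against the known answer."""
    return attempt == problem["answer"]

def rl_step(problems, attempts_per_problem=64):
    reinforced = []  # the rare successes we'd train toward ("do more of that")
    total = 0
    for p in problems:
        for _ in range(attempts_per_problem):
            total += 1
            attempt = sample_attempt(p)
            if verify(p, attempt):
                reinforced.append((p, attempt))
    # In real training, gradient updates push the model toward the successful
    # attempts; every failed attempt is compute spent and then discarded.
    print(f"{len(reinforced)}/{total} attempts succeeded; "
          f"{total - len(reinforced)} were generated and thrown away")

rl_step([{"answer": 7}, {"answer": 42}])
```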