Cal Newport
And they switched the focus from just "this is amazing if you use it" to "we have these benchmark graphs."
And look at these graphs.
Things are going better on these benchmarks.
It all became about benchmarks because these are very narrow things that you could train models to do well on.
They weren't intuitive. GPT-4, by contrast, was just obviously awesome.
By the time we got to GPT-5, their launch page had 28 graphs of benchmarks whose names no one even recognized.
And so then they had to look for all these other ways to get improvement.
And that's where we got like inference time compute.
Well, what if we compute longer for harder questions? And they began really pushing fine-tuning.
Well, for specific types of problems, we can get data sets that have questions and answers, and we can use reinforcement learning to try to take this pre-trained model and make it better at this particular type of problem.
And then we can have a benchmark that shows us we got better at this problem.
And my argument in that article is: this is a way different game than we were playing when we went from 2 to 3 and 3 to 4.
We're no longer scaling to AGI.
We're basically taking GPT-4, doing all of this tuning, adding extra stuff on top of it and around it, and measuring these very narrow benchmarks.
And that's why people have this feeling ever since.
Like... I guess they're better, but not in an obvious way.