Cal Newport
That sounds kind of obvious, but in machine learning circles that was surprising, because there's this idea of overfitting, where if you just make your model bigger, the performance goes down.
So it used to be that you had to find the perfect size model for your problem space.
That's the way people thought about machine learning until this paper came out.
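To ground that pre-scaling-era intuition, here's a toy sketch of classical overfitting; the example (polynomial regression on noisy sine data) is my own illustration, not something from the conversation:

```python
import numpy as np

# Toy illustration of the classical overfitting story: fit polynomials
# of increasing degree to a small noisy sample of sin(2*pi*x), then
# measure error on a clean held-out grid. With so little data, the
# high-degree (i.e. "bigger") models typically score worse out of sample.
rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 12)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, x_train.size)
x_test = np.linspace(0, 1, 200)
y_test = np.sin(2 * np.pi * x_test)

for degree in [1, 3, 9]:
    coeffs = np.polyfit(x_train, y_train, degree)        # least-squares fit
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: held-out MSE {test_mse:.3f}")
```

This is the regime the field had in mind: past some model size, held-out error starts climbing, so you tune for the sweet spot.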
In that paper, they took transformer-based LLMs, they were using GPT-2, and they systematically made it bigger, and they saw that the performance just kept going up.
So the thinking was: this is interesting, let's try it.
And that was GPT-3: all right, let's actually make this thing 10x bigger.
Surely this can't be right.
And it was.
It matched the Kaplan curve exactly.
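For context, the "Kaplan curve" refers to the power-law fits in Kaplan et al.'s 2020 scaling-laws paper, where loss falls smoothly as parameter count grows. A minimal sketch, assuming the approximate constants reported in that paper (N_c ≈ 8.8e13, α_N ≈ 0.076; treat both as illustrative):

```python
# A minimal sketch of the power-law scaling relationship from
# Kaplan et al. (2020), "Scaling Laws for Neural Language Models".
# N_C and ALPHA_N are approximate values from that paper; treat the
# exact numbers as illustrative, not definitive.

N_C = 8.8e13      # critical parameter count (approximate)
ALPHA_N = 0.076   # power-law exponent for model size (approximate)

def predicted_loss(n_params: float) -> float:
    """Predicted cross-entropy loss (nats/token) for a model with
    n_params parameters, assuming data and compute aren't the bottleneck."""
    return (N_C / n_params) ** ALPHA_N

# Loss keeps falling smoothly as the model grows, which is the
# "it just kept getting better" behavior described above.
for n in [1.5e9, 175e9, 1.75e12]:   # ~GPT-2, ~GPT-3, 10x GPT-3
    print(f"{n:.2e} params -> predicted loss {predicted_loss(n):.3f}")
```

The surprise was that the curve just keeps going down with size, with no overfitting turn, which is the opposite of the classical picture above.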
Oh my God, this actually got way better just by making it bigger.
All right, well, certainly that must be the end of it.
Let's try it with GPT-4.
They made it bigger.
They trained it much longer.
Months and months they trained it.
Microsoft had to build these custom data centers to train it, using new air-conditioning technology that didn't exist before.
And it fit the curve.
It was way better.