Ilya Sutskever
And so you say, okay, so what are we scaling?
And pre-training was a thing to scale.
It was a particular scaling recipe.
The big breakthrough of pre-training is the realization that this recipe is good. So you say, hey, if you mix some compute with some data into a neural net of a certain size, you will get results, and you know it will be better if you just scale the recipe up.
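(An illustrative aside, not from the transcript: the "recipe" being described is often summarized as a compute/data scaling law. The sketch below assumes a Chinchilla-style loss form, L(N, D) = E + A/N^α + B/D^β, with coefficients roughly like those fitted in Hoffmann et al., 2022, used here purely for illustration.)

```python
# Illustrative sketch only: a Chinchilla-style scaling law L(N, D) = E + A/N^alpha + B/D^beta.
# Coefficients are approximately the published Chinchilla fits; they are shown
# only to illustrate the shape of the curve, not as a definitive model.

def predicted_loss(params_n, tokens_d,
                   e=1.69, a=406.4, b=410.7, alpha=0.34, beta=0.28):
    """Predicted pre-training loss for a model with params_n parameters
    trained on tokens_d tokens, under the assumed scaling-law form."""
    return e + a / params_n**alpha + b / tokens_d**beta

# "Scaling the recipe up": more parameters plus more data gives a predictably lower loss.
for scale in (1, 10, 100):
    n = 1e9 * scale    # parameters
    d = 20e9 * scale   # training tokens (roughly 20 tokens per parameter)
    print(f"{scale:>4}x recipe -> predicted loss {predicted_loss(n, d):.3f}")
```

The point of the sketch is the low-risk property mentioned next: under such a law, pouring in more compute and data moves you along a predictable curve rather than requiring a new idea.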
And this is also great.
Companies love this because it gives you a very low-risk way of investing your resources.
It's much harder to invest your resources in research.
Compare that: if you do research, you need researchers to go forth and research and come up with something, versus just getting more data and more compute, where you know you'll get something from pre-training.
And indeed, you know, it looks like, based on various things some people say on Twitter, it appears that Gemini has found a way to get more out of pre-training.
At some point, though, pre-training will run out of data.
The data is very clearly finite.
And so then, okay, what do you do next?
Either you do some kind of souped-up pre-training, a different recipe from the one you've done before, or you're doing RL, or maybe something else.
But now the compute is big. Compute is now very big.
In some sense, we are back to the age of research.
So maybe here's another way to put it.
Up until 2020, from 2012 to 2020, it was the age of research.
Then, from 2020 to 2025, it was the age of scaling.