Ilya Sutskever
Or maybe plus minus, let's add error bars to those years.
Because people say, this is amazing, you gotta scale more, keep scaling.
The one word, scaling.
But now the scale is so big. Is the belief really that, oh, it's so big, but if you had 100x more, everything would be so different?
Like it would be different for sure.
But like, is the belief that if you just 100x'd the scale, everything would be transformed?
I don't think that's true.
So it's back to the age of research again, just with big computers.
So we've already witnessed a transition from one type of scaling to a different type of scaling, from pre-training to RL.
Now people are scaling RL.
Now, based on what people say on Twitter, they spend more compute on RL than on pre-training at this point because RL can actually consume quite a bit of compute.
You know, you do very, very long rollouts.
Yes.
So it takes a lot of compute to produce those rollouts.
And then you get a relatively small amount of learning from each rollout.
So you really can spend a lot of compute.
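A minimal back-of-envelope sketch of that asymmetry, with purely illustrative numbers that are not from the conversation: in pre-training every token carries a dense learning signal, while a long RL rollout of generated tokens is typically scored with a single reward.

```python
def tokens_per_learning_signal(generated_tokens: int, signals: int = 1) -> float:
    """Generated (compute-consuming) tokens per unit of feedback."""
    return generated_tokens / signals

# Pre-training: each predicted token is itself a learning signal.
pretraining_ratio = tokens_per_learning_signal(generated_tokens=1, signals=1)

# RL: a hypothetical 50,000-token rollout scored with one scalar reward.
rl_ratio = tokens_per_learning_signal(generated_tokens=50_000, signals=1)

print(f"pre-training: {pretraining_ratio:.0f} token(s) per learning signal")
print(f"RL rollout:   {rl_ratio:.0f} token(s) per learning signal")
```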
And I could imagine that, at this point, I wouldn't even call it scaling.
I would say, hey, like, what are you doing?
And is the thing you are doing the most productive thing you could be doing?
Can you find a more productive way of using your compute?