Ilya Sutskever
But it does suggest that something strange is going on.
I have two possible explanations.
The more whimsical explanation is that maybe RL training makes the models a little too single-minded and narrowly focused, a little too, I don't know, unaware, even though it also makes them more aware in other ways.
And because of this, they can't do basic things.
But there is another explanation. Back when people were doing pre-training, the question of what data to train on was already answered, because the answer was everything.
When you do pre-training, you need all the data.
So you don't have to think, is it going to be this data or that data?
But when people do RL training, they do need to think.
They say, okay, we want to have this kind of RL training for this thing and that kind of RL training for that thing.
And from what I hear, all the companies have teams that just produce new RL environments and add them to the training mix.
And the question is, well, what are those?
There are so many degrees of freedom.
There is such a huge variety of RL environments you could produce.
And one thing you could do, and I think it's something that is done inadvertently, is take inspiration from the evals.
You say, hey, I would love our model to do really well when we release it. I want the evals to look great. So what would be the RL training that could help on this task?