Illia Polosukhin
One of the next improvements is, in general, better training.
And so RL is kind of part of it, but RL is still very spotty.
In general, AI is like alchemy.
I don't know if you've read any of the technical papers, but there's like...
We're using a learning rate of 0.01 until step 10,000, and then we switch to 0.01, and then at 100 million steps we're going to anneal it at a 2x rate.
It's like, how did you come up with this?
Where did this come from?
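[For illustration only, a minimal Python sketch of the kind of hand-tuned learning-rate schedule being mocked here; the function name and the behavior after step 10,000 are assumptions, not from the conversation or any real training recipe.]

```python
# A minimal sketch (not from the source) of an arbitrary, hand-tuned
# learning-rate schedule: breakpoints and rates picked by trial and error.
def learning_rate(step: int) -> float:
    base_lr = 0.01
    if step < 10_000:
        return base_lr          # flat 0.01 until step 10,000
    if step < 100_000_000:
        return base_lr / 10     # arbitrary drop after step 10,000 (assumed value)
    return base_lr / 10 / 2     # "anneal at a 2x rate" past 100 million steps
```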
Yeah.
Well, it's all kind of half made up and half is from experience.
They were trying to do something.
It didn't work.
They were changing a bunch of stuff until it worked.
And now they're not going to go and redo everything, figuring out if other options work.
They're just going to keep whatever worked.
Yeah.
So part of it is figuring out how to move away from that.
And so RL is even worse.
RL is literally: we have no idea, but hopefully this reward function works. We run it, it works great, ship the paper, ship the model.
So it's all very semi-arbitrary.
There is no actual science around reward distribution and reward propagation.
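[To make the point concrete, a hypothetical Python sketch of the kind of hand-crafted reward function being criticized; the signature and every constant below are invented for illustration and are not from the source.]

```python
# A minimal sketch (not from the source) of an ad hoc RL reward function:
# every weight and bonus is arbitrary, kept only because it "worked".
def reward(solved: bool, num_steps: int, used_tool: bool) -> float:
    r = 1.0 if solved else 0.0   # sparse task-completion reward
    r -= 0.001 * num_steps       # hand-picked step penalty
    if used_tool:
        r += 0.1                 # bonus added during tuning and never revisited
    return r
```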