Illia Polosukhin
And I've expressed some of them.
A few years ago, RL was one of them, and we really did see improvements coming from reinforcement learning.
I definitely think there's only so much you can stuff random articles into a model before it stops learning.
But what we see is that if you take some size of model, like 8 billion parameters, the quality of that model keeps improving.
Meaning, at the same parameter scale, we're getting better at how we train them.
So the way I look at these things is: hey, let's fix the size and see the progression there.
And that progression, to me, defines whether we're getting better at improving these models.
Yeah, I mean, the bigger problem is that it's hard to compare apples to apples, right?
Like, is this model better than it was three months ago?
It is on some metrics, but if you spent 10x more compute, is that actually the improvement we're looking for?
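As a rough illustration of that "fix the size, watch the progression" comparison, here is a minimal sketch in Python. All model names, dates, and scores below are hypothetical placeholders, not real benchmark results; the point is only that models are grouped by parameter count before comparing across time.

```python
# Minimal sketch: compare models only at a fixed parameter budget, sorted by
# release date, so the trend reflects better training rather than more compute.
# All names, dates, and scores are hypothetical.
from dataclasses import dataclass
from datetime import date


@dataclass
class ModelResult:
    name: str
    params_b: float         # parameter count in billions
    released: date
    benchmark_score: float  # e.g. an aggregate eval score in [0, 100]


results = [
    ModelResult("model-a-8b", 8, date(2023, 6, 1), 58.0),
    ModelResult("model-b-8b", 8, date(2024, 1, 15), 64.5),
    ModelResult("model-c-8b", 8, date(2024, 9, 1), 71.2),
    ModelResult("model-d-70b", 70, date(2024, 9, 1), 79.0),  # excluded below
]


def progression_at_fixed_size(results, params_b):
    """Keep only models of one size, ordered by release date."""
    same_size = [r for r in results if r.params_b == params_b]
    return sorted(same_size, key=lambda r: r.released)


for r in progression_at_fixed_size(results, params_b=8):
    print(f"{r.released}  {r.name}: {r.benchmark_score}")
```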
So to me, one of the next improvements is, in general, better training.
RL is part of it, but RL is still very spotty.
In general, AI is like alchemy.
I don't know if you've read any of the technical papers, but they say things like: we're using a learning rate of 0.01 until step 10,000, then we switch it, and at 100 million steps we anneal it at a 2x rate.
It's like, how did you come up with this?
Where did this come from?
Yeah.
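To make the "alchemy" point concrete, here is a minimal sketch of the kind of hand-tuned, piecewise learning rate schedule that quote is poking fun at. The specific breakpoints, rates, and the reading of "anneal it at rate 2x" as halving are assumptions for illustration, not taken from any real paper.

```python
# A sketch of an arbitrary, hand-tuned piecewise learning rate schedule.
# All numbers here are assumed for illustration.

def learning_rate(step: int) -> float:
    base_lr = 0.01                          # initial learning rate (assumed)
    if step < 10_000:
        return base_lr                      # flat phase until step 10,000
    elif step < 100_000_000:
        return base_lr / 10                 # arbitrary drop at step 10,000
    else:
        # One possible reading of "anneal it at rate 2x": halve the rate
        # every further 100M steps (assumed).
        halvings = (step - 100_000_000) // 100_000_000 + 1
        return (base_lr / 10) * (0.5 ** halvings)


for step in (0, 9_999, 10_000, 50_000_000, 100_000_000, 250_000_000):
    print(f"step {step:>11,}: lr = {learning_rate(step):.6f}")
```

The schedule works, but nothing in it explains where 10,000, 100 million, or the 2x factor come from, which is exactly the "how did you come up with this?" question above.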