Illia Polosukhin
๐ค SpeakerAppearances Over Time
Podcast Appearances
Well, it does that.
It's also like, and so it's very prone to like errors because especially like there was like all this fun stories of, you know, your model figuring out that actually it can look in the file where the answers are if you give it like file system tools.
or search or anything, it actually finds out how to get the answers.
And this is way cheaper and better than actually thinking about stuff.
So this is why we kind of need a better kind of training mechanisms.
And that's why, again, from a research perspective, I look at fixed size model.
Can we make them better?
Because that effectively shows we have a better training procedure.
And I mean, there's actually like, you know, improvements, like there's a new optimizer came out that, you know, like effectively for many years, there was this atom that was used.
And so now there's like this Muon new optimizer, which like, you know, has different algebraic properties around how gradients are propagated.
And, you know, there's like probably, I don't know, 50 people in the world that actually understands exact like mechanism of that beyond like it's magic.
But, you know, it literally shows that model just trains faster.
Like same data, same model size trains faster, right?
So clearly we haven't squeezed out everything we can from just how we're training these models.
So that's kind of like, I would say like big area for me and I'm excited in that.
I think the other one is just context.
A lot of these models are actually really good now, but they just don't know stuff.
They don't know stuff about you.
They don't know stuff about your company and your work you're doing.
They don't know things about the world that's happening right now.