Yannis Antonoglou
Podcast Appearances
is to try new algorithms.
So you can try to take this model, do two different reinforcement learning algorithms, for example, and then just see which one behaves best, and then just do that with a model that's a frontier model.
Yes.
I mean, we are, you know, true believers in reinforcement learning.
Many of the team have a strong reinforcement learning background, including myself.
So for us, it's one of the big bets.
Yeah, I mean, I guess the final architecture is something that you have to wait and see when the model lands.
Fair enough.
I cannot really share anything at this point.
At the same time, what you're building is a frontier agentic model.
So think of like a system that can do multi-step reasoning.
So it can really interact with tools and environments, and just complete the task end to end.
So that means that it needs to understand long context.
It needs to be able to self-correct.
There are certain capabilities that are really important in order to have agentic intelligence.
There are many things that go into training these models, from instruction following and pre-training and the data mixtures, but also a lot of reinforcement learning and what environments you use to really train this model.
An environment can be anything.
It can be something simple like a coding environment.
It can be an actual website or something that you want to simulate.
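
As a rough illustration of the "coding environment" idea mentioned above, here is a minimal sketch in Python. None of it comes from the conversation: the `CodingEnv` class, its `reset`/`step` methods, the `solve` function name, and the pass-rate reward are all hypothetical choices made for this example, loosely following the common reset/step convention for reinforcement learning environments. It assumes the agent submits source code as a string and is rewarded by the fraction of test cases that code passes.

```python
from dataclasses import dataclass, field


@dataclass
class CodingEnv:
    """Toy coding environment (hypothetical, for illustration only).

    The agent submits source code defining a function named `solve`;
    the reward is the fraction of test cases that function passes.
    """
    prompt: str
    tests: list  # list of (args_tuple, expected_output) pairs
    max_steps: int = 3
    steps_taken: int = field(default=0, init=False)

    def reset(self) -> str:
        """Start a new episode and return the task description."""
        self.steps_taken = 0
        return self.prompt

    def step(self, source: str):
        """Run the submitted code against the tests.

        Returns (observation, reward, done).
        """
        self.steps_taken += 1
        passed = 0
        feedback = "ok"
        try:
            namespace = {}
            # Assumption: a real system would run this in a sandbox.
            exec(source, namespace)
            solve = namespace["solve"]
            for args, expected in self.tests:
                if solve(*args) == expected:
                    passed += 1
        except Exception as exc:
            feedback = f"error: {type(exc).__name__}"
        reward = passed / len(self.tests)
        # The episode ends on full success or when attempts run out.
        done = reward == 1.0 or self.steps_taken >= self.max_steps
        observation = f"{passed}/{len(self.tests)} tests passed ({feedback})"
        return observation, reward, done


env = CodingEnv(
    prompt="Write solve(a, b) that returns a + b.",
    tests=[((1, 2), 3), ((0, 0), 0)],
)
task = env.reset()

# First attempt is wrong; the observation tells the agent how it did.
obs, reward, done = env.step("def solve(a, b):\n    return a - b")
print(obs, reward, done)

# Second attempt corrects the mistake and ends the episode.
obs, reward, done = env.step("def solve(a, b):\n    return a + b")
print(obs, reward, done)
```

The two calls to `step` mirror the self-correction point made earlier: the first attempt earns partial reward, and the feedback in the observation is what a multi-step agent would use to fix its answer on the next try.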