Yannis Antonoglou
But fundamentally, you can think of an environment, even in the more reinforcement-learning sense of the term, as a system that your model can interact with.
So it can take actions. In our case, actions are tool use: the model can call certain tools, or MCPs.
And when it calls these tools or MCPs, it gets a new observation, a new state of the world.
In this way, it can interact with the world in order to complete the task.
So a world, for example, can be a terminal. You have a bash shell, you type commands into it, and you get back the output of those commands. The whole point is for the model to solve the task you asked it to solve.
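To make the loop concrete, here is a minimal sketch of an environment in this sense, assuming a toy in-memory "terminal". The class `ToyTerminalEnv`, its `ls`/`cat` commands, the 0/1 reward rule, and `scripted_policy` (a stand-in for the model) are all hypothetical. Each action is a command, each observation is that command's output, and the episode ends once the task is solved.

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    observation: str  # what the model sees after acting
    reward: float     # assumed rule: 1.0 once the task is solved, else 0.0
    done: bool


class ToyTerminalEnv:
    """Hypothetical in-memory 'terminal' that supports only `ls` and `cat <file>`."""

    def __init__(self, files: dict[str, str], target: str):
        self.files = dict(files)
        self.target = target  # the task: get some command output to contain this string

    def reset(self) -> str:
        # The initial observation is just the task description.
        return "Task: print the file that holds the token."

    def step(self, command: str) -> StepResult:
        parts = command.split()
        if parts == ["ls"]:
            output = " ".join(sorted(self.files))
        elif len(parts) == 2 and parts[0] == "cat":
            output = self.files.get(parts[1], f"cat: {parts[1]}: No such file or directory")
        else:
            output = f"{parts[0] if parts else ''}: command not found"
        solved = self.target in output
        return StepResult(output, 1.0 if solved else 0.0, solved)


def scripted_policy(history: list[str]) -> str:
    """Stand-in for the model: list files first, then `cat` each listed file in turn."""
    if len(history) == 1:
        return "ls"
    names = history[1].split()
    if not names:
        return "ls"
    return f"cat {names[(len(history) - 2) % len(names)]}"


def run_episode(env: ToyTerminalEnv, policy, max_steps: int = 10):
    """Alternate action -> observation until the task is solved or steps run out."""
    history = [env.reset()]
    for _ in range(max_steps):
        result = env.step(policy(history))
        history.append(result.observation)
        if result.done:
            return result.reward, history
    return 0.0, history


env = ToyTerminalEnv({"notes.txt": "buy milk", "config.txt": "token=abc123"}, target="token=abc123")
reward, history = run_episode(env, scripted_policy)
print(reward, history)
```

Replacing the in-memory `step` with a real shell (for example, via `subprocess`) would keep the same loop; only how observations are produced changes.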
Right, yeah.
I mean, I think data and environments are quite similar, and the terminology people use also depends on their background.
Take synthetic data, for example. When people say that a model can learn from synthetic data, what they mean is that the model generates data, you apply some form of filtering, and then the model learns from the data you've selected.
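A minimal sketch of that generate, filter, then learn pipeline, assuming a hypothetical `noisy_model` that answers addition problems but is sometimes off by one, and an exact-match `verifier` as the filter. The training step itself is omitted; `dataset` is what you would train on.

```python
import random


def noisy_model(a: int, b: int, rng: random.Random) -> int:
    """Hypothetical stand-in for a model: answers a + b, but is sometimes off by one."""
    return a + b + rng.choice([-1, 0, 0, 1])


def generate_candidates(problems, samples_per_problem: int, rng: random.Random):
    """Step 1: the model generates several candidate answers per problem."""
    candidates = []
    for a, b in problems:
        for _ in range(samples_per_problem):
            candidates.append(((a, b), noisy_model(a, b, rng)))
    return candidates


def verifier(problem, answer) -> bool:
    """Assumed filter: keep only answers that are exactly correct."""
    a, b = problem
    return answer == a + b


def filter_candidates(candidates):
    """Step 2: keep the samples that pass the filter."""
    return [(problem, answer) for problem, answer in candidates if verifier(problem, answer)]


rng = random.Random(0)
problems = [(2, 3), (10, 7), (41, 1)]
candidates = generate_candidates(problems, samples_per_problem=4, rng=rng)
dataset = filter_candidates(candidates)  # Step 3 (not shown): fine-tune on `dataset`
print(len(candidates), len(dataset))
```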
And this is exactly how reinforcement learning works.
The whole idea of reinforcement learning is trial and error: the model tries something, and based on the outcome, it wants to do either more of it or less of it.
So synthetic data and reinforcement learning are inherently almost the same thing at some level of abstraction.
What reinforcement learning tries to do is learn much more efficiently from this data, from this experience.
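The "do more of it or less of it" idea can be sketched with the textbook REINFORCE policy-gradient rule on a toy multi-armed bandit. This is an illustration, not the speaker's training setup; the per-arm rewards, learning rate, and running-average baseline are assumptions. Actions whose reward beats the baseline get their probability pushed up, and actions that fall short get pushed down.

```python
import math
import random


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(rewards, steps: int = 2000, lr: float = 0.1, seed: int = 0):
    """REINFORCE on a bandit whose arms pay fixed rewards (assumed for simplicity)."""
    rng = random.Random(seed)
    logits = [0.0] * len(rewards)
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(len(rewards)), weights=probs)[0]  # trial
        reward = rewards[action]                                     # outcome
        advantage = reward - baseline  # better or worse than usual?
        for i in range(len(logits)):
            # d/d logit_i of log softmax(action) = 1[i == action] - probs[i]
            grad = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * advantage * grad
        baseline += 0.05 * (reward - baseline)  # running average of recent rewards
    return softmax(logits)


probs = reinforce_bandit([0.0, 1.0, 0.2])
print([round(p, 3) for p in probs])
```

With 0/1 rewards and no baseline, the update is zero for failed samples and only raises the probability of successful ones, which is one way to see the link to training on filtered synthetic data.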
Yeah, so we believe in Transformers, and I understand why people, especially people like Rich Sutton or Andrej Karpathy, feel that Transformers have some inherent limitations that will stop us from going all the way to AGI.