Demis Hassabis
Podcast Appearances
So we were in the sweet spot of the S-curve. It's not too easy, where it's trivial, or too hard, where you can't even see if you're making any progress. You want to be in that maximum part of the S-curve where you're making almost exponential progress. And we could keep picking harder and harder games as our systems improved.
And then the other nice feature about games is because they're some kind of microcosm of the real world, they've usually been boiled down to very clear objective functions, right? So winning the game or maximizing the score is usually the objective of a game. And that's very easy to specify to a reinforcement learning system or an agent-based system. So it's perfect for hill climbing against.
and measuring Elo scores, ratings, and exactly where you are. And then finally, of course, you can calibrate yourselves against the best human players. So you can sort of calibrate what your agents are doing in their own tournaments.
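As a rough illustration of the Elo ratings mentioned here, the following is a minimal sketch assuming the standard Elo formula with a K-factor of 32. The transcript does not say which rating variant or constants were actually used, so the function names and the value of `k` are illustrative assumptions.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability-like expected score of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one game.

    score_a is 1.0 for a win by A, 0.5 for a draw, 0.0 for a loss.
    k (assumed to be 32 here) controls how fast ratings move.
    """
    exp_a = expected_score(rating_a, rating_b)
    exp_b = 1.0 - exp_a
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - exp_b)
    return new_a, new_b
```

Running many agent-versus-agent games through an update like this gives each agent a rating, and playing rated human opponents places those ratings on the same scale as human play.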
In the end, even with the StarCraft agent, we eventually had to challenge a professional grandmaster at StarCraft to make sure that our systems hadn't somehow overfitted to their own tournament strategies. It actually needed to be grounded with: oh, it can actually beat a genuine human grandmaster StarCraft player.
The final thing is, of course, you can generate as much synthetic data as you want with games too, which is coming into vogue right now, again, with the questions about data limitations for large language models: how many tokens are left in the world, and has it read everything in the world? Obviously, for things like games, you can actually just play the system against itself and generate lots more
data from the right distribution.
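The self-play idea described above can be sketched on a toy game. This is a minimal sketch only, assuming tic-tac-toe and a uniformly random policy as stand-ins; the transcript does not describe the actual games' data pipelines, and every name here (`self_play_game`, `random_policy`, `generate_dataset`) is hypothetical.

```python
import random

# All eight winning lines on a 3x3 board indexed 0..8.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def random_policy(board, player, legal, rng):
    """Placeholder policy: pick a legal move uniformly at random."""
    return rng.choice(legal)


def self_play_game(policy, rng):
    """Play one game of the policy against itself.

    Returns a list of (state, move, outcome) examples, where outcome is
    +1, -1 or 0 from the point of view of the player who made the move.
    """
    board = [" "] * 9
    history = []
    player = "X"
    while winner(board) is None and " " in board:
        legal = [i for i, cell in enumerate(board) if cell == " "]
        move = policy(board, player, legal, rng)
        history.append(("".join(board), player, move))
        board[move] = player
        player = "O" if player == "X" else "X"
    w = winner(board)
    return [(state, move, 0 if w is None else (1 if mover == w else -1))
            for state, mover, move in history]


def generate_dataset(num_games, seed=0):
    """Generate labelled training examples purely from self-play."""
    rng = random.Random(seed)
    data = []
    for _ in range(num_games):
        data.extend(self_play_game(random_policy, rng))
    return data
```

Calling `generate_dataset(1000)` produces as many labelled positions as needed without any external data. In a real system the random policy would be replaced by the current learned agent, so the generated data tracks the distribution of positions that agent actually reaches.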
Well, I've always been a huge proponent of simulations and AI. And it's also interesting to think about what the real world is in terms of a computational system. And so I've always been involved with trying to build very realistic simulations of things. And now, of course, that interacts with AI because you can have an AI that learns a simulator of some real world system