Stuart Russell
Podcast Appearances
And the way programs succeed and the way humans succeed is by only looking at a small fraction of the search tree.
And if you look at the right fraction, you play really well.
If you look at the wrong fraction, if you waste your time thinking about things that are never going to happen, the moves that no one's ever going to make, then you're going to lose because you won't be able to figure out the right decision.
So that question of how machines can manage their own computation, how they decide what to think about, is the meta-reasoning question.
We developed some methods for doing that, and very simply, a machine should think about whatever thoughts are going to improve its decision quality.
We were able to show this both for Othello, which is a standard two-player game, and for backgammon, which includes dice rolls, so it's a two-player game with uncertainty.
For both of those cases, we could come up with algorithms that were actually much more efficient than the standard alpha-beta search, which chess programs at the time were using, and those programs could beat me.
And I think you can see the same basic ideas in AlphaGo and AlphaZero today.
The way they explore the tree is using a form of meta-reasoning to select what to think about based on how useful it is to think about it.
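To make "think about whatever is going to improve your decision quality" concrete, here is a minimal sketch. It is not the method described in the talk, nor what AlphaGo does. It assumes each candidate move has a current value estimate with Gaussian uncertainty, and scores a computation by how much it is expected to improve the final choice if it revealed that move's true value (a myopic value-of-information rule). The names `expected_excess`, `value_of_thinking`, and `choose_what_to_think_about` are hypothetical.

```python
import math


def _pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_excess(mu, sigma, threshold):
    """E[max(X - threshold, 0)] for X ~ Normal(mu, sigma)."""
    if sigma <= 0:
        return max(mu - threshold, 0.0)
    z = (mu - threshold) / sigma
    return sigma * _pdf(z) + (mu - threshold) * _cdf(z)


def value_of_thinking(estimates):
    """Map each move to the expected gain in decision quality from resolving it.

    `estimates` maps move -> (mean, std) and needs at least two moves.
    Assumption: thinking about a move reveals its true value exactly.
    """
    ranked = sorted(estimates, key=lambda m: estimates[m][0], reverse=True)
    best, runner_up = ranked[0], ranked[1]
    best_mu = estimates[best][0]
    second_mu = estimates[runner_up][0]
    gains = {}
    for move, (mu, sigma) in estimates.items():
        if move == best:
            # Thinking helps only if the favourite turns out worse than the runner-up.
            gains[move] = expected_excess(-mu, sigma, -second_mu)
        else:
            # Thinking helps only if this move turns out better than the favourite.
            gains[move] = expected_excess(mu, sigma, best_mu)
    return gains


def choose_what_to_think_about(estimates, cost):
    """Return the most useful move to analyse, or None if nothing is worth the cost."""
    gains = value_of_thinking(estimates)
    move = max(gains, key=gains.get)
    return move if gains[move] > cost else None


estimates = {"a": (0.6, 0.01), "b": (0.5, 0.3), "c": (-2.0, 0.05)}
next_focus = choose_what_to_think_about(estimates, cost=0.01)
```

In this toy example the uncertain near-rival "b" is worth more thought than the well-understood favourite "a" or the clearly bad move "c", which is the "don't waste time on moves no one is going to make" idea in miniature.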
There's really two kinds of learning going on.
So as you say, AlphaGo learns to evaluate board positions.
So it can look at a Go board and it actually has probably a superhuman ability to instantly tell how promising that situation is.
To me, the amazing thing about AlphaGo is not that it can beat the world champion with its hands tied behind its back, but the fact that if you stop it from searching altogether, so you say, okay, you're not allowed to do any thinking ahead, right?
You can just consider each of your legal moves and then look at the resulting situation and evaluate it.
So that's what we call a depth-one search.
So just the immediate outcome of your moves and decide if that's good or bad.
That version of AlphaGo can still play at a professional level, right?
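The depth-one search described here can be sketched in a few lines: try each legal move, evaluate the resulting position, and pick the best. This is only an illustration. In AlphaGo the evaluation is a learned network, whereas the `evaluate` function below is a hand-written stand-in for a toy counting game, and all names (`depth_one_move`, `legal_moves`, `apply_move`, `evaluate`) are hypothetical.

```python
def depth_one_move(state, legal_moves, apply_move, evaluate):
    """Pick the move whose immediate resulting position evaluates best.

    No lookahead beyond one move: the quality of play depends entirely
    on how good `evaluate` is.
    """
    moves = list(legal_moves(state))
    if not moves:
        return None
    return max(moves, key=lambda m: evaluate(apply_move(state, m)))


# Toy game (an assumption for illustration): players alternately add 1, 2, or 3
# to a running total, and leaving a multiple of 4 is treated as a good position.
def legal_moves(total):
    return [1, 2, 3]


def apply_move(total, move):
    return total + move


def evaluate(total):
    return 1.0 if total % 4 == 0 else 0.0


chosen = depth_one_move(5, legal_moves, apply_move, evaluate)
```

From a total of 5, the candidate results are 6, 7, and 8, so the sketch picks the move 3, which lands on 8.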