Richard Sutton
What would happen?
Learning and search have just won the day.
But there's a sense in which that was not surprising to me, because I was always voting for, or hoping for, or rooting for the simple basic principles.
And so even with the large language models, it's surprising how well it worked, but it was all good and gratifying.
And things like AlphaGo, it's sort of surprising how well that was able to work.
And AlphaZero in particular, how well it was able to work.
But it's all very gratifying because, again, simple basic principles are winning the day.
So the whole AlphaGo thing has a precursor, which is TD-Gammon.
Gerry Tesauro did exactly that.
He used reinforcement learning, temporal difference learning methods, to play backgammon.
And it beat the world's best players.
And it worked really well.
And so in some sense, AlphaGo was merely a scaling up of that process.
It was quite a bit of scaling up, and there was also an additional innovation in how the search was done.
But it made sense.
It wasn't surprising in that sense.
AlphaGo actually didn't use TD learning.
It waited to see the final outcomes.
But AlphaZero used TD and AlphaZero was applied to all the other games and did extremely well.
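The distinction drawn here, waiting for the final outcome versus temporal difference (TD) learning, can be illustrated with a minimal tabular sketch. It is not how AlphaGo, AlphaZero, or TD-Gammon were implemented (those systems used neural networks and search). The function names, the step size `alpha`, and the three-state game are all hypothetical, and the sketch assumes a game with no intermediate rewards and no discounting.

```python
def monte_carlo_update(values, states, outcome, alpha=0.1):
    """Wait for the final outcome, then move every visited state toward it."""
    for s in states:
        values[s] += alpha * (outcome - values[s])
    return values


def td0_update(values, states, outcome, alpha=0.1):
    """Move each state's value toward the current estimate of the next state.

    Only the last state of the game is updated toward the actual outcome.
    """
    for i, s in enumerate(states):
        if i + 1 < len(states):
            target = values[states[i + 1]]  # bootstrap from the next estimate
        else:
            target = outcome  # terminal step: use the real result
        values[s] += alpha * (target - values[s])
    return values


# Hypothetical three-position game that ends in a win (outcome = 1.0).
episode = ["a", "b", "c"]
mc_values = monte_carlo_update({"a": 0.0, "b": 0.0, "c": 0.0}, episode, 1.0, alpha=0.5)
td_values = td0_update({"a": 0.0, "b": 0.0, "c": 0.0}, episode, 1.0, alpha=0.5)
print(mc_values)
print(td_values)
```

After this one game, the outcome-based update has moved all three positions toward the win, while the TD update has only moved the last position. Over repeated games the TD estimates propagate backward one step at a time, each position learning from the prediction that follows it.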
I've always been very impressed by the way AlphaZero plays chess because I'm a chess player and it just