Richard Sutton

👤 Person
505 total appearances

Podcast Appearances

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

What would happen?

Learning and search have just won the day.

But there's a sense in which that was not surprising to me, because I was always voting for, or hoping for, or rooting for the simple basic principles.

And so even with the large language models, it's surprising how well it worked, but it was all good and gratifying.

And things like AlphaGo, it's sort of surprising how well that was able to work.

And AlphaZero in particular, how well it was able to work.

But it's all very gratifying because, again, simple basic principles are winning the day.

So the whole AlphaGo thing has a precursor, which is TD-Gammon.

Gerry Tesauro did exactly that.

He applied reinforcement learning, temporal difference learning methods, to play backgammon.

And it beat the world's best players.

And it worked really well.

And so in some sense, AlphaGo was merely a scaling up of that process.

It was quite a bit of scaling up, and there was also an additional innovation in how the search was done.

But it made sense.

It wasn't surprising in that sense.

AlphaGo actually didn't use TD learning.

It waited to see the final outcomes.

But AlphaZero used TD, and AlphaZero was applied to all the other games and did extremely well.
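
The distinction drawn in these remarks, between waiting to see the final outcome and temporal difference (TD) learning, can be sketched with a small tabular example. This is only an illustration of the two update rules, not a description of how AlphaGo or AlphaZero were built; the function names, the three-state episode, and the step size `alpha` are all assumptions made for the example.

```python
def monte_carlo_update(values, episode, alpha=0.1, gamma=1.0):
    """Wait for the episode to end, then move each state toward the observed return."""
    updated = dict(values)
    g = 0.0
    # Walk backwards so g accumulates the discounted return from each step.
    for state, reward in reversed(episode):
        g = reward + gamma * g
        updated[state] += alpha * (g - updated[state])
    return updated


def td0_update(values, episode, alpha=0.1, gamma=1.0):
    """Move each state toward its reward plus the current estimate of the next state."""
    updated = dict(values)
    for i, (state, reward) in enumerate(episode):
        # After the final step the episode is over, so the next value is 0.
        next_value = updated[episode[i + 1][0]] if i + 1 < len(episode) else 0.0
        target = reward + gamma * next_value
        updated[state] += alpha * (target - updated[state])
    return updated


# Hypothetical three-move game: (state, reward received on leaving that state).
episode = [("a", 0.0), ("b", 0.0), ("c", 1.0)]
start = {"a": 0.0, "b": 0.0, "c": 0.0}

mc_values = monte_carlo_update(start, episode)
td_values = td0_update(start, episode)
```

In this toy episode the outcome-based update credits every visited state with the final win immediately, while the TD(0) update only changes the last state on the first pass and relies on later episodes to propagate the value backwards through the estimates.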

I've always been very impressed by the way AlphaZero plays chess because I'm a chess player and it just