Richard Sutton

Dwarkesh Podcast

Richard Sutton – Father of RL thinks LLMs are a dead-end

What would happen?

Learning and search have just won the day.

But there's a sense in which that was not surprising to me, because I was always voting for, or hoping for, or rooting for the simple basic principles.

And so even with the large language models, it's surprising how well it worked, but it was all good and gratifying.

And things like AlphaGo, it's sort of surprising how well that was able to work.

And AlphaZero in particular, how well it was able to work.

But it's all very gratifying because, again, simple basic principles are winning the day.

So the whole AlphaGo thing has a precursor, which is TD-Gammon.

Gerry Tesauro did exactly that.

reinforcement learning, temporal difference learning methods to play backgammon.

And it beat the world's best players.

And it worked really well.

And so in some sense, AlphaGo was merely a scaling up of that process.

It was quite a bit of scaling up, and there was also an additional innovation in how the search was done.

But it made sense.

It wasn't surprising in that sense.

AlphaGo actually didn't use TD learning.

It waited to see the final outcomes.

But AlphaZero used TD, and AlphaZero was applied to all the other games and did extremely well.
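
For readers unfamiliar with the distinction drawn here between TD learning and waiting to see the final outcome, the following is a minimal sketch of the two kinds of update on a toy problem. It is not the training code of AlphaGo, AlphaZero, or TD-Gammon. The five-state random walk, the step size `ALPHA`, the episode count, and all function names are assumptions chosen only for illustration.

```python
import random

N_STATES = 5     # non-terminal states 0..4 (toy task, assumed for illustration)
ALPHA = 0.002    # step size (illustrative value)

def run_episode(rng):
    """Random walk from the middle state; return visited states and outcome.

    Stepping off the right end gives outcome 1.0, off the left end 0.0.
    """
    state = N_STATES // 2
    visited = []
    while 0 <= state < N_STATES:
        visited.append(state)
        state += rng.choice((-1, 1))
    outcome = 1.0 if state >= N_STATES else 0.0
    return visited, outcome

def monte_carlo_update(values, visited, outcome):
    """Wait for the final outcome, then move every visited state toward it."""
    for s in visited:
        values[s] += ALPHA * (outcome - values[s])

def td0_update(values, visited, outcome):
    """TD(0): move each state toward the current estimate of its successor.

    Processing the steps in order gives the same result as updating online,
    one step at a time, while the episode is being played.
    """
    for i, s in enumerate(visited):
        if i + 1 < len(visited):
            target = values[visited[i + 1]]  # reward is 0 before termination
        else:
            target = outcome                 # final step: the terminal reward
        values[s] += ALPHA * (target - values[s])

def estimate(update, episodes=20000, seed=0):
    """Run many episodes, applying the given update rule after each one."""
    rng = random.Random(seed)
    values = [0.5] * N_STATES
    for _ in range(episodes):
        visited, outcome = run_episode(rng)
        update(values, visited, outcome)
    return values

# For this walk the true value of state i is (i + 1) / (N_STATES + 1).
TRUE_VALUES = [(i + 1) / (N_STATES + 1) for i in range(N_STATES)]
mc_values = estimate(monte_carlo_update)
td_values = estimate(td0_update)
```

Both rules estimate the same quantity, the probability of finishing on the right. The only difference is the target: the outcome-based rule uses the final result of the episode, while the TD rule uses the next state's current estimate.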

I've always been very impressed by the way AlphaZero plays chess because I'm a chess player and it just
