Richard Sutton

So this is something we know very well, and the basis of it is temporal difference learning, where the same thing happens on a less grandiose scale, like when you learn to play chess.

1663.925

Dwarkesh Podcast

Richard Sutton – Father of RL thinks LLMs are a dead-end

The long-term goal is winning the game, and yet you want to be able to learn from shorter-term things, like taking your opponent's pieces.

1675.938

And so you do that by having a value function, which predicts the long-term outcome.

1687.232

And then if you take the guy's pieces, well, your prediction about the long-term outcome is changed.

1692.158

It goes up.

1697.644

You think you're going to win.

1698.125

And then that increase in your belief changes.

1699.366

immediately, quote, reinforces the move that led to taking the piece. Okay, so we have this long-term, 10-year goal of making a startup and making a lot of money, and so when we make progress we say, "Oh, I'm more likely to achieve the long-term goal," and that rewards the steps along the way.
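
As a rough illustration of the idea described here, the following is a minimal TD(0) sketch in Python. It is not taken from the conversation: the state names, the initial value estimates, the step size `alpha`, and the discount `gamma` are all assumptions made up for the example. It shows a value function predicting the long-term outcome, and how moving to a better position (being up a piece) produces a positive TD error that raises the value of the position the move was made from.

```python
def td0_update(values, state, next_state, reward, alpha=0.1, gamma=1.0):
    """Apply one TD(0) update in place and return the TD error.

    values: dict mapping a state to its predicted long-term outcome.
    alpha (step size) and gamma (discount) are illustrative defaults.
    """
    # TD error: how much the prediction changed after one step.
    td_error = reward + gamma * values[next_state] - values[state]
    # Nudge the earlier prediction toward the newer, better-informed one.
    values[state] += alpha * td_error
    return td_error


# Hypothetical value estimates (probability of eventually winning).
values = {"opening": 0.5, "up_a_piece": 0.8}

# Taking a piece gives no immediate reward (the game is not won yet),
# but the predicted outcome rises, so the TD error is positive.
delta = td0_update(values, "opening", "up_a_piece", reward=0.0)

print(f"TD error: {delta:.2f}")
print(f"Updated value of 'opening': {values['opening']:.2f}")
```

In this sketch a positive TD error is the signal that would reinforce the move just made. A full agent would also use that signal to adjust its policy, which is omitted here.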

1702.55

I think the crux of this, and I'm not sure, but...

1762.804

The big world hypothesis seems very relevant, and the reason why humans become useful on their job is because they are encountering the particular part of the world, and it can't have been anticipated, and it can't all have been put in in advance.

1768.953

The world is so huge that you can't... The dream, as I see it, the dream of large language models is you can teach the agent everything and it will know everything and it won't have to learn anything online.

1787.161

Right, during its life. Okay. And your examples are all, well, really you have to, because there's a lot you can teach it, but there are all the little idiosyncrasies of the particular life they're leading and the particular people they're working with and what they like, as opposed to what average people like. Right? And so that's just saying the world is really big, and so you're going to have to learn it along the way.

1802.247

And I'm- So I would say you're just doing regular learning.

1860.473

Maybe using context, because in large language models, all that information has to go into the context window.

1865.36