Richard Sutton

Dwarkesh Podcast

Richard Sutton – Father of RL thinks LLMs are a dead-end

Right.

But in a continual learning setup, it just goes into the weights.

Maybe, yeah, so maybe context is the wrong word to use, because I mean a more general thing.

You learn a policy that's specific to the environment that you're finding yourself in.

So maybe we're trying to ask the question of, it seems like the reward is too small of a thing to do all the learning that we need to do.

But, of course, we have the sensations, right?

We have all the other information we can learn from.

Right.

We don't just learn from the reward.

We learn from all the data.

So now I want to talk about the base common model of the agent with the four parts.

Right.

So we need a policy.

The policy says: in the situation I'm in, what should I do?

We need a value function.

The value function is the thing that is learned with TD learning.

And the value function produces a number.

The number says, how well is it going?

And then you watch whether that's going up or down and use that to adjust your policy.
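
[Editor's note: The relationship described here between the policy and the value function can be sketched in code. What follows is a minimal illustration, not something presented in the episode. It assumes a tabular setting, a one-step TD(0) update, and a softmax policy over action preferences; the names `td_error`, `actor_critic_step`, `values`, and `prefs`, and the step sizes and discount, are all hypothetical choices made for the example.]

```python
import math


def softmax(prefs):
    """Turn a list of action preferences into action probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def td_error(reward, value_next, value_now, gamma=0.9):
    """One-step TD error: did things go better or worse than predicted?"""
    return reward + gamma * value_next - value_now


def actor_critic_step(values, prefs, state, action, reward, next_state,
                      alpha_v=0.1, alpha_pi=0.1, gamma=0.9):
    """Update the value estimate and the policy from a single transition.

    values: list of state-value estimates (the "how well is it going" number)
    prefs:  per-state list of action preferences (the policy's parameters)
    """
    delta = td_error(reward, values[next_state], values[state], gamma)

    # Value function: move the prediction toward the TD target.
    values[state] += alpha_v * delta

    # Policy: if the value went up (delta > 0), make the taken action
    # more likely in this state; if it went down, make it less likely.
    probs = softmax(prefs[state])
    for a in range(len(prefs[state])):
        grad = (1.0 if a == action else 0.0) - probs[a]
        prefs[state][a] += alpha_pi * delta * grad

    return delta


# Two states, two actions, everything initialised to zero.
values = [0.0, 0.0]
prefs = [[0.0, 0.0], [0.0, 0.0]]

# In state 0 the agent took action 1, received reward 1.0, and landed in state 1.
delta = actor_critic_step(values, prefs, state=0, action=1,
                          reward=1.0, next_state=1)
```

[In this sketch a positive TD error raises both the value estimate for state 0 and the probability of repeating action 1 there, which is one simple way to "use that to adjust your policy."]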
