Richard Sutton

505 total appearances

Podcast Appearances

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Right.

But in a continual learning setup, it just goes into the weights.

Maybe, yeah, so maybe context is the wrong word to use, because I mean a more general thing.

You learn a policy that's specific to the environment that you're finding yourself in.

So maybe we're trying to ask the question of, it seems like the reward is too small of a thing to do all the learning that we need to do.

But, of course, we have the sensations, right?

We have all the other information we can learn from.

Right.

We don't just learn from the reward.

We learn from all the data.

So now I want to talk about the base common model of the agent with the four parts.

Right.

So we need a policy.

The policy says...

In the situation I'm in, what should I do?
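A policy in this sense answers, for each situation, which action to take. A minimal sketch in Python, assuming a small set of named states and discrete actions with a softmax over learned action preferences; the names `preferences`, `policy_probs`, and `choose_action` are hypothetical illustrations, not anything specified in the conversation:

```python
import math
import random

def policy_probs(preferences, state):
    """Softmax over the (hypothetical) action preferences stored for one state."""
    prefs = preferences[state]
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def choose_action(preferences, state, rng=random):
    """Answer 'in the situation I'm in, what should I do?' by sampling an action index."""
    probs = policy_probs(preferences, state)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Illustrative preferences: in state "s0", action 1 is favored.
preferences = {"s0": [0.0, 1.0, 0.0]}
print(policy_probs(preferences, "s0"))
print(choose_action(preferences, "s0"))
```

Sampling from the distribution, rather than always taking the top action, keeps some exploration in the policy; a greedy variant would simply return the index of the largest preference.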

We need a value function.

The value function is the thing that is learned with TD learning.

And the value function produces a number.

The number says, how well is it going?
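The value function's number is an estimate of future reward from a state, and TD learning moves that estimate toward the reward just received plus the discounted estimate for the next state. A minimal tabular TD(0) sketch, assuming a hypothetical two-state chain (A, then B, then the end, with reward 1 only at the end) and a step size `alpha` and discount `gamma` chosen only for illustration:

```python
from collections import defaultdict

def td0_update(V, state, reward, next_state, alpha=0.1, gamma=0.9, terminal=False):
    """One tabular TD(0) step: move V[state] toward reward + gamma * V[next_state]."""
    target = reward if terminal else reward + gamma * V[next_state]
    td_error = target - V[state]  # how much better or worse than predicted
    V[state] += alpha * td_error
    return td_error

V = defaultdict(float)  # every state's value starts at 0.0
# Hypothetical chain A -> B -> end; reward 1.0 arrives only on the final step.
for _ in range(200):
    td0_update(V, "A", 0.0, "B")
    td0_update(V, "B", 1.0, None, terminal=True)
print(round(V["B"], 3), round(V["A"], 3))
```

Here V["B"] settles near 1.0 and V["A"] near gamma times that, about 0.9, because A's value is learned from B's estimate rather than by waiting for the final reward.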

And then you watch if that's going up and down and use that to adjust your policy.
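Watching whether that number goes up or down and using it to adjust the policy corresponds to an actor-critic scheme: the TD error (how much better or worse a step went than the value function predicted) scales a policy-gradient update. A minimal sketch, assuming a softmax policy, a hypothetical one-step toy task in which action 1 pays 1.0 and action 0 pays nothing, and illustrative step sizes; this is one standard way to realize the idea, not a method stated in the conversation:

```python
import math
import random

def softmax(prefs):
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def actor_critic_step(h, V, state, env_step, alpha=0.1, beta=0.1, gamma=0.9, rng=random):
    """One actor-critic step: the TD error says whether things went better or
    worse than the value function expected, and nudges the policy accordingly."""
    probs = softmax(h[state])
    action = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    reward, next_state = env_step(state, action)
    target = reward if next_state is None else reward + gamma * V[next_state]
    td_error = target - V[state]
    V[state] += alpha * td_error  # critic: TD(0) update of the value estimate
    for a in range(len(probs)):   # actor: softmax policy-gradient step
        grad = (1.0 if a == action else 0.0) - probs[a]
        h[state][a] += beta * td_error * grad
    return next_state

def toy_env(state, action):
    """Hypothetical one-step task: action 1 pays 1.0, action 0 pays 0.0, then it ends."""
    return (1.0 if action == 1 else 0.0), None

rng = random.Random(0)
h = {"s": [0.0, 0.0]}  # action preferences (the policy's parameters)
V = {"s": 0.0}         # value estimate for the single state
for _ in range(2000):
    actor_critic_step(h, V, "s", toy_env, rng=rng)
print(softmax(h["s"]))
```

After training, nearly all of the probability should sit on action 1, since choosing it tends to produce positive TD errors and choosing action 0 tends to produce negative ones.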