Richard Sutton

👤 Person
505 total appearances

Podcast Appearances

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Yep.

Okay, so...

Is there any way for it to tell, in the large language model setup, what's the right thing to say?

You will say something and you will not get feedback about what the right thing to say is because there's no definition of what the right thing to say is.

There's no goal.

And if there's no goal, then there's one thing to say, another thing to say.

There's no right thing to say.

So there's no ground truth.

You can't have prior knowledge if you don't have ground truth.

Because the prior knowledge is supposed to be a hint or an initial belief about what the truth is.

But there isn't any truth.

There's no right thing to say.

Now, in reinforcement learning, there is a right thing to say or a right thing to do because the right thing to do is the thing that gets you reward.

So we have a definition of what the right thing to do is.

And so we can have prior knowledge or knowledge provided by people about what the right thing to do is.

And then we can check it to see because we have a definition of what the actual right thing to do is.
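
[Editor's illustration, not part of the spoken transcript.] The point about checking prior knowledge against reward can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the two actions, their reward values, and the `check_prior` helper are hypothetical. The sketch starts from a human-supplied guess that prefers the wrong action and corrects it using observed reward.

```python
# Hypothetical two-action problem; the reward values are made up for illustration.
TRUE_REWARD = {"left": 0.2, "right": 1.0}


def check_prior(prior, steps=50, step_size=0.5):
    """Start from a prior guess and correct it with observed reward."""
    estimate = dict(prior)
    actions = list(TRUE_REWARD)
    for t in range(steps):
        action = actions[t % len(actions)]   # try each action in turn
        reward = TRUE_REWARD[action]         # reward defines what "right" means
        # Move the estimate part of the way toward the observed reward.
        estimate[action] += step_size * (reward - estimate[action])
    return estimate


prior = {"left": 0.9, "right": 0.1}          # a prior that wrongly prefers "left"
learned = check_prior(prior)
best_by_prior = max(prior, key=prior.get)
best_by_reward = max(learned, key=learned.get)
print(best_by_prior, best_by_reward)
```

Because reward supplies a definition of the right action, the mistaken prior can be detected and overwritten by experience.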

Now, an even simpler case is when you're trying to make a model of the world.

When you predict what will happen, you predict and then you see what happens.

Okay, so there's ground truth.
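
[Editor's illustration, not part of the spoken transcript.] The predict-then-observe loop described here can be sketched as a running prediction that is corrected by whatever actually happens. The function name `learn_to_predict`, the constant observation stream, and the step size are assumptions chosen to keep the example small; this is not a method taken from the episode.

```python
def learn_to_predict(observations, step_size=0.5):
    """Predict the next observation, then correct the prediction with what happened."""
    prediction = 0.0
    errors = []
    for observed in observations:
        error = observed - prediction      # the observed outcome is the ground truth
        errors.append(abs(error))
        prediction += step_size * error    # shrink the gap for next time
    return prediction, errors


# Hypothetical stream in which the world always produces the value 4.0.
prediction, errors = learn_to_predict([4.0] * 20)
print(errors[0], errors[-1])
```

The error shrinks on every step because each prediction is compared against an observed outcome.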

There's no ground truth in large language models because you don't have a prediction about what will happen next.