Richard Sutton
Yep.
Okay, so...
Is there any way for it to tell, in the large language model setup, what the right thing to say is?
You will say something and you will not get feedback about what the right thing to say is because there's no definition of what the right thing to say is.
There's no goal.
And if there's no goal, then one thing to say is as good as another.
There's no right thing to say.
So there's no ground truth.
You can't have prior knowledge if you don't have ground truth, because prior knowledge is supposed to be a hint or an initial belief about what the truth is.
But there isn't any truth.
There's no right thing to say.
Now, in reinforcement learning, there is a right thing to say or a right thing to do because the right thing to do is the thing that gets you reward.
So we have a definition of what the right thing to do is.
And so we can have prior knowledge or knowledge provided by people about what the right thing to do is.
And then we can check it, because we have a definition of what the actual right thing to do is.
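A minimal sketch of this point in code (a toy two-armed bandit; all names and numbers are hypothetical, not anything Sutton describes): because reward defines what the right action is, a human-supplied prior belief can be tested against reward feedback and corrected when it's wrong.

```python
import random

true_reward = {"a": 0.2, "b": 0.8}   # environment's hidden reward probabilities
q = {"a": 0.9, "b": 0.1}             # human-provided prior belief (wrong here)
alpha = 0.1                          # step size
epsilon = 0.1                        # exploration rate

for _ in range(5000):
    # Act mostly on the current belief, exploring occasionally.
    if random.random() < epsilon:
        action = random.choice(list(q))
    else:
        action = max(q, key=q.get)
    reward = 1.0 if random.random() < true_reward[action] else 0.0
    # Reward is the ground truth that checks and corrects the prior.
    q[action] += alpha * (reward - q[action])

print(q)  # the mistaken prior about "a" is revised toward its true value
```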
Now, an even simpler case is when you're trying to make a model of the world.
When you predict what will happen, you predict and then you see what happens.
Okay, so there's ground truth.
There's no ground truth in large language models because you don't have a prediction about what will happen next.
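A minimal sketch of the "even simpler case" above (toy one-dimensional dynamics; all names are hypothetical): the model predicts the next state, the world then reveals what actually happens, and the prediction error is the check against ground truth.

```python
import random

def world_step(x):
    """True dynamics, unknown to the model: drift of +1 plus noise."""
    return x + 1.0 + random.gauss(0.0, 0.1)

drift_estimate = 0.0   # the model's learned parameter
alpha = 0.05
x = 0.0

for _ in range(2000):
    prediction = x + drift_estimate                   # predict what will happen
    x_next = world_step(x)                            # see what actually happens
    drift_estimate += alpha * (x_next - prediction)   # learn from the verified error
    x = x_next

print(drift_estimate)  # converges near the true drift of 1.0
```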