Dwarkesh Patel

👀 Speaker
15787 total appearances

Podcast Appearances

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

This imitation learning has given us a good prior, given these models a good prior, of reasonable ways to approach problems.

And as we move towards the era of experience, as you call it, this prior is going to be the basis on which we teach these models from experience because this gives them the opportunity to get answers right some of the time.

And then on this, you can build, you can train them on experience.

Do you agree with that perspective?

I mean, I think they do.

You can literally ask them, what would you anticipate a user might say in response?

And they have a prediction.

Yeah.

Yeah.

So I think a capability like this does exist in context.

So it's interesting to watch a model do chain of thought, and then suppose it's trying to solve a math problem.

It'll say, okay, I'm going to approach this problem using this approach at first, and it'll write this out and be like, oh, wait, I just realized this is the wrong conceptual way to approach the problem.

I'm going to restart with another approach.

And that flexibility does exist in context, right?

Do you have something else in mind, or do you just think that you need to extend this capability across longer horizons?

Isn't that literally what next token prediction is?

Prediction of what was next and then updating on the surprise?

Next token is what they should say, what the action should be.

Oh, yeah.