Dwarkesh Patel

Speaker
15787 total appearances

Podcast Appearances

Dwarkesh Podcast
Some thoughts on the Sutton interview

Imitation learning is just short-horizon RL.

The episode is a token long.

The LLM is making a conjecture about the next token based on its understanding of the world and how the different pieces of information in the sequence relate to each other.

And it receives reward in proportion to how well it predicted the next token.
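One way to make the "short-horizon RL" reading concrete is the sketch below. It is not from the episode: it assumes a one-token episode in which the model samples a token and gets reward 1 if it matches the observed next token and 0 otherwise. Under that assumed reward, the exact expected policy gradient points in the same direction as the negative cross-entropy gradient, scaled by the probability the model already assigns to the observed token. The function names and the toy logits are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_grad(logits, target):
    """Gradient of -log p(target) w.r.t. the logits: p - onehot(target)."""
    probs = softmax(logits)
    return [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]


def expected_policy_grad(logits, target):
    """Exact expected REINFORCE gradient for a one-token episode.

    Assumed reward (an illustration, not the speaker's definition):
    1 if the sampled token equals the observed next token, else 0.
    """
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for action, p_action in enumerate(probs):
        reward = 1.0 if action == target else 0.0
        for i in range(len(logits)):
            # d log p(action) / d logit_i for a softmax policy
            dlogp = (1.0 if i == action else 0.0) - probs[i]
            grad[i] += p_action * reward * dlogp
    return grad


# Toy vocabulary of four tokens; token 1 is the observed next token.
logits = [2.0, 0.5, -1.0, 0.1]
target = 1

ce = cross_entropy_grad(logits, target)
pg = expected_policy_grad(logits, target)
p_target = softmax(logits)[target]

# Ascending the expected reward moves the logits the same way as
# descending the cross-entropy loss, scaled by p(target).
for ce_i, pg_i in zip(ce, pg):
    assert abs(pg_i - (-p_target * ce_i)) < 1e-12
```

Other reward definitions (for example, using the log-probability of the observed token directly) give the same qualitative picture; the 0/1 reward is chosen here only because it keeps the episode structure explicit.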

Now, of course, I already hear people saying, no, no, that's not the ground truth.

It's just learning what a human was likely to say.

And I agree.

But there's a different question, which I think is actually more relevant to understanding the scalability of these models.

And that question is, can we leverage this imitation learning to help models learn better from ground truth?

And I think the answer is obviously yes.

After RLing these pre-trained base models, we've gotten them to win gold in International Math Olympiad competitions and to code up entire working applications from scratch.

Now, these are ground truth examinations.

Can you solve this unseen Math Olympiad question?

Can you build this application to match a specific feature request?

But you couldn't have RL'd a model to accomplish these tasks from scratch, or at least we don't know how to do that yet.

You needed a reasonable prior over human data in order to kickstart this RL process.

Whether you want to call this prior a proper world model or just a model of humans, I don't think is that important.

It honestly seems like a semantic debate.

Because what you really care about is whether this model of humans helps you start learning from ground truth, aka become a true world model.