Dwarkesh Patel

👤 Speaker

15787 total appearances

Podcast Appearances

Dwarkesh Podcast

Some thoughts on the Sutton interview

Imitation learning is just short-horizon RL.

369.377

The episode is a token long.

372.481

The LLM is making a conjecture about the next token based on its understanding of the world and how the different pieces of information in the sequence relate to each other.

374.003

And it receives reward in proportion to how well it predicted the next token.

382.254
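
The framing above (a one-token episode with reward tied to prediction quality) can be sketched in a few lines. This is only an illustration under stated assumptions: the reward is taken to be the log-probability assigned to the true next token, which is one common reading of "in proportion to how well it predicted", and every function name here is hypothetical rather than anything from the episode.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution over tokens."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def episode_reward(logits, target):
    """One-token episode. Assumed reward: log-probability of the true token."""
    return math.log(softmax(logits)[target])

def cross_entropy(logits, target):
    """Standard next-token loss, which is the negative of the assumed reward."""
    return -episode_reward(logits, target)

def reward_gradient(logits, target):
    """Gradient of the reward w.r.t. the logits: one_hot(target) - softmax."""
    probs = softmax(logits)
    return [(1.0 if i == target else 0.0) - p for i, p in enumerate(probs)]

def train_step(logits, target, lr=0.5):
    """Gradient ascent on reward, equivalent to descent on cross-entropy."""
    grad = reward_gradient(logits, target)
    return [x + lr * g for x, g in zip(logits, grad)]

# Toy vocabulary of three tokens; the true next token is index 2.
logits = [0.0, 0.0, 0.0]
target = 2
before = episode_reward(logits, target)
for _ in range(20):
    logits = train_step(logits, target)
after = episode_reward(logits, target)
print(before, after)
```

Under this assumption, maximizing the per-token reward and minimizing the usual cross-entropy loss are the same update, which is the sense in which the "episode" is one token long.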

Now, of course, I already hear people saying, no, no, that's not the ground truth.

386.03

It's just learning what a human was likely to say.

389.456

And I agree.

391.981

But there's a different question, which I think is actually more relevant to understanding the scalability of these models.

393.424

And that question is, can we leverage this imitation learning to help models learn better from ground truth?

399.235

And I think the answer is obviously yes.

405.807

After RLing these pre-trained base models, we've gotten them to win gold in International Math Olympiad competitions and to code up entire working applications from scratch.

408.772

Now, these are ground truth examinations.

418.549

Can you solve this unseen Math Olympiad question?

421.675

Can you build this application to match the specific feature request?