Dwarkesh Patel

Speaker
15656 total appearances

Podcast Appearances

Dwarkesh Podcast
Some thoughts on the Sutton interview

This process is more analogous to imitation learning than it is to RL from scratch.

Now, of course, are we literally predicting the next token like an LLM would in order to do this cultural learning?

No, of course not.

So even the imitation learning that humans are doing is not like the supervised learning that we do for pre-training LLMs.

But neither are we running around trying to collect some well-defined scalar reward.

No ML learning regime perfectly describes human learning or animal learning.

We're doing things which are both analogous to RL and to supervised learning.

What planes are to birds, supervised learning might end up being to human cultural learning.

I also don't think these learning techniques are actually categorically different.

Imitation learning is just short-horizon RL.

The episode is a token long.

The LLM is making a conjecture about the next token based on its understanding of the world and how the different pieces of information in the sequence relate to each other.

And it receives reward in proportion to how well it predicted the next token.
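
One way to make the "episode is a token long" framing concrete is the rough sketch below. It is an illustration under stated assumptions, not something taken from the episode: it assumes a softmax policy over a tiny vocabulary and a hypothetical 0/1 reward that pays 1 only when the sampled token equals the target token. Under those assumptions, the expected one-step policy gradient points in the same direction as the ordinary next-token log-likelihood gradient, scaled by the probability the model already assigns to the target. All function names are made up for this example.

```python
import math


def softmax(logits):
    """Turn raw logits into a probability distribution over tokens."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_prob(probs, action):
    """Gradient of log softmax(logits)[action] with respect to the logits."""
    return [(1.0 if i == action else 0.0) - p for i, p in enumerate(probs)]


def supervised_gradient(logits, target):
    """Ascent direction for next-token log-likelihood (negative cross-entropy)."""
    return grad_log_prob(softmax(logits), target)


def one_step_policy_gradient(logits, target):
    """Exact expected REINFORCE gradient for a one-token episode.

    Assumption for illustration: reward is 1 if the sampled token equals
    the target token and 0 otherwise.
    """
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for action, p in enumerate(probs):
        reward = 1.0 if action == target else 0.0
        g = grad_log_prob(probs, action)
        for i in range(len(grad)):
            grad[i] += p * reward * g[i]
    return grad


if __name__ == "__main__":
    logits = [2.0, 0.5, -1.0]  # toy three-token vocabulary
    target = 1
    probs = softmax(logits)
    sup = supervised_gradient(logits, target)
    pg = one_step_policy_gradient(logits, target)
    # The two gradients differ only by the scalar factor p(target).
    print([round(pg[i] - probs[target] * sup[i], 12) for i in range(len(logits))])
```

The match is in direction only: with this reward the policy gradient shrinks when the model assigns low probability to the target, whereas the log-likelihood gradient does not, so the two are related but not identical training signals.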

Now, of course, I already hear people saying, no, no, that's not the ground truth.

It's just learning what a human was likely to say.

And I agree.

But there's a different question, which I think is actually more relevant to understanding the scalability of these models.

And that question is, can we leverage this imitation learning to help models learn better from ground truth?

And I think the answer is obviously yes.

After RLing these pre-trained base models, we've gotten them to win gold in International Math Olympiad competitions and to code up entire working applications from scratch.