Dwarkesh Patel

The LLM is making a conjecture about the next token based on its understanding of the world and how the different pieces of information in the sequence relate to each other.

374.003

Dwarkesh Podcast

Some thoughts on the Sutton interview

And it receives reward in proportion to how well it predicted the next token.

382.254
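To make the "reward in proportion to how well it predicted the next token" framing concrete, here is a minimal sketch. It assumes the usual pre-training setup, where the model emits one score (logit) per vocabulary token and is scored by the log-probability it gave to the token that actually came next. The vocabulary, the scores, and the function name `next_token_log_prob` are all hypothetical and chosen only for illustration. In practice this quantity is minimized as a cross-entropy loss rather than maximized as an RL reward, so calling it a "reward" follows the speaker's analogy, not standard terminology.

```python
import math

def next_token_log_prob(logits, target_index):
    """Log-probability the model assigns to the token that actually came next."""
    max_logit = max(logits)  # subtract the max before exponentiating, for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
    return logits[target_index] - log_sum_exp

# Hypothetical 4-token vocabulary and made-up scores (illustration only).
vocab = ["cat", "dog", "mat", "sat"]
logits = [0.5, 0.1, 2.0, -1.0]

# If the true next token is the one the model favored, the score is higher
# (closer to 0); if it is a token the model thought unlikely, the score is lower.
score_if_mat = next_token_log_prob(logits, vocab.index("mat"))
score_if_sat = next_token_log_prob(logits, vocab.index("sat"))
```

The score is always at most zero, and it rises toward zero as the model puts more probability on the token that really followed, which is the sense in which better predictions earn more "reward".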

Now, of course, I already hear people saying, no, no, that's not the ground truth.

386.03

It's just learning what a human was likely to say.

389.456

And I agree.

391.981

But there's a different question, which I think is actually more relevant to understanding the scalability of these models.

393.424

And that question is, can we leverage this imitation learning to help models learn better from ground truth?

399.235

And I think the answer is obviously yes.

405.807

After RLing these pre-trained base models, we've gotten them to win gold in International Math Olympiad competitions and to code up entire working applications from scratch.

408.772
