Dwarkesh Patel

👤 Speaker

15787 total appearances

Podcast Appearances

Dwarkesh Podcast

Some thoughts on the Sutton interview

It's a bit like saying to somebody pasteurizing milk, hey, you should stop boiling that milk because eventually you want to serve it cold.

458.535

Of course, but this is an intermediate step to facilitate the final output.

464.284

By the way, LLMs are clearly developing a deep representation of the world because their training process is incentivizing them to develop one.

470.017

I use LLMs to teach me about everything from biology to AI to history, and they are able to do so with remarkable flexibility and coherence.

477.867

Now, are LLMs specifically trained to model how their actions will affect the world?

485.957

No, they are not.

491.403

But if we're not allowed to call their representations a world model,

492.705

then we're defining the term world model by the process that we think is necessary to build one, rather than the obvious capabilities that this concept implies.

496.73

Okay, continual learning.

506.685

I'm sorry to bring up my hobby horse again.

507.747

I'm like a comedian who has only come up with one good bit, but I'm going to milk it for all it's worth.

510.25

An LLM that's being RL'd on outcome-based rewards learns on the order of one bit per episode, and an episode might be tens of thousands of tokens long.
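
As a rough illustration of that estimate (an editorial sketch, not something stated in the episode): a pass/fail outcome reward carries at most one bit, its Shannon entropy, so spreading it over every token in the episode leaves a very small signal per token. The function names and the 10,000-token episode length below are assumptions chosen only for the example.

```python
import math


def binary_entropy(p: float) -> float:
    """Shannon entropy, in bits, of a pass/fail outcome with success probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a certain outcome carries no information
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def max_reward_bits_per_token(p_success: float, tokens_per_episode: int) -> float:
    """Upper bound on outcome-reward information per generated token."""
    if tokens_per_episode <= 0:
        raise ValueError("tokens_per_episode must be positive")
    return binary_entropy(p_success) / tokens_per_episode


if __name__ == "__main__":
    # Assumed numbers: a 50% success rate (the most informative case)
    # and an episode of 10,000 tokens.
    print(binary_entropy(0.5))
    print(max_reward_bits_per_token(0.5, 10_000))
```

The bound is loosest at a 50% success rate; when the model almost always succeeds or almost always fails, each episode's reward carries well under one bit.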

515.418

Now, obviously, animals and humans are clearly extracting more information from interacting with our environment than just the reward signal at the end of an episode.

523.831

Conceptually, how should we think about what is happening with animals?

533.642

I think we're learning to model the world through observations.

537.026

This outer loop RL is incentivizing some other learning system to pick up maximum signal from the environment.