Dwarkesh Patel

👤 Speaker

15656 total appearances

Podcast Appearances

Dwarkesh Podcast

Some thoughts on the Sutton interview

An LLM that's being RL'd on outcome-based rewards learns on the order of one bit per episode, and an episode might be tens of thousands of tokens long.

515.418
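
The "one bit per episode" figure can be turned into a rough per-token number. The sketch below is only illustrative arithmetic: the episode length of 20,000 tokens is one assumed reading of "tens of thousands of tokens", and the vocabulary size is an assumption used to show the ceiling on what a single supervised next-token label could carry (real text carries far less than that ceiling).

```python
import math


def outcome_reward_bits_per_token(episode_tokens: int, bits_per_episode: float = 1.0) -> float:
    """Learning signal per token when only the final outcome is rewarded."""
    return bits_per_episode / episode_tokens


def next_token_label_ceiling(vocab_size: int) -> float:
    """Upper bound on the bits one next-token label can carry (log2 of vocabulary size)."""
    return math.log2(vocab_size)


episode_tokens = 20_000  # assumed: "tens of thousands of tokens"
vocab_size = 65_536      # assumed vocabulary size (2**16), for illustration only

rl_density = outcome_reward_bits_per_token(episode_tokens)
label_ceiling = next_token_label_ceiling(vocab_size)

print(f"outcome reward: {rl_density:.6f} bits per token")
print(f"next-token label ceiling: {label_ceiling:.1f} bits per token")
```

Under these assumptions the outcome reward works out to a small fraction of a bit per token, which is the gap the following remarks are pointing at.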

Now, obviously, animals and humans are clearly extracting more information from interacting with our environment than just the reward signal at the end of an episode.

523.831

Conceptually, how should we think about what is happening with animals?

533.642

I think we're learning to model the world through observations.

537.026

This outer loop RL is incentivizing some other learning system to pick up maximum signal from the environment.

540.31

In Richard's OaK architecture, he calls this the transition model.

547.878

And if we were trying to pigeonhole this feature spec into modern LLMs, what you'd do is fine-tune on all your observed tokens.

551.667
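
As a loose illustration of "learning a transition model from everything you observe", the toy below updates a next-token predictor on every observed sequence, with no reward involved. It deliberately swaps gradient-based fine-tuning for simple bigram counts, so it is a stand-in for the idea rather than how an LLM would be fine-tuned; the class name `BigramTransitionModel` and the example observations are hypothetical.

```python
from collections import Counter, defaultdict


class BigramTransitionModel:
    """Toy stand-in for a transition model: counts which token follows which."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def observe(self, tokens):
        """Update on every adjacent pair of observed tokens (no reward signal used)."""
        for prev, nxt in zip(tokens, tokens[1:]):
            self.counts[prev][nxt] += 1

    def predict(self, prev):
        """Return the most frequently observed successor, or None if unseen."""
        following = self.counts.get(prev)
        if not following:
            return None
        return following.most_common(1)[0][0]


model = BigramTransitionModel()
model.observe(["open", "door", "see", "hallway"])
model.observe(["open", "door", "see", "hallway"])
model.observe(["open", "door", "see", "wall"])

prediction = model.predict("see")
unseen = model.predict("window")
print(prediction, unseen)
```

Every observed token pair contributes an update here, in contrast to a single outcome reward at the end of an episode.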

From what I hear from my researcher friends, in practice, the most naive way of doing this actually doesn't work very well.

559.645

Now, being able to learn from the environment in a high-throughput way is obviously necessary for true AGI.

565.58

And it clearly doesn't exist with LLMs trained on RLVR.

572.712

But there might be some other relatively straightforward ways to shoehorn continual learning atop LLMs.

577.04

For example, one could imagine making supervised fine-tuning a tool call for the model.

581.948

So the outer loop RL is incentivizing the model to teach itself effectively using supervised learning in order to solve problems that don't fit in the context window.

586.615
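
One way to picture the tool-call idea is the toy below. It is a sketch under heavy assumptions, not a training recipe: `ToyModel`, `sft_tool`, the dictionary standing in for weights, and the two-item context window are all hypothetical. The point it tries to show is that an outcome-only reward can still favor a policy that writes information into weights when the problem outgrows the context window.

```python
from dataclasses import dataclass, field


@dataclass
class ToyModel:
    context_limit: int
    weights: dict = field(default_factory=dict)   # stand-in for model parameters
    context: list = field(default_factory=list)   # stand-in for the context window

    def read(self, key, value):
        """Observe a fact; the oldest facts fall out once the window is full."""
        self.context.append((key, value))
        self.context = self.context[-self.context_limit:]

    def sft_tool(self, key, value):
        """Hypothetical tool call: 'fine-tune' by storing the fact in weights."""
        self.weights[key] = value

    def answer(self, key):
        """Look in the context window first, then fall back to weights."""
        for k, v in reversed(self.context):
            if k == key:
                return v
        return self.weights.get(key)


def run_episode(uses_tool: bool) -> int:
    """Return an outcome-based reward: 1 if the earliest fact is still recoverable."""
    model = ToyModel(context_limit=2)
    for key, value in [("a", 1), ("b", 2), ("c", 3), ("d", 4)]:
        model.read(key, value)
        if uses_tool:
            model.sft_tool(key, value)
    return int(model.answer("a") == 1)


reward_without_tool = run_episode(uses_tool=False)
reward_with_tool = run_episode(uses_tool=True)
print(reward_without_tool, reward_with_tool)
```

In this toy, only the episode that calls the tool earns the reward, so an outer loop optimizing that reward would reinforce the decision to call it.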

Now, I'm genuinely agnostic about how well techniques like this will work.

596.45

I'm not an AI researcher.

599.554

But I wouldn't be surprised if they basically replicate continual learning.

601.237

And the reason is that models are already demonstrating something resembling human continual learning within their context windows.

604.44

The fact that in-context learning emerged spontaneously from the training incentive to process long sequences makes me think that if information could just flow across windows longer than the context limit, then models could meta-learn the same flexibility that they already show in context.

611.527

Okay, some concluding thoughts.

631.105

Evolution does meta-RL to make an RL agent, and that agent can selectively do imitation learning.

633.49
