Dwarkesh Patel

👤 Speaker

15787 total appearances

Podcast Appearances

Dwarkesh Podcast

Some thoughts on the Sutton interview

And it clearly doesn't exist with LLMs trained on RLVR.

572.712

But there might be some other relatively straightforward ways to shoehorn continual learning atop LLMs.

577.04

For example, one could imagine making supervised fine-tuning a tool call for the model.

581.948

So the outer loop RL is incentivizing the model to teach itself effectively using supervised learning in order to solve problems that don't fit in the context window.

586.615

Now, I'm genuinely agnostic about how well techniques like this will work.

596.45

I'm not an AI researcher.

599.554

But I wouldn't be surprised if they basically replicate continual learning.

601.237

And the reason is that models are already demonstrating something resembling human continual learning within their context windows.

604.44

The fact that in-context learning emerged spontaneously from the training incentive to process long sequences makes me think that if information could just flow across windows longer than the context limit, then models could meta-learn the same flexibility that they already show in context.

611.527

Okay, some concluding thoughts.

631.105

Evolution does meta-RL to make an RL agent, and that agent can selectively do imitation learning.

633.49

With LLMs, we're going the opposite way.

639.703

We have first made this base model that does pure imitation learning, and then we're hoping that we do enough RL on it to make a coherent agent with goals and self-awareness.

641.567

Maybe this won't work.

650.967

But I don't think these super first-principles arguments about, for example, how these LLMs don't have a true world model are actually proving much.

652.409

And I also don't think they're strictly accurate for the models we have today, which are actually undergoing a lot of RL on ground truth.

659.94

Even if Sutton's platonic ideal doesn't end up being the path to the first AGI,

666.529

his first-principles critique is identifying some genuine basic gaps that these models have.

670.795

And we don't even notice them because they're so pervasive in the current paradigm, but because he has this decades-long perspective, they're obvious to him.

676.143

It's the lack of continual learning.

683.193
