Dwarkesh Patel

👤 Speaker

15787 total appearances

Podcast Appearances

Dwarkesh Podcast

Richard Sutton – Father of RL thinks LLMs are a dead-end

So it's performing the task people want, but at the same time, it's learning about the world from doing that task.

1536.745

And do you imagine, okay, so we get rid of this paradigm where there's training periods and then there's deployment periods.

1543.915

But then do we also get rid of this paradigm where there's the model and then instances of the model or copies of the model that are doing certain things?

1551.907

How do you think about the fact that we'd want this thing to be doing different things?

1562.207

We'd want to aggregate the knowledge that it's gaining from doing those different things.

1567.237

I agree that the kind of thing you're talking about is necessary regardless of whether you start from LLMs or not, right?

1621.212

If you want human or animal level intelligence, you're going to need this capability.

1630.027

Suppose a human is trying to make a startup, right?

1634.795

And this is a thing which has a reward on the order of 10 years.

1636.778

Once in 10 years, you might have an exit where you get paid out a billion dollars.

1641.446

But humans have this ability to make intermediate auxiliary rewards. Even when they have extremely sparse rewards, they can still take intermediate steps, with an understanding of how the next thing they're doing leads to this grander goal they have.

1644.972

And so how do you imagine such a process might play out with AIs?

1659.974

Right, and then you also want some ability for information that you're learning. I mean, one of the things that makes humans quite different from these LLMs is that if you're onboarding on a job, you're picking up so much context and information, and that's what makes you useful at the job, right? You're picking up everything from your client's preferences to how the company works to everything else.

1724.943

And is the bandwidth of information that you get from a procedure like TD learning high enough to have this huge pipe of context and tacit knowledge that you'd need to be picking up in the way humans do when they're just deployed?

1747.741

Yeah.

1826.007

So it seems to me you need two things.

1827.008

One is some way of converting this long-run goal reward into smaller auxiliary or, you know, predictive rewards of the future reward, or at least the final reward.

1828.17

Then you need some other way.

1842.468

Initially, it seems to me you need some way of then, OK, I'm

1843.369

I need to hold on to all this context that I'm gaining as I'm working in the world, right?