Joe Carlsmith
Podcast Appearances
Modulo some difficulties with capabilities, you can get a model to output the behavior that you want.
If it doesn't, you crank on it until it does.
Right.
And I think everyone admits that suitably sophisticated models are going to have a very detailed understanding of human morality.
But the question is: what relationship is there between a model's verbal behavior, which you've essentially clamped (you're saying the model must say such-and-such things), and the criteria that end up influencing its choice between plans?
And there, I'm pretty cautious about saying, well, when it says the thing I forced it to say, or gradient-descended it into saying, that's a lot of evidence about how it's going to choose in a bunch of different scenarios.
For one thing, even with humans, it's not necessarily the case that their verbal behavior reflects the actual factors that determine their choices.
They can lie.
They may not even know what they would do in a given situation.
For folks who are unfamiliar with the basic story and are wondering, wait, why would they take over at all? What is literally any reason that they would do that?
So the general concern is this: if you're really offering someone power, and especially power for free, power is almost by definition useful for lots of values.
And if we're talking about an AI that really has the opportunity to take control of things, and some component of its values is focused on some outcome, on the world being a certain way, especially in a longer-term way, such that the horizon of its concern extends beyond the period the takeover plan would encompass,
then the thought is that it's just often the case that the world will be more the way you want it if you control everything than if you remain an instrument of the human will, or of some other actor, which is what we're hoping these AIs will be.