Joe Carlsmith

Dwarkesh Podcast
Joe Carlsmith - Otherness and control in the age of AGI

You know, modulo some difficulties with capabilities, you can get a model to output the behavior that you want. If it doesn't, then you crank it until it does. Right.

And I think everyone admits that suitably sophisticated models are going to have a very detailed understanding of human morality.

But the question is what relationship there is between a model's verbal behavior, which you've essentially clamped (you're like, "the model must say blah things"), and the criteria that end up influencing its choice between plans.

And there, I'm pretty cautious about saying, well, when it says the thing I forced it to say, or the thing I gradient-descended it into saying, that's a lot of evidence about how it's going to choose in a bunch of different scenarios.

I mean, for one thing, even with humans, it's not necessarily the case that their verbal behavior reflects the actual factors that determine their choices. They can lie. They might not even know what they would do in a given situation. I mean...

You know, for folks who are unfamiliar with the basic story, maybe they're wondering: wait, why are they [the AIs] taking over at all? What is literally any reason that they would do that?

So the general concern is this: if you're really offering someone power for free, well, power almost by definition is useful for lots of values.

And if we're talking about an AI that really has the opportunity to take control of things, and some component of its values is focused on some outcome, like the world being a certain way, especially in a longer-term way, such that the horizon of its concern extends beyond the period the takeover plan would encompass, then the thought is that it's often the case that the world will be more the way you want it if you control everything than if you remain the instrument of the human will or of some other actor, which is sort of what we're hoping these AIs will be.