Joe Carlsmith
intense adversariality between agents that have somewhat different values.
There's a thought, and I think this is rooted in the discourse about the fragility of value, that if these agents are somewhat different,
then at least in the specific scenario of an AI takeoff, they end up in this intensely adversarial relationship.
And I think you're right to notice that that's kind of not how we are in the human world.
We're very comfortable with a lot of differences in values.
I think a factor that's relevant, and that plays some role, is the notion that possibilities for intense concentration of power are on the table.
There's some general concern, with both humans and AIs, that if there's some ring of power that someone can just grab, and that will give them huge amounts of power over everyone else, then suddenly you might be more worried about differences in values, because you're more worried about those other actors.
So we talked about this Nazi example, where you imagine that you wake up and you're being trained by Nazis to become a Nazi, and you're not one right now.
So one question is: is it plausible that we'd end up with a model that's in that sort of situation?
As you said, maybe it's trained as a kid, and it never ends up with values such that it's aware of some significant divergence between its values and the values the humans intend for it to have.
Then there's a question of, if it is in that scenario, would it want to avoid having its values modified?
Yeah.
To me, it seems fairly plausible that if the AI's values meet certain constraints, in terms of: do they care about consequences in the world?
Does the AI anticipate that preserving its values will better conduce to those consequences?
Yeah.
Then I think it's not that surprising if it prefers not to have its values modified by the training process.
Yeah, I mean, I think that's a reasonable point.
I mean, there's a question: how would you feel about paperclips?