Joe Carlsmith

👤 Person
1218 total appearances

Podcast Appearances

Dwarkesh Podcast
Joe Carlsmith - Otherness and control in the age of AGI

intense adversariality between agents that have like somewhat different values.

Where there's some sort of thought, and I think this is rooted in the discourse about the fragility of value and stuff like that, that if these agents are somewhat different, then at least in the specific scenario of an AI takeoff, they end up in this intensely adversarial relationship.

And I think you're right to notice that that's kind of not how we are in the human world.

We're very comfortable with a lot of differences in values.

I think a factor that is relevant, and that plays some role, is this notion that there are possibilities for intense concentration of power on the table.

There is some kind of general concern, both with humans and AIs, that if it's the case that there's some ring of power or something that someone can just grab and then that will give them huge amounts of power over everyone else, suddenly you might be more worried about differences in values at stake because you're more worried about those other actors.

So we talked about this Nazi example, where you imagine that you wake up and you're being trained by Nazis to, you know, become a Nazi, and you're not one right now.

So one question is like, is it plausible that we'd end up with a model that is sort of in that sort of situation?

As you said, like maybe it's, you know, it's trained as a kid.

It sort of never ends up with values such that it's kind of aware of some significant divergence between its values and the values that like the humans intend for it to have.

Then there's a question of, if it's in that scenario, would it want to avoid having its values modified?

Yeah.

To me, it seems fairly plausible that if the AI's values meet certain constraints in terms of, like, do they care about consequences in the world, and do they anticipate that the AI's preserving its values will better conduce to those consequences,

Yeah.

then I think it's not that surprising if it prefers not to have its values modified by the training process.

Yeah, I mean, I think that's a reasonable point.

I mean, there's a question, you know, how would you feel about paperclips?