Joe Carlsmith

👤 Person
1218 total appearances

Podcast Appearances

Dwarkesh Podcast
Joe Carlsmith - Otherness and control in the age of AGI

You know, maybe you don't despise paperclips, but there's like the human paperclippers there and they're like training you to make paperclips.

You know, my sense would be that there's a kind of relatively specific set of conditions in which you're comfortable having your values, especially not changed by like learning and growing, but like gradient descent directly intervening on your neurons.

So yes, it could be like that.

There's one, there's a kind of scenario in which you were comfortable with your values being changed because in some sense you have allegiance to the, the sufficient allegiance to the output of that process.

Like, so you're kind of hoping in a, in a religious context, you're like, ah, like make me more, uh, virtuous by the lights of this religion.

And you, you know, you go to confession and you're like, you know, uh, you know, I've been, I've been thinking about takeover today.

Can you change me, please?

Like, give me more gradient descent.

You know, I've been bad, so bad.

And so, you know, people sometimes use the term corrigibility to talk about that.

Like when the AI, it maybe doesn't have perfect values, but it's in some sense cooperating with your efforts to change its values to be a certain way.

So maybe it's worth saying a little bit here about what actual values the AI might have.

You know, would it be the case that the AI naturally has the sort of equivalent of like, I'm sufficiently devoted to this, um, human, to human obedience that I'm going to like really want to be modified.

So I'm kind of like a better instrument of the human will, um, versus like wanting to go off and do my own thing.

Um, it could be, could be benign, you know, uh, could go well.

Um, here are some like possibilities I think about that like could make it bad.

And I think I'm just generally kind of concerned about how little I feel like I, how little science we have of model motivations, right?

It's like, we just don't, I think we just don't have a great understanding of what happens in the scenario.