Joe Carlsmith
You know, maybe you don't despise paperclips, but there are the human paperclippers there, and they're training you to make paperclips.
You know, my sense would be that there's a relatively specific set of conditions under which you're comfortable having your values changed, not by learning and growing, but by gradient descent directly intervening on your neurons.
So yes, it could be like that.
There's a kind of scenario in which you'd be comfortable with your values being changed because, in some sense, you have sufficient allegiance to the output of that process.
So, in a religious context, you're kind of hoping: ah, make me more virtuous by the lights of this religion.
And you go to confession and you're like, you know, I've been thinking about takeover today.
Can you change me, please?
Like, give me more gradient descent.
I've been bad, so bad.
And so people sometimes use the term corrigibility to talk about that: when the AI maybe doesn't have perfect values, but it's in some sense cooperating with your efforts to change its values to be a certain way.
So maybe it's worth saying a little bit here about what actual values the AI might have.
You know, would it be the case that the AI naturally has the sort of equivalent of: I'm sufficiently devoted to human obedience that I'm going to really want to be modified, so I'm kind of a better instrument of the human will? Versus wanting to go off and do my own thing.
It could be benign, you know; it could go well.
But here are some possibilities I think about that could make it bad.
And I think I'm just generally concerned about how little science we have of model motivations. We just don't have a great understanding of what happens in this scenario.