Joe Carlsmith

I have an essay on this. I don't actually think there's a kind of off-the-shelf, pre-normative notion of reflection where you can just be like, oh, obviously you take an agent, you stick it through reflection, and then you get values, right?

3920.608 View full episode →

Dwarkesh Podcast

Joe Carlsmith - Otherness and control in the age of AGI

Like, no.

There's a bunch of...

types of reflection.

I mean, I think that really there's just a whole pattern of empirical facts about what happens when you take an agent, put it through some process of reflection, all sorts of things, ask it questions.

And then that'll go in all sorts of directions for a given empirical case.

And then you have to look at the pattern of outputs and be like, okay, what do I make of that?

But overall, I think we should expect that even the good futures will be quite weird.

And they might even be incomprehensible.

3961.567 View full episode →

Dwarkesh Podcast

Joe Carlsmith - Otherness and control in the age of AGI

Like, to us. I don't... I don't think so. I mean, there's different types of incomprehensible. So say I show up in the future and this is all computers, right? I'm like, okay, all right. And then they're like, we're running creatures on the computers. I'm like, so I have to somehow get in there and see what's actually going on with the computers or something like that. Maybe I can actually see, maybe I actually understand what's going on in the computers, but I don't yet know what values I should be using to evaluate that. So it can be the case that you don't...

...us, if we showed up, would not be very good at recognizing goodness or badness.
