Joe Carlsmith
So we don't want to, like, chain civilization to a barbarous past or whatever.
Everyone should agree on that, and the people who are interested in alignment also agree on that.
So obviously there's a concern that people don't engage in that process, or that something shuts down the process of reflection.
But I think everyone agrees we want that.
And so that will potentially lead to something that is quite different from our current conception of what's valuable.
And there's a question of how different.
And I think there are also questions about what exactly are we talking about with reflection?
I have an essay on this. I don't actually think there's a kind of off-the-shelf, pre-normative notion of reflection where you can just be like, oh, obviously you take an agent, you stick it through reflection, and then you get values, right?
Like, no.
There's a bunch of types of reflection.
I mean, I think that really there's a whole pattern of empirical facts about, like, take an agent, put it through some process of reflection, all sorts of things, ask it questions.
And then that'll go in all sorts of directions for a given empirical case.
And then you have to look at the pattern of outputs and be like, okay, what do I make of that?
But overall, I think we should expect that even the good futures will be quite weird.
And they might even be incomprehensible, like, to us.
I don't, I don't think so. I mean, there's different types of incomprehensible.
So say I show up in the future and this is all computers, right? I'm like, okay, all right. And then they're like, we're running, like, creatures on the computers.
I'm like, so I have to somehow get in there and see what's actually going on with the computers or something like that.
Maybe I can actually see, maybe I actually understand what's going on in the computers, but I don't yet know what values I should be using to evaluate that.
So it can be the case that you don't...
...us, if we showed up, would not be very good at, like, recognizing goodness or badness.