Joe Carlsmith
Where does that fit into this picture?
I think it's a good question.
I mean, I think it's some guess that if there's no part of me that recognizes it as good, then I'm not sure that it's good according to me in some sense.
So yeah, it is a question of what it takes for it to be the case that a part of you recognizes it as good.
But I think if there's really none of that, then I'm not sure it's a reflection of my values at all.
Yeah, I mean, you definitely don't want it to be the case that, if you transform me into a paperclipper gradually, I will eventually be like, "And then I saw the light. I saw the true paperclips."
But that's part of what's complicated about this thing about reflection.
You have to find some way of differentiating between the sort of development processes that preserve what you care about and the development processes that don't.
And that in itself is a fraught question, which itself requires taking some stand on what you care about and what sorts of meta-processes you endorse and all sorts of things.
But it is definitely not a sufficient criterion that the thing at the end thinks it got it right,
because that's compatible with having gone wildly off the rails.
Yeah, so the context on that post is I'm talking about this hazy cluster, which I call in the essay "niceness/liberalism/boundaries," which is a somewhat more minimal set of cooperative norms involved in respecting the boundaries of others, and cooperation and peace amongst differences, and tolerance and stuff like that, as opposed to your favorite structure of matter, which is sometimes the paradigm of values that people use in the context of AI risk.
And, you know, I talk for a while about the sort of ethical virtues of these like norms, but it's pretty clear that
Also, like, why do we have these norms?