Joe Carlsmith
Podcast Appearances
When we think about which contexts make it appropriate to exert various types of control, or to have more of what I call in the series yang, which is this active, controlling force, as opposed to yin, which is this more receptive, open, letting-go concept, a paradigm context in which we think that is appropriate is when something is an active aggressor against the boundaries and cooperative structures that we've created as a civilization, right?
So I talk about the Nazis in the piece: if something is invading, we often think it's appropriate to fight back, right?
And we often think it's appropriate to set up structures to prevent that, and to ensure that these basic norms of peace and harmony are adhered to.
And I do think some of the moral heft of some parts of the alignment discourse comes from drawing specifically on that aspect of our morality, right?
So the AIs are presented as aggressors that are coming to kill you.
And if that's true, then it's quite appropriate, I think, to really be like, okay... That's classic human stuff.
Almost everyone recognizes that self-defense, or ensuring that basic norms are adhered to, is a justified use of certain kinds of power that would often be unjustified in other contexts.
So self-defense is a clear example there.
I do think it's important, though, to separate that concern from this other concern about where the future eventually goes, and how much we want to be trying to steer that actively.
So to some extent, I wrote the series partly in response to the thing you're talking about.
I think it is true that aspects of this discourse involve the possibility of trying to steer and grip and wrench.
You have the sense that the universe is about to go off in some direction and you need to... and, you know, people notice that muscle.
And part of what I want to do is say, well, we have a very rich human ethical tradition of thinking about when it is appropriate to try to exert what sorts of control over which things.
And I want us to bring the full force and richness of that tradition to this discussion, right?
And not... I think it's easy, if you're purely in this abstract mode of utility functions, where there's the human utility function and there's this competitor thing with its own utility function.