Joe Carlsmith
But that's a kind of empirical claim.
I'm also just pretty skeptical of this everyone-converges thing.
So imagine you train a chess-playing AI, or somehow you have a real paperclipper, and then you tell it: okay, go and reflect.
Based on my understanding of how moral reasoning works, the kind of reasoning that analytic ethicists do is just reflective equilibrium, right?
They just take their intuitions and they systematize them.
I don't see how that process gets an injection of mind-independent moral truth. If you start from a place where all of your intuitions say to maximize paperclips, I don't see how you end up doing some rich human morality instead. It just doesn't look to me like that's how human ethical reasoning works.
I think most of what normative philosophy does is make pre-theoretic intuitions consistent and systematize them.
And, in any case, we'll get evidence about this.
In some sense, this view predicts that you keep trying to train the AIs to do something and they keep saying: no, I'm not going to do that, that's not good. They keep pushing back. The momentum of AI cognition is always in the direction of this moral truth, and whenever we try to push it in some other direction, we'll find resistance from the rational structure of things.