Joe Carlsmith
Podcast Appearances
I mean, look, it will be a very interesting fact if it's like, man, we keep training these AIs in all sorts of different ways.
Like, we're doing all this crazy stuff and they keep acting like bourgeois liberals.
It's like, wow.
They keep professing this weird alien reality.
They all converge on this one thing.
They're like, can't you see?
It's like Zorgle.
And all the AIs.
Interesting.
Very interesting.
I think my personal prediction is that that's not what we see.
And my actual prediction is that the AIs are going to be very malleable.
We're going to be like, you know, if you push an AI towards evil, it'll just go.
And I think that's, obviously, or sort of, reflectively consistent evil.
I mean, I think there's also a question with some of these AIs: will they even be consistent in their values, right?
I do think, like, a thing we can do... So I like this image of the blinded horses, and I like this image of, like, maybe alignment is gonna mess with the...
I think we should be really concerned if we're, like, forcing facts on our AIs, right?