Joe Carlsmith
So, our alignment work, control, cybersecurity, general epistemics, maybe some coordination applications, stuff like that. There's a bunch of stuff you can do with AIs that in principle could differentially accelerate our security with respect to the sorts of considerations we're talking about.
If you have AIs that are capable of that, and you can successfully elicit that capability in a way that isn't being sabotaged or messing with you in other ways, and they can't yet take over the world or engage in some other really problematic form of power-seeking, then I think if we were really committed, we could go hard: put in a ton of resources and differentially direct this glut of AI productivity towards these security factors. And hopefully control and understand the AIs, do a lot of the things you're talking about, to make sure they don't take over or mess with us in the meantime.
And I think we have a lot of tools there.
You have to really try, though.
It's possible that those sorts of measures just don't happen, or don't happen with the level of commitment, diligence, and seriousness you would need. Especially if things are moving really fast and there are other competitive pressures: running all these intensive experiments on the AIs is going to take compute, and that compute could instead go towards experiments for the next scaling step, and stuff like that.
So, yeah.
I'm not here saying this is impossible, especially for that band of AIs. It's just that I think you have to try really hard.
Yeah.
It is the case that by the time we're building superintelligence, we'll have much better techniques.
I mean, even right now, when you look at labs talking about how they're planning to align the AIs, no one is saying, "we're going to do RLHF." At the least, you're talking about scalable oversight. You have some hope about interpretability. You have automated red teaming.