Joe Carlsmith
The runs were structurally quite similar.
Everyone was using the same techniques.
Maybe someone just stole the weights.
So, yeah, I guess I think it's really important, this idea that to the extent you haven't solved alignment, you likely haven't solved it anywhere.
And if someone has solved it and someone else hasn't, then I think it's a better question.
But if everyone's building systems that are going to go rogue, then I don't think that's much comfort, as we talked about.
Yeah, I'll just say on that front, I do think the Otherness and Control series is, in some sense, separable. It has a lot to do with misalignment stuff, but I think a lot of those issues are relevant even given various degrees of skepticism about some of the stuff I've been saying here.
Yeah.
Yeah, in terms of why it's possible that AIs could take over from a given position in one of these projects I've been describing, I think Carl's discussion is pretty good and gets into a bunch of the weeds that might give a more concrete sense.
One scenario I think about a lot is one in which it just turns out that fairly basic measures are enough to ensure, for example, that AIs don't cause catastrophic harm, don't seek power in problematic ways, and so on. And it could turn out that we learn it was easy, in a way such that we regret how we prioritized, you know, we wish we had prioritized differently.
We end up thinking, oh, you know, I wish we had cured cancer sooner, or handled some geopolitical dynamic differently.
There's another scenario where we end up looking back at some period of our history and how we