I don't think it necessarily bites harder for getting the AIs to do alignment research than for getting the AIs to do anything else helpful.
Because if they have it out for you, they don't necessarily want to help you shore up your civilization's defenses.
So if you're imagining trying to get a hardened, misaligned AI to help you with biodefense, and it, for example, wants to keep the option of threatening you with a bioweapon in the future, it would have just as much incentive to do a bad job at that as it would to do a bad job at alignment research.
In general, I think there's one big concern, which is: will the AIs that we're trying to use at that point in time have motivations that give them incentives to undermine the work we're trying to get them to do?
And I think they certainly would have incentives to undermine alignment research if they were misaligned.
But I think they would also have incentives to undermine efforts to make ourselves more rational and thoughtful, such as AI for epistemics. Because if we're more rational and thoughtful, then maybe we'll realize they're probably misaligned, and that would be bad for them.
They would also have an incentive to undermine our d/acc-style defensive efforts, because that would make it harder for them to take over.
So I do think that if you're not worried about alignment at this early stage, everything becomes easier. The whole plan becomes an even more attractive strategy and path.
But I think the canonical "using AI for AI safety" or "using AI for defense" plan does imagine that we're not sure at the beginning that they're aligned.
We may not be highly confident that they're extremely misaligned and fully power-seeking and looking to take over at every opportunity, but we're not imagining that we know with confidence we can trust them.
So figuring out how to create a setup where we use control techniques, alignment techniques, interpretability, and whatever other tools we have at our disposal to get to the point where we feel good about relying on their outputs is a crucial step.
Because either it bottlenecks our progress, because we're checking everything all the time and slowing things down, or it doesn't bottleneck our progress, but we've handed the AIs the power to take over.
Yeah, so one obvious one is just AI alignment.
How can we ensure that the whole chain, from these AIs we're using to help us right now, to the future generations of AIs they help us create, to the generations those AIs help create in turn, is aligned: motivated to help humans, honest, basically doing what we say, and steerable?