Daniel Kokotajlo
Podcast Appearances
And that one is one where, you know, you're still trying different things.
There's failure and success and experimentation.
And then there's another where it's like the thing has happened, and now you send the probe out, and then you look out at the night sky six months later, and you see something occluding the sun.
In your story, you have basically two different scenarios after some point.
So yeah, what is a sort of crucial turning point and what happens in these two scenarios?
So in the world where they're getting deployed through the economy, but they are misaligned.
And, you know, the people in charge, at least at this moment, think that they are in a good position with regard to misalignment.
It just seems with even smart humans, they get caught in weird ways because they don't have logical omniscience.
They don't realize the consequences of the way they did something which just obviously gave them away.
And there is this thing with lying, where it's just really hard to keep an inconsistent false world model alive while working with the people around you, and that's why psychopaths often get caught.
And so if you have all these AIs that are deployed to the economy and they're all working towards this big conspiracy, I feel like one of them who's siloed or loses internet access and has to confabulate a story will just get caught, and then you're like, wait, what the fuck?
And then, you know, you catch it before it's, like, taken over the world.
So it is the case that certain things that people would have considered egregious misalignment in the past are happening.
But also certain things which people who are especially worried about misalignment said would be impossible to solve have just been solved in the normal course of getting more capabilities.
Like Eliezer had that thing about whether you can even specify what you want the AI to do without the AI totally misunderstanding you and then just converting the universe to paperclips.
And now, just by the nature of GPT-4 having to understand natural language, it totally has a common sense understanding of what you're trying to make it do, right?
So I think this sort of like trend cuts both ways, basically.