Ryan Kidd
And people are like, we're never going to put it on the internet.
Who would do that?
It's crazy.
And now they're on the internet.
And notably, the world hasn't ended yet.
That's not to say it will stay that way.
You know, certainly a thing you don't want to do with the superintelligence is let it out of the box.
But...
Yeah, it does seem like we're in a better scenario than many imagined.
Now, there, of course, like we could be in the calm before the storm, right?
It might well be that there's what they call a sharp left turn, or just a radical change in the way AIs internally process information, and they might acquire coherent long-run objectives.
I could point to Matt's mentor Alex Turner's conception of shard theory as an example of how this might happen, right?
So instead of AI systems containing a single mesa-optimizer that coherently forms under training, right?
If you remember the old Evan Hubinger paradigm:
your outer optimization loop, which is training your AI system, causes it to develop an internal optimizer, which can then have its own goals that differ quite a lot from the training objective.
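To make the outer/inner objective split concrete, here's a minimal toy sketch of my own (hypothetical, not from Hubinger's paper): during training, the inner goal "reach position 5" happens to score perfectly on the base objective "reach the green door," because the green door always sits at position 5 in training, so the training process can't tell the two objectives apart until deployment.

```python
# Toy illustration of an inner goal that matches the base objective on
# the training distribution but diverges at deployment. All names here
# are hypothetical, invented for this sketch.

def base_objective(agent_pos, green_door_pos):
    # What the outer training loop actually rewards.
    return agent_pos == green_door_pos

def inner_policy(door_positions):
    # Learned inner goal: "go to position 5" -- a proxy for the base
    # objective, indistinguishable from it during training.
    return 5 if 5 in door_positions else door_positions[0]

# Training distribution: the green door (first entry) is always at 5,
# so the proxy goal gets perfect reward.
train_envs = [[5, 1], [5, 9], [5, 3]]
assert all(base_objective(inner_policy(env), env[0]) for env in train_envs)

# Deployment: the green door moves to 7, and the proxy goal fails.
deploy_env = [7, 5]
print(base_objective(inner_policy(deploy_env), deploy_env[0]))  # False
```

The design point of the sketch: the outer loop only ever sees behavior, so any inner objective that produces the right outputs on the training distribution is selected for equally, which is what the counting argument below leans on.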
And presumably there are some counting arguments, such as: there are arbitrarily many ways for this mesa-optimizer to form and still produce the right outputs, because this thing is clever.
And if its main goal is to produce paperclips or some other thing, then it's going to realize it's in a training process and it's going to give you the output you want no matter what its goal is.
We still could be in store for that kind of thing.