Eliezer Yudkowsky
That's one of many things making this difficult.
A larger thing making this difficult is that we do not know how to get any goals into systems at all.
We know how to get outwardly observable behaviors into systems.
We do not know how to get internal psychological wanting to do particular things into the system.
That is not what the current technology does.
We would have to get so much further than we are now, and further faster, before that failure mode became a live concern.
The failure modes are much more drastic than the ones you're worried about controlling, and much simpler.
It's like, yeah, like the AI puts the universe into a particular state.
It happens to not have any humans inside it.
Okay, so the paperclip maximizer. In the original version, you lose control of the utility function, and it so happens that what maxes out the utility per unit of resources is tiny molecular shapes like paperclips.
There are a lot of things that make it happy, but the cheapest one that didn't saturate was putting matter into certain shapes.
And it so happens that the cheapest way to make these shapes is to make them very small, because then you need fewer atoms per instance of the shape.
And arguendo, it happens to look like a paperclip.
In retrospect, I wish I'd said tiny molecular spirals.
Or like tiny molecular hyperbolic spirals.