Stephen McAleese
The preferences that wind up in a mature AI are complicated, practically impossible to predict, and vanishingly unlikely to be aligned with our own, no matter how it was trained.
End quote.
One thing that's initially puzzling about the authors' view is their apparent overconfidence.
If you don't know what's going to happen then how can you predict the outcome with high confidence?
But it's still possible to be highly confident in an uncertain situation if you have the right prior.
For example, even though you have no idea what the winning lottery number will be, you can still predict with high confidence that you won't win, because your prior probability of winning is so low.
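To make that concrete, here is a quick back-of-the-envelope sketch of my own (not from the transcript), using a hypothetical 6-from-49 lottery: the prior alone is enough to justify near-certainty about the outcome.

```python
from math import comb

# Toy illustration: a hypothetical 6-from-49 lottery.
# Even with no idea which numbers will be drawn, the prior alone justifies
# high confidence in the prediction "this ticket will not win".
total_combinations = comb(49, 6)   # 13,983,816 possible draws
p_win = 1 / total_combinations     # prior probability that one ticket wins
p_not_win = 1 - p_win

print(f"P(win)     = {p_win:.2e}")      # ~7.15e-08
print(f"P(not win) = {p_not_win:.8f}")  # ~0.99999993
```

In this toy model, total ignorance about which numbers will be drawn still leaves you justified in roughly 99.99999% confidence that any given ticket loses.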
The authors also believe that the AI alignment problem has characteristics similar to other hard engineering problems, like launching a space probe, building a nuclear reactor safely, and building a secure computer system.
Subheading 1.
Human values are a very specific, fragile, and tiny target within the space of all possible goals.
One reason why AI alignment is difficult is that human morality and values may be a complex, fragile, and tiny target within the vast space of all possible goals.
Therefore, AI alignment engineers have a small target to hit.
Just as randomly shuffling metal parts is statistically unlikely to assemble a Boeing 747, a goal selected at random from the space of all possible goals is unlikely to be compatible with human flourishing or survival, for example maximizing the number of paperclips in the universe.
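To get a rough quantitative feel for this "tiny target" intuition, here is a toy calculation of my own (not from the authors): if a goal is idealized as a point in a d-dimensional unit cube, and "human-compatible" goals as a small box around one target point, the compatible fraction of goal space shrinks exponentially with the number of dimensions.

```python
# Toy illustration, not the authors' model: goals as points in [0, 1]^d,
# with "human-compatible" goals forming a box of side 0.1 around a target.
# The compatible fraction of goal space is side**d, which collapses as d grows,
# so a uniformly random goal almost never lands inside it.
side = 0.1  # assumed width of the compatible region along each dimension

for d in (1, 5, 10, 20):
    fraction_compatible = side ** d
    print(f"d={d:>2}: compatible fraction of goal space = {fraction_compatible:.1e}")
```

With 20 loosely independent dimensions of value in this toy model, a random goal lands in the compatible region about one time in 10^20, which is the sense in which alignment engineers have a very small target to hit.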
This intuition is also articulated in the blog post The Rocket Alignment Problem which compares AI alignment to the problem of landing a rocket on the moon.
Both require deep understanding of the problem and precise engineering to hit a narrow target.
Similarly, the authors argue that human values are fragile.
The loss of just a few key values like subjective experience or novelty could result in a future that seems dystopian and undesirable to us.
Or the converse problem: an agent that contains all the aspects of human value except the valuation of subjective experience, so that the result is a non-sentient optimizer that goes around making genuine discoveries, but the discoveries are not savored and enjoyed because there is no one there to do so.
This, I admit, I don't quite know to be possible.
Consciousness does still confuse me to some extent.