Stephen McAleese
Throughout the book, the authors emphasize that they are not primarily worried about bad actors misusing advanced AI systems, or about programmers giving the AI an incorrect or naive objective (the outer alignment problem).
Instead, the authors believe that the problem facing humanity is that we can't aim an ASI at any goal at all (the inner alignment problem), let alone the narrow target of human values.
This is why they argue that if anyone builds it, everyone dies.
It doesn't matter who builds the ASI: whoever builds it won't be able to robustly instill any particular values into it, so the AI will end up with alien, unfriendly values and will be a threat to everyone.
Inner alignment introduction
The inner alignment problem involves two objectives: an outer objective used by a base optimizer, and an inner objective used by an inner optimizer, also known as a mesa-optimizer.
The outer objective is a loss or reward function that is specified by the programmers and used to train the AI model.
The base optimizer, such as gradient descent or reinforcement learning, searches over model parameters in order to find a model that performs well according to this outer objective on the training distribution.
The inner objective, by contrast, is the objective that a mesa-optimizer within the trained model actually uses as its goal and that determines its behavior.
This inner objective is not explicitly specified by the programmers.
Instead, it is selected by the outer objective as the model develops internal parameters that perform optimization or goal-directed behavior.
The inner alignment problem arises when the inner objective differs from the outer objective.
Even if a model achieves low loss or high reward during training, it may be doing so by optimizing a proxy objective that merely correlates with the outer objective on the training data.
As a result, the model can behave as intended during training and evaluation while pursuing a different goal internally.
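This failure mode can be made concrete with a toy example (my own illustration, not one from the book): a logistic-regression model is trained on data where a proxy feature always equals the true feature, so relying on either feature satisfies the outer objective. When the correlation breaks at test time, the model's reliance on the proxy is exposed.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Training distribution: proxy feature x2 always equals the true feature x1,
# and the label depends only on x1.
train = [((0.0, 0.0), 0), ((1.0, 1.0), 1)] * 50

# Test distribution: the correlation breaks (x2 = 1 - x1).
test = [((0.0, 1.0), 0), ((1.0, 0.0), 1)]

# Gradient descent on cross-entropy loss plays the role of the base
# optimizer minimizing the outer objective.
w1 = w2 = b = 0.0
lr = 0.5
for _ in range(2000):
    g1 = g2 = gb = 0.0
    for (x1, x2), y in train:
        p = sigmoid(w1 * x1 + w2 * x2 + b)
        g1 += (p - y) * x1
        g2 += (p - y) * x2
        gb += (p - y)
    n = len(train)
    w1 -= lr * g1 / n
    w2 -= lr * g2 / n
    b -= lr * gb / n

def accuracy(data):
    return sum((sigmoid(w1 * x1 + w2 * x2 + b) > 0.5) == (y == 1)
               for (x1, x2), y in data) / len(data)

print(f"train accuracy: {accuracy(train):.2f}")  # 1.00: outer objective satisfied
print(f"test accuracy:  {accuracy(test):.2f}")   # 0.50: proxy reliance exposed
```

Because the two features are identical throughout training, gradient descent has no incentive to prefer the true feature over the proxy: it splits the weight symmetrically (w1 == w2), so off-distribution the model assigns the same score to both test inputs and can do no better than chance. The model "passed" training while never having learned the intended objective.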
"We will call the problem of eliminating the base-mesa objective gap the inner alignment problem, which we will contrast with the outer alignment problem of eliminating the gap between the base objective and the intended goal of the programmers." (Risks from Learned Optimization in Advanced Machine Learning Systems)