Menu
Sign In Search Podcasts Charts People & Topics Add Podcast API Blog Pricing

Stephen McAleese

๐Ÿ‘ค Speaker
449 total appearances

Appearances Over Time

Podcast Appearances

LessWrong (Curated & Popular)
"IABIED Book Review: Core Arguments and Counterarguments" by Stephen McAleese

Throughout the book, the authors emphasize that they are not worried about bad actors abusing advanced AI systems, misuse, or programming an incorrect or naive objective into the AI, the outer alignment problem.

LessWrong (Curated & Popular)
"IABIED Book Review: Core Arguments and Counterarguments" by Stephen McAleese

Instead, the authors believe that the problem facing humanity is that we can't aim an ASI at any goal at all, the inner alignment problem, let alone the narrow target of human values.

LessWrong (Curated & Popular)
"IABIED Book Review: Core Arguments and Counterarguments" by Stephen McAleese

This is why they argue that if anyone builds it, everyone dies.

LessWrong (Curated & Popular)
"IABIED Book Review: Core Arguments and Counterarguments" by Stephen McAleese

It doesn't matter who builds the ASI, in any case whoever builds it won't be able to robustly instill any particular values into the AI and the AI will end up with alien and unfriendly values and will be a threat to everyone.

LessWrong (Curated & Popular)
"IABIED Book Review: Core Arguments and Counterarguments" by Stephen McAleese

Subheading.

LessWrong (Curated & Popular)
"IABIED Book Review: Core Arguments and Counterarguments" by Stephen McAleese

Inner alignment introduction.

LessWrong (Curated & Popular)
"IABIED Book Review: Core Arguments and Counterarguments" by Stephen McAleese

The inner alignment problem involves two objectives.

LessWrong (Curated & Popular)
"IABIED Book Review: Core Arguments and Counterarguments" by Stephen McAleese

An outer objective used by a base optimizer and an inner objective used by an inner optimizer, also known as a Messer optimizer.

LessWrong (Curated & Popular)
"IABIED Book Review: Core Arguments and Counterarguments" by Stephen McAleese

The outer objective is a loss or reward function that is specified by the programmers and used to train the AI model.

LessWrong (Curated & Popular)
"IABIED Book Review: Core Arguments and Counterarguments" by Stephen McAleese

The base optimizer, such as gradient descent or reinforcement learning, searches over model parameters in order to find a model that performs well according to this outer objective on the training distribution.

LessWrong (Curated & Popular)
"IABIED Book Review: Core Arguments and Counterarguments" by Stephen McAleese

The inner objective, by contrast, is the objective that a Messer optimizer within the trained model actually uses as its goal and determines its behavior.

LessWrong (Curated & Popular)
"IABIED Book Review: Core Arguments and Counterarguments" by Stephen McAleese

This inner objective is not explicitly specified by the programmers.

LessWrong (Curated & Popular)
"IABIED Book Review: Core Arguments and Counterarguments" by Stephen McAleese

Instead, it is selected by the outer objective as the model develops internal parameters that perform optimization or goal-directed behavior.

LessWrong (Curated & Popular)
"IABIED Book Review: Core Arguments and Counterarguments" by Stephen McAleese

The inner alignment problem arises when the inner objective differs from the outer objective.

LessWrong (Curated & Popular)
"IABIED Book Review: Core Arguments and Counterarguments" by Stephen McAleese

Even if a model achieves low loss or high reward during training, it may be doing so by optimizing a proxy objective that merely correlates with the outer objective on the training data.

LessWrong (Curated & Popular)
"IABIED Book Review: Core Arguments and Counterarguments" by Stephen McAleese

As a result, the model can behave as intended during training and evaluation while pursuing a different goal internally.

LessWrong (Curated & Popular)
"IABIED Book Review: Core Arguments and Counterarguments" by Stephen McAleese

We will call the problem of eliminating the base-messer objective gap the inner alignment problem, which we will contrast with the outer alignment problem of eliminating the gap between the base objective and the intended goal of the programmers.

LessWrong (Curated & Popular)
"IABIED Book Review: Core Arguments and Counterarguments" by Stephen McAleese

Risks from learned optimization in advanced machine learning systems.

LessWrong (Curated & Popular)
"IABIED Book Review: Core Arguments and Counterarguments" by Stephen McAleese

End quote.

LessWrong (Curated & Popular)
"IABIED Book Review: Core Arguments and Counterarguments" by Stephen McAleese

Subheading.