Stephen McAleese
Throughout the book, the authors emphasize that they are not primarily worried about bad actors misusing advanced AI systems, or about programmers giving the AI an incorrect or naive objective (the outer alignment problem).
Instead, the authors believe that the problem facing humanity is that we can't aim an ASI at any goal at all (the inner alignment problem), let alone the narrow target of human values.
This is why they argue that if anyone builds it, everyone dies.
It doesn't matter who builds the ASI: whoever builds it won't be able to robustly instill any particular values into it, so the AI will end up with alien, unfriendly values and will be a threat to everyone.
Inner alignment introduction
The inner alignment problem involves two objectives: an outer objective used by a base optimizer, and an inner objective used by an inner optimizer, also known as a mesa-optimizer.
The outer objective is a loss or reward function that is specified by the programmers and used to train the AI model.
The base optimizer, such as gradient descent or reinforcement learning, searches over model parameters in order to find a model that performs well according to this outer objective on the training distribution.
The inner objective, by contrast, is the objective that a mesa-optimizer within the trained model actually uses as its goal and that determines its behavior.
This inner objective is not explicitly specified by the programmers.
Instead, it is selected by the outer objective as the model develops internal parameters that perform optimization or goal-directed behavior.
The inner alignment problem arises when the inner objective differs from the outer objective.
Even if a model achieves low loss or high reward during training, it may be doing so by optimizing a proxy objective that merely correlates with the outer objective on the training data.
As a result, the model can behave as intended during training and evaluation while pursuing a different goal internally.
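This failure mode can be made concrete with a toy example (my own illustration, not one from the book): a logistic-regression model is trained on data where a proxy feature always equals the true feature, so relying on either feature satisfies the outer objective. When the correlation breaks at test time, the model's reliance on the proxy is exposed.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Training distribution: proxy feature x2 always equals the true feature x1,
# and the label depends only on x1.
train = [((0.0, 0.0), 0), ((1.0, 1.0), 1)] * 50

# Test distribution: the correlation breaks (x2 = 1 - x1).
test = [((0.0, 1.0), 0), ((1.0, 0.0), 1)]

# Gradient descent on cross-entropy loss plays the role of the base
# optimizer minimizing the outer objective.
w1 = w2 = b = 0.0
lr = 0.5
for _ in range(2000):
    g1 = g2 = gb = 0.0
    for (x1, x2), y in train:
        p = sigmoid(w1 * x1 + w2 * x2 + b)
        g1 += (p - y) * x1
        g2 += (p - y) * x2
        gb += (p - y)
    n = len(train)
    w1 -= lr * g1 / n
    w2 -= lr * g2 / n
    b -= lr * gb / n

def accuracy(data):
    return sum((sigmoid(w1 * x1 + w2 * x2 + b) > 0.5) == (y == 1)
               for (x1, x2), y in data) / len(data)

print(f"train accuracy: {accuracy(train):.2f}")  # 1.00: outer objective satisfied
print(f"test accuracy:  {accuracy(test):.2f}")   # 0.50: proxy reliance exposed
```

Because the two features are identical throughout training, gradient descent has no incentive to prefer the true feature over the proxy: it splits the weight symmetrically (w1 == w2), so off-distribution the model assigns the same score to both test inputs and can do no better than chance. The model "passed" training while never having learned the intended objective.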
"We will call the problem of eliminating the base-mesa objective gap the inner alignment problem, which we will contrast with the outer alignment problem of eliminating the gap between the base objective and the intended goal of the programmers." (Risks from Learned Optimization in Advanced Machine Learning Systems)