Stephen McAleese
Inner misalignment evolution analogy.
The authors use an evolution analogy to explain the inner alignment problem in an intuitive way.
In their story there are two aliens that are trying to predict the preferences of humans after they have evolved.
One alien argues that since evolution optimizes the genome of organisms for maximizing inclusive genetic fitness, that is, survival and reproduction, humans will come to care only about that too: they will only eat foods that are high in calories or nutrients, and only have sex when it leads to offspring.
The other alien, who turns out to be correct, predicts that humans will develop a variety of drives that are correlated with inclusive genetic fitness, IGF, like enjoying tasty food and caring for loved ones, but that they will value these drives for their own sake rather than valuing IGF itself, even once they understand what IGF is.
This alien is correct because once humans did come to understand IGF, we continued to do things like eating sucralose, which is sweet but has no calories, and having sex with contraception, which is enjoyable but produces no offspring.
Outer objective.
In this analogy, maximizing inclusive genetic fitness, IGF, is the base or outer objective of natural selection optimizing the human genome.
Inner objective.
The goals that humans actually have, such as enjoying sweet foods or sex, are the inner or mesa objective.
The outer optimizer selects these proxy objectives because they are among the many possible objectives that score highly on the outer objective in the training distribution, even though they may diverge from it in other environments.
Inner misalignment.
In this analogy, humans are inner misaligned because their true goals, the inner objective, are different from the goals of natural selection, the outer objective.
In a different environment, for example the modern world, humans can score highly according to the inner objective, for example by having sex with contraception, but low according to the outer objective, IGF, for example by not having kids.
Real examples of inner misalignment.
Are there real-world examples of inner alignment failures?
Yes, though unfortunately the book doesn't seem to mention them to support its argument.
In 2022, researchers studied an environment in the game CoinRun that rewarded an AI for collecting a coin, but during training the coin was always placed at the end of the level, so the AI learned to run to the end of the level to get the coin.
When the researchers then placed the coin at a random position in the level, the AI still ran to the end of the level and rarely collected the coin.
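The failure mode can be illustrated with a toy sketch (this is not the actual CoinRun code; the gridworld, the "always move right" policy, and all names here are hypothetical stand-ins for the learned behavior). In training the coin always sits at the far right of the agent's row, so a policy that just moves right earns full reward; when the coin is placed at random, the same policy only finds coins that happen to lie on its path.

```python
# Toy sketch of goal misgeneralization in the style of the CoinRun
# result. A 10x3 gridworld; the agent starts at (0, 0) and the
# learned proxy policy always moves right along its own row.
import random

WIDTH, HEIGHT = 10, 3

def run_episode(policy, coin):
    """Return 1 if the agent reaches the coin's cell, else 0."""
    pos = (0, 0)
    for _ in range(WIDTH):
        if pos == coin:
            return 1  # coin collected
        pos = policy(pos)
    return 1 if pos == coin else 0

def always_right(pos):
    # The proxy behavior the agent learned: head for the end of
    # the level, ignoring where the coin actually is.
    x, y = pos
    return (min(x + 1, WIDTH - 1), y)

# In distribution: the coin is always at the end of the agent's row,
# so the proxy policy scores perfectly on the outer objective.
print(run_episode(always_right, (WIDTH - 1, 0)))  # 1

# Out of distribution: the coin is placed uniformly at random, and the
# agent only collects coins that happen to lie on its row.
random.seed(0)
hits = sum(
    run_episode(always_right,
                (random.randrange(WIDTH), random.randrange(HEIGHT)))
    for _ in range(1000)
)
print(hits / 1000)  # roughly 1/HEIGHT of random placements succeed
```

The point of the sketch is that both policies, "go to the coin" and "go to the end of the level", are indistinguishable on the training distribution, so the outer optimizer has no way to prefer one over the other until the environment changes.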
Outer objective.
In this example, going to the coin is the outer objective the AI is rewarded for.