Stephen McAleese

Speaker
449 total appearances

Podcast Appearances

LessWrong (Curated & Popular)
"IABIED Book Review: Core Arguments and Counterarguments" by Stephen McAleese

Inner misalignment evolution analogy.

The authors use an evolution analogy to explain the inner alignment problem in an intuitive way.

In their story there are two aliens that are trying to predict the preferences of humans after they have evolved.

One alien argues that since evolution optimizes the genome of organisms for maximizing inclusive genetic fitness, that is, survival and reproduction, humans will care only about that too, and will do things like eating only foods that are high in calories or nutrition, or having sex only if it leads to offspring.

The other alien, who is correct, predicts that humans will develop a variety of drives that are correlated with inclusive genetic fitness (IGF), like liking tasty food and caring for loved ones, but that they will value only these drives, rather than IGF itself, once they can understand it.

This alien is correct because once humans did finally understand IGF, we still did things like eating sucralose, which is tasty but has no calories, or having sex with contraception, which is enjoyable but doesn't produce offspring.

Outer objective.

In this analogy, maximizing inclusive genetic fitness, IGF, is the base or outer objective of natural selection optimizing the human genome.

Inner objective.

The goals that humans actually have, such as enjoying sweet foods or sex, are the inner or mesa-objective.

These proxy objectives are selected by the outer optimizer as one of many possible sets of proxy objectives that lead to a high score on the outer objective in distribution, that is, in the training environment, but not in another environment.

Inner misalignment.

In this analogy, humans are inner misaligned because their true goals (the inner objective) are different to the goals of natural selection (the outer objective).

In a different environment, for example the modern world, humans can score highly according to the inner objective (for example, by having sex with contraception) but low according to IGF, which is the outer objective (for example, by not having kids).
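
The gap between the inner and outer objective can be sketched in a few lines of Python. This is only a toy illustration, not anything from the book or the review: the food lists, the sweetness and calorie numbers, and the function names choose and outer_score are all made up, and calories stand in very loosely for the outer objective.

```python
# Hypothetical foods as (name, sweetness, calories). All values are invented
# for illustration; calories are a crude stand-in for the outer objective.
ANCESTRAL = [("ripe fruit", 8, 90), ("honey", 10, 300), ("leaves", 1, 10)]
MODERN = ANCESTRAL + [("diet soda", 10, 0), ("sucralose candy", 12, 0)]


def choose(foods):
    """Inner (proxy) objective: pick the sweetest food available."""
    return max(foods, key=lambda food: food[1])


def outer_score(food):
    """Outer objective stand-in: calories gained from the chosen food."""
    return food[2]


ancestral_pick = choose(ANCESTRAL)
modern_pick = choose(MODERN)

# The same sweetness-seeking rule scores well in the old environment
# and scores zero once sweetness and calories come apart.
print(ancestral_pick[0], outer_score(ancestral_pick))
print(modern_pick[0], outer_score(modern_pick))
```

The chooser never changes between the two environments. Only the correlation between sweetness and calories does, which is the sense in which the proxy works in distribution but not elsewhere.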

Subheading: Real examples of inner misalignment.

Are there real-world examples of inner alignment failures?

Yes.

Though unfortunately the book doesn't seem to mention these examples to support its argument.

In 2022, researchers created an environment in a game called CoinRun that rewarded an AI for going to a coin and collecting it. However, they always put the coin at the end of the level, and the AI learned to go to the end of the level to get the coin.

But when the researchers changed the environment so that the coin was randomly placed in the level, the AI still went to the end of the level and rarely got the coin.
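
A rough sketch of why this happens, under heavy assumptions: the code below does not train anything and is not the researchers' setup. It hard-codes the proxy behaviour ("walk along the ground to the end of the level") in a hypothetical grid level, then compares how often that behaviour collects the coin when the coin is always at the end versus when it can be in any cell. The grid size and all names are invented for illustration.

```python
WIDTH, HEIGHT = 10, 5  # hypothetical level size, not taken from CoinRun


def go_right_policy(coin):
    """Proxy behaviour assumed to be learned in training.

    Walks along the ground row (y = 0) to the far end and ignores
    where the coin actually is. Returns the list of visited cells.
    """
    return [(x, 0) for x in range(WIDTH)]


def collects_coin(policy, coin):
    """The coin is collected only if the agent walks over its cell."""
    return coin in policy(coin)


def success_rate(policy, coin_positions):
    """Fraction of coin placements in which the policy collects the coin."""
    hits = sum(collects_coin(policy, coin) for coin in coin_positions)
    return hits / len(coin_positions)


# Training distribution: the coin is always at the end of the level.
train_coins = [(WIDTH - 1, 0)]
# Shifted distribution: the coin may be in any cell (enumerated, not sampled).
test_coins = [(x, y) for x in range(WIDTH) for y in range(HEIGHT)]

train_rate = success_rate(go_right_policy, train_coins)
test_rate = success_rate(go_right_policy, test_coins)
print(train_rate, test_rate)
```

In training, "go to the end" and "go to the coin" earn exactly the same reward, so the reward signal cannot tell them apart. The difference only shows up once the coin moves.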

Outer objective.

In this example, going to the coin is the outer objective the AI is rewarded for.