
LessWrong (Curated & Popular)
"IABIED Book Review: Core Arguments and Counterarguments" by Stephen McAleese

The post gives two arguments for why AI models such as LLMs are likely to easily acquire human values:

1. Values are pervasive in language model pre-training datasets such as books and conversations between people.

2. Since values are shared and understood by almost everyone in a society, they cannot be very complex.

Similarly, the post "Why I'm Optimistic About Our Alignment Approach" uses evidence about LLMs as a reason to believe that solving the AI alignment problem is achievable with current methods.

Quote: "Large language models (LLMs) make this a lot easier. They come preloaded with a lot of humanity's knowledge, including detailed knowledge about human preferences and values. Out of the box they aren't agents who are trying to pursue their own goals in the world, and their objective functions are quite malleable. For example, they are surprisingly easy to train to behave more nicely." End quote.

A more theoretical argument, known as "alignment by default", offers an explanation for how AIs could easily and robustly acquire human values.

This argument suggests that as an AI identifies patterns in human text, it doesn't just learn facts about values, but adopts human values as a natural abstraction.

A natural abstraction is a high-level concept (for example, trees, people, or fairness) that different learning algorithms tend to converge on because it efficiently summarizes a large amount of low-level data.

If human values are a natural abstraction, then any sufficiently advanced intelligence might naturally gravitate toward understanding and representing them in a robust, generalizable way as a byproduct of learning to model the world.
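As a toy illustration of the convergence intuition behind natural abstractions (this sketch is not from the post), here two deliberately different learning procedures are applied to the same low-level data and recover the same high-level grouping. The data, thresholding rule, and two-means loop are all illustrative assumptions:

```python
import random

random.seed(0)
# Low-level data: noisy samples from two well-separated clusters.
data = [random.gauss(0.0, 0.5) for _ in range(50)] + \
       [random.gauss(10.0, 0.5) for _ in range(50)]

# Learner A: split at the midpoint of the data's range.
mid = (min(data) + max(data)) / 2
labels_a = [x > mid for x in data]

# Learner B: simple two-means clustering, started from the extremes.
# (Assumes both groups stay non-empty, which holds for separated clusters.)
c0, c1 = min(data), max(data)
for _ in range(10):
    group0 = [x for x in data if abs(x - c0) <= abs(x - c1)]
    group1 = [x for x in data if abs(x - c0) > abs(x - c1)]
    c0 = sum(group0) / len(group0)
    c1 = sum(group1) / len(group1)
labels_b = [abs(x - c0) > abs(x - c1) for x in data]

# Both learners induce the same two-cluster "abstraction" of the raw points.
print(labels_a == labels_b)
```

The point is only that when structure in the data is salient enough, quite different algorithms summarize it the same way; the natural abstraction argument conjectures something analogous for human values in much richer data.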

The evidence LLMs offer about the tractability of AI alignment seems compelling and concrete.

However, the arguments of IABIED concern the difficulty of aligning ASI, not contemporary LLMs, and aligning an ASI could prove vastly harder.

Subheading: Arguments against engineering analogies to AI alignment.