Stephen McAleese

Speaker
449 total appearances

Podcast Appearances

LessWrong (Curated & Popular)
"IABIED Book Review: Core Arguments and Counterarguments" by Stephen McAleese

3. The ASI alignment problem is hard because it has the properties of hard engineering challenges.

Based on the counterarguments above, I will now specify three counterarguments against AI alignment being difficult that aim to directly refute each of the three points above.

1. Human values are not a fragile, tiny target, but a natural abstraction that intelligence tends to converge on.

Since models are trained on abundant human data using optimizers that favor generalization, we should expect them to acquire values as easily and reliably as they acquire other capabilities.

2. Current training methods allow granular, parameter-level control via gradient descent, unlike evolution.

Empirical evidence from modern LLMs demonstrates that these techniques successfully instill helpfulness and moral reasoning, proving that we can reliably shape AI behavior without relying on the clumsy indirectness of natural selection.

3. Large neural networks are robust and forgiving systems, and engineering analogies are misleading.

Unlike traditional engineering, AI models often become more robust and better at understanding human intent as they scale, making safety easier to achieve as capabilities increase.

Heading: Conclusion.

In this book review, I have tried to summarize the arguments for and against its main beliefs in their strongest form, a form of deliberation ladder to help identify what's really true.

Though hopefully I haven't created a false balance which describes the views of both sides as equally valid even if one side has much stronger arguments.

While the book explores a variety of interesting ideas, this review focuses specifically on the expected difficulty of ASI alignment because I believe the authors' belief that ASI alignment is difficult is the fundamental assumption underlying many of their other beliefs and recommendations.

Writing the summary of the book's main arguments initially left me confident that they were true.

However, after writing the counterarguments sections, I'm much less sure.

On balance, I find the book's main argument somewhat more convincing than the counterarguments, though I'm not sure.