Stephen McAleese
3. The ASI alignment problem is hard because it has the properties of hard engineering challenges.
Drawing on the counterarguments above, I will now state three claims against AI alignment being difficult, each aiming to directly refute one of the three points above.
1. Human values are not a fragile, tiny target, but a natural abstraction that intelligence tends to converge on. Since models are trained on abundant human data using optimizers that favor generalization, we should expect them to acquire values as easily and reliably as they acquire other capabilities.
2. Current training methods, unlike evolution, allow granular, parameter-level control via gradient descent. Empirical evidence from modern LLMs demonstrates that these techniques successfully instill helpfulness and moral reasoning, proving that we can reliably shape AI behavior without relying on the clumsy indirectness of natural selection.
3. Large neural networks are robust and forgiving systems, and engineering analogies are misleading. Unlike traditional engineering, AI models often become more robust and better at understanding human intent as they scale, making safety easier to achieve as capabilities increase.
Conclusion
In this book review, I have tried to summarize the arguments for and against the book's main beliefs in their strongest form, a form of deliberation ladder to help identify what's really true.
Hopefully, though, I haven't created a false balance that describes the views of both sides as equally valid even if one side has much stronger arguments.
While the book explores a variety of interesting ideas, this review focuses specifically on the expected difficulty of ASI alignment because I believe the authors' belief that ASI alignment is difficult is the fundamental assumption underlying many of their other beliefs and recommendations.
Writing the summary of the book's main arguments initially left me confident that they were true.
However, after writing the counterargument sections, I'm much less sure.
On balance, I find the book's main argument somewhat more convincing than the counterarguments, though I'm not sure.