Menu
Sign In Search Podcasts Charts People & Topics Add Podcast API Blog Pricing

Stephen McAleese

๐Ÿ‘ค Speaker
449 total appearances

Appearances Over Time

Podcast Appearances

LessWrong (Curated & Popular)
"IABIED Book Review: Core Arguments and Counterarguments" by Stephen McAleese

One of the book's arguments for why ASI alignment would be difficult is that ASI alignment is a high-stakes engineering challenge similar to other difficult historical engineering problems such as successfully launching a space probe, building a safe nuclear reactor, or building a secure computer system.

LessWrong (Curated & Popular)
"IABIED Book Review: Core Arguments and Counterarguments" by Stephen McAleese

In these fields, a single flaw often leads to total catastrophic failure.

LessWrong (Curated & Popular)
"IABIED Book Review: Core Arguments and Counterarguments" by Stephen McAleese

However, one post criticizes the uses of these analogies and argues that modern AI and neural networks are a new and unique field that has no historical precedent similar to how quantum mechanics is difficult to explain using intuitions from everyday physics.

LessWrong (Curated & Popular)
"IABIED Book Review: Core Arguments and Counterarguments" by Stephen McAleese

The author illustrates several ways that ML systems defy intuitions derived from engineering fields like rocketry or computer science.

LessWrong (Curated & Popular)
"IABIED Book Review: Core Arguments and Counterarguments" by Stephen McAleese

Model robustness

LessWrong (Curated & Popular)
"IABIED Book Review: Core Arguments and Counterarguments" by Stephen McAleese

In a rocket, swapping a fuel tank for a stabilization fin leads to instant failure.

LessWrong (Curated & Popular)
"IABIED Book Review: Core Arguments and Counterarguments" by Stephen McAleese

In a transformer model, however, one can often swap the positions of nearby layers with little to no performance degradation.

LessWrong (Curated & Popular)
"IABIED Book Review: Core Arguments and Counterarguments" by Stephen McAleese

Model editability.

LessWrong (Curated & Popular)
"IABIED Book Review: Core Arguments and Counterarguments" by Stephen McAleese

We can manipulate AI models using task vectors that add or subtract weights to give or remove specific capabilities.

LessWrong (Curated & Popular)
"IABIED Book Review: Core Arguments and Counterarguments" by Stephen McAleese

Attempting to add or subtract a component from a cryptographic protocol or a physical engine without breaking the entire system is often impossible.

LessWrong (Curated & Popular)
"IABIED Book Review: Core Arguments and Counterarguments" by Stephen McAleese

The benefits of scale in ML models In security and rocketry, increasing complexity typically introduces more points of failure.

LessWrong (Curated & Popular)
"IABIED Book Review: Core Arguments and Counterarguments" by Stephen McAleese

In contrast, ML models often get more robust as they get bigger.

LessWrong (Curated & Popular)
"IABIED Book Review: Core Arguments and Counterarguments" by Stephen McAleese

In summary, the post argues that analogies to hard engineering fields may cause us to overestimate the difficulty of the AI alignment problem even when the empirical reality suggests solutions might be surprisingly tractable.

LessWrong (Curated & Popular)
"IABIED Book Review: Core Arguments and Counterarguments" by Stephen McAleese

Subheading.

LessWrong (Curated & Popular)
"IABIED Book Review: Core Arguments and Counterarguments" by Stephen McAleese

Three counterarguments to the book's three core arguments.

LessWrong (Curated & Popular)
"IABIED Book Review: Core Arguments and Counterarguments" by Stephen McAleese

In the previous section, I identified three reasons why the authors believe that AI alignment is extremely difficult.

LessWrong (Curated & Popular)
"IABIED Book Review: Core Arguments and Counterarguments" by Stephen McAleese

1.

LessWrong (Curated & Popular)
"IABIED Book Review: Core Arguments and Counterarguments" by Stephen McAleese

Human values are very specific, fragile, and a tiny space of all possible goals.

LessWrong (Curated & Popular)
"IABIED Book Review: Core Arguments and Counterarguments" by Stephen McAleese

2.

LessWrong (Curated & Popular)
"IABIED Book Review: Core Arguments and Counterarguments" by Stephen McAleese

Current methods used to train goals into AIs are imprecise and unreliable.