Stephen McAleese
One of the book's arguments for why ASI alignment would be difficult is that it is a high-stakes engineering challenge similar to other difficult historical engineering problems, such as successfully launching a space probe, building a safe nuclear reactor, or building a secure computer system.
In these fields, a single flaw often leads to total catastrophic failure.
However, one post criticizes the use of these analogies, arguing that modern AI and neural networks form a new field with no historical precedent, much as quantum mechanics is difficult to explain using intuitions from everyday physics.
The author illustrates several ways that ML systems defy intuitions derived from engineering fields like rocketry or computer science.
Model robustness
In a rocket, swapping a fuel tank for a stabilization fin leads to instant failure.
In a transformer model, however, one can often swap the positions of nearby layers with little to no performance degradation.
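The layer-swap claim can be illustrated with a toy residual network. This is a hypothetical stand-in for a transformer stack, not a real model: the point is that when each block applies only a small residual update, adjacent blocks approximately commute, so swapping them barely changes the output.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16

# Toy "residual blocks": x -> x + W @ x with small random W,
# a loose stand-in for transformer layers (illustrative only).
blocks = [rng.normal(scale=0.02, size=(d, d)) for _ in range(8)]

def forward(x, blocks):
    for W in blocks:
        x = x + W @ x
    return x

x = rng.normal(size=d)
y = forward(x, blocks)

# Swap two adjacent blocks and rerun the forward pass.
swapped = blocks.copy()
swapped[3], swapped[4] = swapped[4], swapped[3]
y_swapped = forward(x, swapped)

# Because each update is small, the outputs are nearly identical.
rel_err = np.linalg.norm(y - y_swapped) / np.linalg.norm(y)
print(rel_err)
```

The relative error stays small because (I + A)(I + B) and (I + B)(I + A) differ only by the commutator AB − BA, which is second-order in the (small) block weights. A rocket has no analogous tolerance for reordered parts.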
Model editability
We can manipulate AI models using task vectors: weight-space differences that can be added or subtracted to grant or remove specific capabilities. By contrast, adding or removing a component of a cryptographic protocol or a physical engine without breaking the entire system is usually impossible.
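The task-vector idea can be sketched directly in weight space. The arrays below are hypothetical flattened weight vectors, not real checkpoints; the arithmetic itself is the technique.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical flattened weights of a base model and a model
# fine-tuned on some task (stand-ins for real checkpoints).
base_weights = rng.normal(size=1000)
finetuned_weights = base_weights + rng.normal(scale=0.05, size=1000)

# A task vector is the element-wise difference between checkpoints.
task_vector = finetuned_weights - base_weights

# Adding the (scaled) task vector grants the capability;
# subtracting it ("negation") removes or suppresses it.
alpha = 1.0
with_capability = base_weights + alpha * task_vector
without_capability = finetuned_weights - alpha * task_vector
```

That such crude vector arithmetic on millions of parameters often works at all is the surprising part; there is no analogous operation for splicing a feature into a compiled binary or an engine.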
The benefits of scale in ML models
In security and rocketry, increasing complexity typically introduces more points of failure.
In contrast, ML models often get more robust as they get bigger.
In summary, the post argues that analogies to hard engineering fields may cause us to overestimate the difficulty of the AI alignment problem, even though the empirical evidence suggests it may be more tractable than those analogies imply.
Three counterarguments to the book's three core arguments
In the previous section, I identified three reasons why the authors believe that AI alignment is extremely difficult.
1. Human values are specific and fragile, occupying only a tiny region of the space of all possible goals.
2. Current methods used to train goals into AIs are imprecise and unreliable.