Stephen McAleese
An inner optimization process, such as training a model by gradient descent, then trains each AI architecture variant produced by the outer search process.
Instead, the author believes that human engineers will perform the work of the outer optimizer by manually designing learning algorithms and writing code.
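This two-level structure can be sketched as a toy bilevel search (a hypothetical illustration, not code from the post): the outer loop proposes architecture variants, here just the hidden width of a tiny network, and the inner loop trains each variant by gradient descent; the outer loop keeps whichever variant trains best.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (made up for this sketch): learn y = 2x + 1.
X = rng.uniform(-1, 1, size=(100, 1))
y = 2 * X[:, 0] + 1

def train_inner(width, steps=200, lr=0.1):
    """Inner optimizer: gradient descent on a one-hidden-layer tanh net."""
    W1 = rng.normal(0, 0.5, size=(1, width))
    b1 = np.zeros(width)
    W2 = rng.normal(0, 0.5, size=(width,))
    b2 = 0.0
    for _ in range(steps):
        h = np.tanh(X @ W1 + b1)              # hidden activations
        pred = h @ W2 + b2
        err = pred - y
        # Backpropagation written out by hand for the tiny net.
        gW2 = h.T @ err / len(X)
        gb2 = err.mean()
        dh = np.outer(err, W2) * (1 - h**2)
        gW1 = X.T @ dh / len(X)
        gb1 = dh.mean(axis=0)
        W2 -= lr * gW2; b2 -= lr * gb2
        W1 -= lr * gW1; b1 -= lr * gb1
    h = np.tanh(X @ W1 + b1)
    return np.mean((h @ W2 + b2 - y) ** 2)    # final training loss

# Outer optimizer: search over architecture variants (hidden widths)
# and keep the variant whose inner training reaches the lowest loss.
candidates = [1, 4, 16]
losses = {w: train_inner(w) for w in candidates}
best = min(losses, key=losses.get)
```

The point of the author's argument is that, in practice, the role of this outer loop is played by human researchers choosing architectures, not by an automated search.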
The author gives three arguments for why the outer optimizer is more likely to involve human engineering than an automated search process like evolution:
1. Most learning algorithms and AI architectures developed so far (for example, SGD and transformers) were invented by human engineers rather than found by an automated optimization process.
2. Running learning algorithms and training ML models is often extremely expensive, so searching over possible learning algorithms or AI architectures, as evolution does, would be prohibitively expensive.
3. Learning algorithms are often simple (SGD, for example), making it tractable for human engineers to design them.
However, one reason why I personally find the evolution analogy relevant is that the RLHF training process widely used today appears to be a bilevel optimization process similar to evolution:
1. Like evolution optimizing the genome, the first step of RLHF is to learn a reward function from a dataset of binary preference labels.
2. This learned reward function is then used to train the final model. This step is analogous to an organism's lifetime learning, where behavior is adjusted to maximize a reward function fixed in the outer optimization stage.
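The two stages above can be sketched in miniature (a toy illustration with made-up numbers and a scalar "action", not a description of any production RLHF system): stage one fits a Bradley-Terry reward model to binary preference labels, and stage two then does gradient ascent on the frozen learned reward.

```python
import numpy as np

rng = np.random.default_rng(1)

def features(a):
    """Hand-picked features of a scalar 'action' (an assumption of this toy)."""
    return np.stack([a, a**2], axis=-1)

# Hidden 'human' reward, used only to generate preference labels.
def true_reward(a):
    return -(a - 0.7) ** 2

# --- Stage 1 (outer level): learn a reward model from preference labels ---
pairs = rng.uniform(-1, 1, size=(500, 2))
labels = (true_reward(pairs[:, 0]) > true_reward(pairs[:, 1])).astype(float)

d = features(pairs[:, 0]) - features(pairs[:, 1])
w = np.zeros(2)
for _ in range(1000):
    p = 1 / (1 + np.exp(-(d @ w)))            # Bradley-Terry preference prob
    w += 0.1 * d.T @ (labels - p) / len(pairs)  # logistic-regression step

def learned_reward(a):
    return features(a) @ w

# --- Stage 2 (inner level): adjust behavior against the frozen reward ---
a = 0.0
for _ in range(1000):
    grad = w[0] + 2 * w[1] * a                # d/da of w . [a, a^2]
    a += 0.02 * grad                          # gradient ascent on learned reward
```

The reward model is fixed once stage one ends, just as the genome is fixed over an organism's lifetime; all of stage two's adaptation happens against that frozen objective.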
Arguments against counting arguments
One argument for AI doom that I described above is a counting argument.
Because the space of misaligned goals is astronomically larger than the tiny space of aligned goals, we should expect AI alignment to be highly improbable by default.
In the post "Counting arguments provide no evidence for AI doom", the authors challenge this argument using an analogy to machine learning.
A similar counting argument can be constructed that purports to show that neural network generalization is highly unlikely.
Yet in practice, trained neural networks routinely generalize.
Before the deep learning revolution, many theorists believed that models with millions of parameters would simply memorize data rather than learn patterns.
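The failure mode of this kind of counting argument can be shown on a toy problem (a hypothetical example of my own, not the one from the post): among all functions consistent with the training data, most fail to generalize, yet a simple learner still picks the generalizing one.

```python
import itertools
import numpy as np

# Domain: all 3-bit inputs; true rule: the label is the first bit.
X = np.array(list(itertools.product([0, 1], repeat=3)), dtype=float)
y = X[:, 0]

train, test = np.arange(6), np.arange(6, 8)   # hold out two inputs

# Counting argument: 2**2 = 4 labelings of the held-out inputs are each
# consistent with the training set, and only 1 of them matches the true
# rule, so "most" consistent functions fail to generalize.
n_consistent = 2 ** len(test)

# Yet a simple learner (least squares with a bias term) recovers the
# generalizing rule rather than one of the memorizing alternatives.
A = np.column_stack([X[train], np.ones(len(train))])
coef, *_ = np.linalg.lstsq(A, y[train], rcond=None)
preds = np.column_stack([X[test], np.ones(len(test))]) @ coef > 0.5
```

Counting consistent functions says nothing about which one the learner's inductive bias actually selects, which is the authors' point.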
The authors cite a classic example from regression.