Stephen McAleese
Quote
The popular 2006 textbook Pattern Recognition and Machine Learning uses a simple example from polynomial regression.
There are infinitely many polynomials of order equal to or greater than the number of data points which interpolate the training data perfectly, and almost all such polynomials are terrible at extrapolating to unseen points.
End quote.
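The textbook's point can be reproduced in a few lines. The following is my own sketch (not an example from Bishop's book): fit a degree-9 polynomial through 10 samples of a sine wave, so the fit interpolates the training data essentially exactly, then evaluate it outside the training interval.

```python
import numpy as np

# Sketch of interpolation vs. extrapolation (illustrative, not from the
# textbook): a degree-9 polynomial through 10 points matches the training
# data to near machine precision but diverges outside [0, 1].
x_train = np.linspace(0.0, 1.0, 10)
y_train = np.sin(2 * np.pi * x_train)

# 10 points and degree 9 means the fit is an exact interpolant.
coeffs = np.polyfit(x_train, y_train, deg=9)
poly = np.poly1d(coeffs)

train_error = np.max(np.abs(poly(x_train) - y_train))    # tiny: interpolation
extrap_error = abs(poly(2.0) - np.sin(2 * np.pi * 2.0))  # large: bad extrapolation
```

The training error is essentially zero while the error at x = 2.0 is enormous, which is exactly the failure mode the quote describes.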
However, in practice large neural networks trained with SGD reliably generalize.
Counting the number of possible models is irrelevant because it ignores the inductive bias of the optimizer and the loss landscape, which together favor simpler, generalizing models.
While there are theoretically a vast number of bad, overfitting models, they usually exist in sharp and isolated regions of the landscape.
Good, generalizing models typically reside in flat regions of the loss landscape, where small changes to the parameters don't significantly increase error.
An optimizer like SGD doesn't pick a model at random.
Instead, it tends to be pulled into a vast, flat basin of attraction while avoiding the majority of non-generalizing solutions.
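The flat-versus-sharp distinction can be made concrete with a toy landscape. This is my own construction, not from the original text: two minima with identical training loss, one in a wide basin and one in a narrow one, probed with small random parameter perturbations.

```python
import numpy as np

# Toy double-well loss (illustrative construction): a flat basin at w = -2
# (low curvature) and a sharp basin at w = +2 (high curvature). Both minima
# achieve zero loss.
def loss(w):
    return np.minimum(0.1 * (w + 2) ** 2, 10.0 * (w - 2) ** 2)

rng = np.random.default_rng(0)
eps = rng.normal(0.0, 0.1, size=1000)  # small random parameter perturbations

flat_rise = np.mean(loss(-2.0 + eps))  # average loss increase at the flat minimum
sharp_rise = np.mean(loss(2.0 + eps))  # average loss increase at the sharp minimum
```

The same perturbation that barely moves the loss in the flat basin raises it by roughly two orders of magnitude more in the sharp one, which is why flat minima are the ones robust to the noise in SGD updates.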
Additionally, larger networks generalize better because of the blessing of dimensionality.
High dimensionality increases the relative volume of flat, generalizing minima, biasing optimizers toward them.
This phenomenon contradicts the counting argument, which predicts that larger models, having more possible bad configurations, would be less likely to generalize.
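The volume intuition behind the blessing of dimensionality can be sketched with a back-of-the-envelope calculation. This is my own illustration, under a crude assumption: model the flat and sharp basins as axis-aligned boxes whose per-dimension widths are 1.0 and 0.1 respectively.

```python
# Back-of-the-envelope sketch (my assumption, not a result from the text):
# if a flat basin is 10x wider than a sharp basin along every parameter
# axis, the ratio of their volumes grows exponentially with the number of
# parameters d.
FLAT_WIDTH = 1.0
SHARP_WIDTH = 0.1

def volume_ratio(d):
    # Ratio of box volumes: (flat / sharp) ** d
    return (FLAT_WIDTH / SHARP_WIDTH) ** d

# In 1 dimension the flat basin is only 10x larger by volume; in 100
# dimensions it is 10^100 times larger, so anything resembling
# volume-weighted sampling lands in the flat basin almost surely.
```

Under this toy model, adding parameters makes the flat, generalizing basins overwhelmingly dominant by volume, the opposite of what naive model-counting suggests.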
This argument is based on an ML analogy which I'm not sure is highly relevant to AI alignment.
Still, I think it's interesting because it shows that intuitive theoretical arguments that seem correct can still be completely wrong.
I think the lesson is that real-world evidence often beats theoretical models, especially for new and counterintuitive phenomena like neural network training.
Subheading: Arguments based on the aligned behavior of modern LLMs

One of the most intuitive arguments against AI alignment being difficult is the abundant evidence of helpful, polite, and aligned behavior from large language models (LLMs) such as GPT-5.
For example, the authors of the essay "AI is easy to control" use the moral reasoning capabilities of GPT-4 as evidence that human values are easy to learn and deeply embedded in modern AIs.
The moral judgments of current LLMs already align with common sense to a high degree, and LLMs usually show an appropriate level of uncertainty when presented with morally ambiguous scenarios.
This strongly suggests that, as an AI is being trained, it will achieve a fairly strong understanding of human values well before it acquires dangerous capabilities like self-awareness, the ability to autonomously replicate itself, or the ability to develop new technologies.