Robert M
There are three significant exceptions.
The first is the synthetic optimizer setting, where they trained models to literally mimic the trajectory of a hand-coded optimizer descending a loss function.
They say, quote,
All models show consistently rising incoherence per step.
Interestingly, smaller models reach a lower plateau after a tipping point where they can no longer follow the correct trajectory and stagnate, reducing variance.
This pattern also appears in individual bias and variance curves (Figure 26).
Importantly, larger models reduce bias more than variance.
These results suggest that they learn the correct objective faster than the ability to maintain long coherent action sequences.
End quote.
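To make the synthetic optimizer setting concrete, here is a minimal sketch of the kind of setup being described, assuming a quadratic loss and plain gradient descent; the function name and all details are my illustration, not the paper's actual code.

```python
import numpy as np

def optimizer_trajectory(x0, lr=0.1, steps=50):
    """Hand-coded optimizer descending the quadratic loss f(x) = ||x||^2.
    The resulting iterates form the trajectory the model is trained to mimic."""
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(steps):
        grad = 2.0 * xs[-1]            # gradient of ||x||^2
        xs.append(xs[-1] - lr * grad)  # one gradient-descent step
    return np.stack(xs)                # shape: (steps + 1, dim)

# Supervised pairs: from the current iterate, predict the optimizer's next iterate.
traj = optimizer_trajectory(np.array([3.0, -2.0]))
inputs, targets = traj[:-1], traj[1:]
```

On this reading, per-step incoherence is how a model's sampled next-step predictions scatter around the optimizer's true next iterate: the systematic offset is bias, the scatter is variance.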
But bias stemming from a lack of ability is not the same as bias stemming from a lack of propensity.
The smaller models here are clearly not misaligned in the propensity sense, and that propensity reading is precisely the conceptual link the paper tries to draw in its description of Figure 1 to motivate its definition of incoherence.
Quote.
AI can fail because it is misaligned and produces consistent but undesired outcomes, or because it is incoherent and does not produce consistent outcomes at all.
These failures correspond to bias and variance respectively.
As we extrapolate risks from AI, it is important to understand whether failures from more capable models performing more complex tasks will be bias- or variance-dominated.
Bias-dominated failures will look like model misalignment, while variance-dominated failures will resemble industrial accidents.
End quote.
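For reference, the bias/variance language here is the standard decomposition of mean squared error over repeated rollouts; the sketch below is my illustration of that textbook identity, not the paper's actual metric.

```python
import numpy as np

def decompose_error(outcomes, target):
    """Split mean squared error over repeated rollouts into bias^2 + variance.
    Bias-dominated: consistently wrong (reads as misalignment).
    Variance-dominated: inconsistently wrong (reads as an industrial accident)."""
    outcomes = np.asarray(outcomes, dtype=float)
    bias_sq = (outcomes.mean() - target) ** 2
    variance = outcomes.var()
    mse = np.mean((outcomes - target) ** 2)
    assert np.isclose(mse, bias_sq + variance)  # MSE = bias^2 + variance
    return bias_sq, variance

# A coherent but misaligned model: tight spread around the wrong answer.
print(decompose_error([4.9, 5.0, 5.1], target=0.0))   # bias-dominated
# An incoherent model: wide spread centered near the right answer.
print(decompose_error([-3.0, 0.5, 2.8], target=0.0))  # variance-dominated
```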
So I think this result provides approximately no evidence for extrapolating to superintelligent AIs, where misalignment might pose actual risks.
The next two are Gemma 3 (1B, 4B, 12B, and 27B) on MMLU and GPQA, respectively.
There's an image here.