LessWrong (Curated & Popular)
"Anthropic’s “Hot Mess” paper overstates its case (and the blog post is worse)" by RobertM

There are three significant exceptions.

The first is the synthetic optimizer setting, where they trained models to literally mimic the trajectory of a hand-coded optimizer descending a loss function.
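This setting can be illustrated with a minimal sketch (my own, not the paper's code — the loss function, learning rate, and step count are illustrative assumptions): a hand-coded gradient-descent optimizer descending a simple quadratic loss, producing the kind of trajectory such models would be trained to imitate.

```python
# Hedged sketch of a "synthetic optimizer" trajectory, not the paper's setup.

def loss_grad(x):
    """Gradient of the illustrative quadratic loss L(x) = (x - 3)^2."""
    return 2.0 * (x - 3.0)

def optimizer_trajectory(x0=0.0, lr=0.1, steps=20):
    """Run hand-coded gradient descent and record every iterate."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] - lr * loss_grad(xs[-1]))
    return xs

trajectory = optimizer_trajectory()
print(trajectory)  # iterates climb monotonically toward the minimum at x = 3
```

A model trained to mimic this sequence is being graded on reproducing a deterministic descent, which is why deviations from it can be cleanly split into bias and variance terms.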

They say, quote,

All models show consistently rising incoherence per step.

Interestingly, smaller models reach a lower plateau after a tipping point where they can no longer follow the correct trajectory and stagnate, reducing variance.

This pattern also appears in the individual bias and variance curves (Figure 26).

Importantly, larger models reduce bias more than variance.

These results suggest that they learn the correct objective faster than the ability to maintain long coherent action sequences.

End quote.

But bias stemming from a lack of ability is not the same as bias stemming from a lack of propensity.

The smaller models here are clearly not misaligned in the propensity sense, which is the conceptual link the paper tries to establish in the description of figure 1 to motivate its definition of incoherence.

Quote.

AI can fail because it is misaligned and produces consistent but undesired outcomes, or because it is incoherent and does not produce consistent outcomes at all.

These failures correspond to bias and variance respectively.

As we extrapolate risks from AI, it is important to understand whether failures from more capable models performing more complex tasks will be bias or variance dominated.

Bias-dominated failures will look like model misalignment, while variance-dominated failures will resemble industrial accidents.

End quote.
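The quoted bias/variance framing can be made concrete with a small sketch (my own illustration with made-up numbers, not from the paper): repeated runs of two hypothetical models on the same task, one consistently wrong (bias-dominated, the "misalignment" failure mode) and one right on average but noisy (variance-dominated, the "industrial accident" failure mode).

```python
import random
import statistics

random.seed(0)
TARGET = 1.0  # the desired outcome; an arbitrary illustrative value

# Repeated runs of two hypothetical models on the same task:
misaligned = [random.gauss(0.2, 0.05) for _ in range(10_000)]  # consistent but wrong
incoherent = [random.gauss(1.0, 0.80) for _ in range(10_000)]  # right on average, noisy

def decompose(outputs, target):
    """Split mean squared error into bias^2 + (population) variance."""
    mean = statistics.fmean(outputs)
    bias_sq = (mean - target) ** 2
    variance = statistics.fmean((y - mean) ** 2 for y in outputs)
    mse = statistics.fmean((y - target) ** 2 for y in outputs)
    return bias_sq, variance, mse

for name, outputs in [("misaligned", misaligned), ("incoherent", incoherent)]:
    bias_sq, variance, mse = decompose(outputs, TARGET)
    print(f"{name}: bias^2={bias_sq:.3f}  variance={variance:.3f}  mse={mse:.3f}")
```

Both models have comparable mean squared error, but the decomposition attributes it almost entirely to bias in the first case and variance in the second, which is the distinction the quoted passage is drawing.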

So I think this result provides approximately no evidence that can be used to extrapolate to superintelligent AIs where misalignment might pose actual risks.

The next two are Gemma 3 (1B, 4B, 12B, and 27B) on MMLU and GPQA, respectively.

There's an image here.