LessWrong (Curated & Popular)
"Anthropic’s “Hot Mess” paper overstates its case (and the blog post is worse)" by RobertM

There are three significant exceptions.

The first is the synthetic optimizer setting, where they trained models to literally mimic the trajectory of a hand-coded optimizer descending a loss function.
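This setting can be illustrated with a minimal sketch (my own, not the paper's code — the loss function, learning rate, and step count are illustrative assumptions): a hand-coded gradient-descent optimizer descending a simple quadratic loss, producing the kind of trajectory such models would be trained to imitate.

```python
# Hedged sketch of a "synthetic optimizer" trajectory, not the paper's setup.

def loss_grad(x):
    """Gradient of the illustrative quadratic loss L(x) = (x - 3)^2."""
    return 2.0 * (x - 3.0)

def optimizer_trajectory(x0=0.0, lr=0.1, steps=20):
    """Run hand-coded gradient descent and record every iterate."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] - lr * loss_grad(xs[-1]))
    return xs

trajectory = optimizer_trajectory()
print(trajectory)  # iterates climb monotonically toward the minimum at x = 3
```

A model trained to mimic this sequence is being graded on reproducing a deterministic descent, which is why deviations from it can be cleanly split into bias and variance terms.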

They say, quote,

All models show consistently rising incoherence per step.

Interestingly, smaller models reach a lower plateau after a tipping point where they can no longer follow the correct trajectory and stagnate, reducing variance.

This pattern also appears in the individual bias and variance curves (Figure 26).

Importantly, larger models reduce bias more than variance.

These results suggest that they learn the correct objective faster than the ability to maintain long coherent action sequences.

End quote.

But bias stemming from a lack of ability is not the same as bias stemming from a lack of propensity.

The smaller models here are clearly not misaligned in the propensity sense, which is the conceptual link the paper tries to establish in the description of figure 1 to motivate its definition of incoherence.

Quote.

AI can fail because it is misaligned and produces consistent but undesired outcomes, or because it is incoherent and does not produce consistent outcomes at all.

These failures correspond to bias and variance respectively.

As we extrapolate risks from AI, it is important to understand whether failures from more capable models performing more complex tasks will be bias or variance dominated.

Bias-dominated failures will look like model misalignment, while variance-dominated failures will resemble industrial accidents.

End quote.
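The quoted bias/variance framing can be made concrete with a small sketch (my own illustration with made-up numbers, not from the paper): repeated runs of two hypothetical models on the same task, one consistently wrong (bias-dominated, the "misalignment" failure mode) and one right on average but noisy (variance-dominated, the "industrial accident" failure mode).

```python
import random
import statistics

random.seed(0)
TARGET = 1.0  # the desired outcome; an arbitrary illustrative value

# Repeated runs of two hypothetical models on the same task:
misaligned = [random.gauss(0.2, 0.05) for _ in range(10_000)]  # consistent but wrong
incoherent = [random.gauss(1.0, 0.80) for _ in range(10_000)]  # right on average, noisy

def decompose(outputs, target):
    """Split mean squared error into bias^2 + (population) variance."""
    mean = statistics.fmean(outputs)
    bias_sq = (mean - target) ** 2
    variance = statistics.fmean((y - mean) ** 2 for y in outputs)
    mse = statistics.fmean((y - target) ** 2 for y in outputs)
    return bias_sq, variance, mse

for name, outputs in [("misaligned", misaligned), ("incoherent", incoherent)]:
    bias_sq, variance, mse = decompose(outputs, TARGET)
    print(f"{name}: bias^2={bias_sq:.3f}  variance={variance:.3f}  mse={mse:.3f}")
```

Both models have comparable mean squared error, but the decomposition attributes it almost entirely to bias in the first case and variance in the second, which is the distinction the quoted passage is drawing.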

So I think this result provides approximately no evidence that can be used to extrapolate to superintelligent AIs where misalignment might pose actual risks.

The next two are Gemma 3 (1B, 4B, 12B, and 27B) on MMLU and GPQA, respectively.

There's an image here.