Robert M

LessWrong (Curated & Popular)

"Anthropic’s “Hot Mess” paper overstates its case (and the blog post is worse)" by RobertM

Description.

254.185

There are some other positive slopes, but frankly they look like noise to me (Qwen3 on both MMLU and GPQA).

275.532

Anyways, notice that on 4 of the 5 groups of questions, Gemma 3's incoherence drops with increasing model size.

285.987

Only on the hardest group of questions does it trend slightly upward.

291.676

I think that particular headline claim is basically false.

296.002

But even if it were true, it would be uninteresting because they define incoherence as the fraction of model error caused by variance.

300.15

Okay, now let's consider a model with variance of 1e-3 and bias of 1e-6.

308.005

Huge incoherence!

317.343

Am I supposed to be reassured that this model will therefore not coherently pursue goals contrary to my interests?

319.492

Whence this conclusion?

325.958

Similarly, an extremely dumb, broken model which always outputs the same answer regardless of input is extremely coherent.

328.08

A rock is also extremely coherent, by this definition.

335.928
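
The two examples above (the low-bias, higher-variance model and the model that always outputs the same answer) can be checked with a little arithmetic. This is only a sketch under stated assumptions: it takes "incoherence" to be variance divided by the sum of the bias and variance contributions to error, which is how the definition is paraphrased here, and it does not reproduce the paper's exact estimator. The function name `incoherence` and the 0.75 bias figure for the constant-answer model are hypothetical, chosen for illustration.

```python
def incoherence(bias_error: float, variance: float) -> float:
    """Fraction of total error attributed to variance.

    Assumption: total error = bias contribution + variance contribution.
    This mirrors the paraphrased definition, not the paper's exact estimator.
    """
    total = bias_error + variance
    if total == 0.0:
        raise ValueError("incoherence is undefined when total error is zero")
    return variance / total


# The model from the example: tiny bias, small but much larger variance.
noisy_but_accurate = incoherence(bias_error=1e-6, variance=1e-3)

# A broken model that always gives the same answer: zero variance.
# The 0.75 bias contribution is a made-up illustrative number.
constant_answer = incoherence(bias_error=0.75, variance=0.0)

print(f"low-bias model:  {noisy_but_accurate:.4f}")
print(f"constant model:  {constant_answer:.4f}")
```

Under these assumptions the first model scores roughly 0.999 even though its total error is tiny, while the constant-answer model scores exactly 0. A ratio of this kind carries no information about how large the error is in absolute terms, which is the point of the complaint.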

A couple other random complaints.

340.252

The paper basically assumes away the possibility of deceptive schemers.

343.455

The paper is a spiritual successor of the 2023 blog post, "The Hot Mess Theory of AI Misalignment: More Intelligent Agents Behave Less Coherently" (LW discussion).

355.418

I think Gwern's comment is a sufficient refutation of the arguments in that blog post.

360.184

This paper also reports the survey results presented in that blog post alongside the ML experiments as a separate line of evidence.

365.35

This is unserious.

373.019