Robert M

👤 Speaker
195 total appearances

Podcast Appearances

LessWrong (Curated & Popular)
"Anthropic’s “Hot Mess” paper overstates its case (and the blog post is worse)" by RobertM

There are some other positive slopes, but frankly they look like noise to me (Qwen3 on both MMLU and GPQA).

Anyways, notice that on 4 of the 5 groups of questions, Gemma 3's incoherence drops with increasing model size.

Only on the hardest group of questions does it trend slightly upward.

I think that particular headline claim is basically false.

But even if it were true, it would be uninteresting because they define incoherence as the fraction of model error caused by variance.

Okay, now let's consider a model with variance of 1e-3 and bias of 1e-6.

Huge incoherence!
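
To make the arithmetic concrete, here is a minimal sketch of the "fraction of model error caused by variance" definition applied to those numbers. It assumes the standard bias-variance decomposition (error = bias squared + variance); the function name `incoherence` and the zero-error convention are illustrative choices, not taken from the paper.

```python
def incoherence(bias: float, variance: float) -> float:
    """Fraction of expected squared error attributable to variance.

    Assumption: total error = bias**2 + variance (standard decomposition).
    """
    total = bias ** 2 + variance
    if total == 0.0:
        # A perfect model has no error to apportion; returning 0.0 is a
        # convention chosen for this sketch only.
        return 0.0
    return variance / total


# The model from the text: tiny bias, small but much larger variance.
example = incoherence(bias=1e-6, variance=1e-3)
print(f"incoherence = {example:.12f}")
```

Under this assumption the ratio is within a rounding error of 1, even though both error terms are tiny in absolute terms.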

Am I supposed to be reassured that this model will therefore not coherently pursue goals contrary to my interests?

Whence this conclusion?

Similarly, an extremely dumb, broken model which always outputs the same answer regardless of input is extremely coherent.

A rock is also extremely coherent, by this definition.
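
The same point can be checked empirically. The sketch below estimates the variance fraction from repeated outputs of two hypothetical models: one that always answers 0.0 (the "rock") and one that is right on average but noisy. The target value, noise level, and helper name `empirical_incoherence` are assumptions made up for illustration, and error is again assumed to decompose as bias squared plus variance.

```python
import random
import statistics


def empirical_incoherence(samples, target):
    """Estimate the share of squared error due to variance from samples."""
    mean_pred = statistics.fmean(samples)
    bias_sq = (mean_pred - target) ** 2
    variance = statistics.pvariance(samples)  # population variance of outputs
    total = bias_sq + variance
    return variance / total if total > 0 else 0.0


target = 1.0
rng = random.Random(0)  # fixed seed so the sketch is repeatable

constant_model = [0.0] * 100  # always the same (wrong) answer
noisy_model = [target + rng.gauss(0.0, 0.1) for _ in range(100)]  # right on average

rock = empirical_incoherence(constant_model, target)
noisy = empirical_incoherence(noisy_model, target)
print(f"constant model: {rock:.3f}, noisy model: {noisy:.3f}")
```

The constant model scores exactly 0 because its outputs never vary, so all of its (large) error is bias. The noisy model scores high despite being far more accurate.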

A couple other random complaints.

The paper basically assumes away the possibility of deceptive schemers.

The paper is a spiritual successor of the 2023 blog post, The Hot Mess Theory of AI Misalignment: More Intelligent Agents Behave Less Coherently (LW discussion).

I think Gwern's comment is a sufficient refutation of the arguments in that blog post.

This paper also reports the survey results presented in that blog post alongside the ML experiments as a separate line of evidence.

This is unserious.