Robert M

👤 Speaker
195 total appearances

Podcast Appearances

LessWrong (Curated & Popular)
"Anthropic’s “Hot Mess” paper overstates its case (and the blog post is worse)" by RobertM

There's a details box here with the title "Claude Opus 4.5 on the 'measure zero' argument."

The box contents are omitted from this narration.

The broader, and weaker, argument that we shouldn't expect AI to act as coherent optimizers without considerable effort might be trivially true.

Unfortunately, Anthropic (and OpenAI, and Google DeepMind, etc.) are putting forth considerable effort to build systems that can reliably solve extremely difficult problems over long time horizons, i.e. coherent optimizers.

Then there's the "Implications for AI safety" section.

Quote

Our results are evidence that future AI failures may look more like industrial accidents than coherent pursuit of goals that were not trained for.

Think: the AI intends to run the nuclear power plant, but gets distracted reading French poetry, and there is a meltdown.

However, coherent pursuit of poorly chosen goals that we trained for remains a problem.

Specifically:

1. Variance dominates on complex tasks.

When frontier models fail on difficult problems requiring extended reasoning, there is a tendency for failures to be predominantly incoherent rather than systematic.

2. Scale doesn't imply supercoherence.

Making models larger improves overall accuracy but doesn't reliably reduce incoherence on hard problems.

3. This shifts alignment priorities.

If capable AI is more likely to be a hot mess than a coherent optimizer of the wrong goal, this increases the relative importance of research targeting reward hacking and goal misspecification during training (the bias term), rather than focusing primarily on aligning and constraining a perfect optimizer.

4. Unpredictability is still dangerous.

Incoherent AI isn't safe AI.