Robert M
There's a details box here with the title "Claude Opus 4.5 on the measure zero argument."
The box contents are omitted from this narration.
The broader (and weaker) argument, that we shouldn't expect AI systems to act as coherent optimizers without considerable effort, might be trivially true.
Unfortunately, Anthropic, OpenAI, Google DeepMind, etc. are putting forth considerable effort to build systems that can reliably solve extremely difficult problems over long time horizons, i.e. to act as coherent optimizers.
Then there's the "Implications for AI safety" section.
Quote: "Our results are evidence that future AI failures may look more like industrial accidents than coherent pursuit of goals that were not trained for."
Think: the AI intends to run the nuclear power plant, but gets distracted reading French poetry, and there is a meltdown.
However, coherent pursuit of poorly chosen goals that we trained for remains a problem.
Specifically:

1. Variance dominates on complex tasks. When frontier models fail on difficult problems requiring extended reasoning, the failures tend to be predominantly incoherent rather than systematic.
2. Scale doesn't imply supercoherence. Making models larger improves overall accuracy but doesn't reliably reduce incoherence on hard problems.
3. This shifts alignment priorities. If capable AI is more likely to be a hot mess than a coherent optimizer of the wrong goal, this increases the relative importance of research targeting reward hacking and goal misspecification during training (the bias term, in the sense of the decomposition sketched after this list), rather than focusing primarily on aligning and constraining a perfect optimizer.
4. Unpredictability is still dangerous. Incoherent AI isn't safe AI.
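To unpack the bias/variance language in points 1 and 3: the framing borrows from the standard bias-variance decomposition of expected squared error. The mapping below, bias as a systematically misspecified goal and variance as incoherent, unsystematic failure, is my gloss on the terminology, not notation taken from the paper.

% Standard decomposition of expected squared error for a predictor \hat{f},
% where y = f(x) + noise. The alignment mapping in the braces is my gloss
% (an assumption), not the paper's notation.
\[
\underbrace{\mathbb{E}\big[(\hat{f}(x) - y)^2\big]}_{\text{expected error}}
= \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2:\ \text{systematic error (misspecified goal)}}
+ \underbrace{\mathbb{E}\big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\big]}_{\text{variance: unsystematic, incoherent error}}
+ \underbrace{\sigma^2}_{\text{irreducible noise}}
\]

On this reading, point 1 claims the variance term dominates frontier models' failures on hard tasks, and point 3 claims alignment effort should correspondingly shift toward the bias term, i.e. what the system is actually trained to do, rather than toward constraining a hypothetical zero-variance optimizer.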