Robert M
There's a details box here with the title "Claude Opus 4.5 on the measure zero argument."
The box contents are omitted from this narration.
The broader (and weaker) argument, that we shouldn't expect AI systems to act as coherent optimizers without considerable effort, might be trivially true.
Unfortunately, Anthropic, OpenAI, Google DeepMind, etc. are putting forth considerable effort to build systems that can reliably solve extremely difficult problems over long time horizons, i.e. to act as coherent optimizers.
Then there's the "Implications for AI safety" section.
Quote: "Our results are evidence that future AI failures may look more like industrial accidents than coherent pursuit of goals that were not trained for."
Think: the AI intends to run the nuclear power plant, but gets distracted reading French poetry, and there is a meltdown.
However, coherent pursuit of poorly chosen goals that we trained for remains a problem.
Specifically:

1. Variance dominates on complex tasks. When frontier models fail on difficult problems requiring extended reasoning, the failures tend to be predominantly incoherent rather than systematic.
2. Scale doesn't imply supercoherence. Making models larger improves overall accuracy but doesn't reliably reduce incoherence on hard problems.
3. This shifts alignment priorities. If capable AI is more likely to be a hot mess than a coherent optimizer of the wrong goal, this increases the relative importance of research targeting reward hacking and goal misspecification during training (the bias term, in the sense of the decomposition sketched after this list), rather than focusing primarily on aligning and constraining a perfect optimizer.
4. Unpredictability is still dangerous. Incoherent AI isn't safe AI.
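To unpack the bias/variance language in points 1 and 3: the framing borrows from the standard bias-variance decomposition of expected squared error. The mapping below, bias as a systematically misspecified goal and variance as incoherent, unsystematic failure, is my gloss on the terminology, not notation taken from the paper.

% Standard decomposition of expected squared error for a predictor \hat{f},
% where y = f(x) + noise. The alignment mapping in the braces is my gloss
% (an assumption), not the paper's notation.
\[
\underbrace{\mathbb{E}\big[(\hat{f}(x) - y)^2\big]}_{\text{expected error}}
= \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2:\ \text{systematic error (misspecified goal)}}
+ \underbrace{\mathbb{E}\big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\big]}_{\text{variance: unsystematic, incoherent error}}
+ \underbrace{\sigma^2}_{\text{irreducible noise}}
\]

On this reading, point 1 claims the variance term dominates frontier models' failures on hard tasks, and point 3 claims alignment effort should correspondingly shift toward the bias term, i.e. what the system is actually trained to do, rather than toward constraining a hypothetical zero-variance optimizer.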