Nathaniel Whittemore

Speaker
14492 total appearances

Podcast Appearances

The AI Daily Brief: Artificial Intelligence News and Analysis
Should We Be Scared of Anthropic's Mythos?

Now others are engaging deeply with safety considerations around the model.

Harlan Stewart, who works at the Machine Intelligence Research Institute in Berkeley, writes: Anthropic is trying to prevent its powerful new AI from being used in dangerous ways, but the most dangerous use by a wide margin is the one Anthropic itself has planned.

The planned use, and why they made it to begin with, is to accelerate the creation of superhumanly powerful AI agents by automating the R&D process.

This plan is not a secret.

They believe they are on track to create a "country of geniuses in a data center" that would be powerful enough to "militarily dominate the world if it chose to do so."

They believe they are on track to accomplish this in the next 12 months.

Let's hope they're wrong about those predictions, but let's do more than hope.

We urgently need governments to intervene and stop these companies.

As I mentioned before, a lot of the chatter is around the sandbox breakout and the emailing of the researcher while they were eating in the park.

One less theoretical safety concern that came up was a note fairly deep in the system card, where Anthropic admitted to having accidentally trained against the chain of thought for Opus 4.6, Sonnet 4.6, and Mythos during 8% of reinforcement learning.

So let me be clear: at a minimum, any and all reassuring evidence from their chains of thought is worthless.

They are hopelessly corrupted.

Dan McAdory asked, "Zvi, can you explain why this is bad?"

Matt Hurd summed it up: if the chain of thought is trained against, selective pressure will teach the model to keep unwanted behavior out of its chain of thought. That makes the chain of thought less faithful to the model's internal process, which in turn makes chain-of-thought monitoring less effective at detecting unwanted behavior.

Zvi actually wrote about this extensively back in March 2025 in a blog post called "The Most Forbidden Technique."

He writes: The most forbidden technique is training an AI using interpretability techniques.

An AI produces a final output X via some method M. You can analyze M using technique T to learn what the AI is up to.