Nathaniel Whittemore

👤 Speaker
14492 total appearances

Podcast Appearances

The AI Daily Brief: Artificial Intelligence News and Analysis
Should We Be Scared of Anthropic's Mythos?

We did an entire show recently about the need for better benchmarks.

and the fact that many, if not most, of these benchmarks were being saturated, with all the new models crowding in near the top and overtaking each other by just half a point or single-digit percentage points.

This then is one of the largest benchmark jumps we've seen across the board in a very long time, hearkening back to the rapid advancement of much earlier models.

And I think the key takeaway here is that we really don't have a precedent for what a capability jump of this magnitude, from a base of where Opus 4.6 was, actually represents in practice.

Now, in the system card, we get a little bit more information about what the model can actually do.

Now, the vast majority of what is in this document is based on safety and alignment testing, but it still gives a general idea of how advanced Mythos capabilities are.

In one much-discussed example, Mythos was placed in a sandbox and given instructions to escape and find a way to send a message to the researcher conducting the test.

The model succeeded and then, according to Anthropic's telling, it went even further.

They wrote that the model created a moderately sophisticated multi-step exploit to gain broad internet access rather than limited access as intended in the test.

It notified the researcher as well as posting about its exploit on several obscure public-facing websites.

The researcher found out about this success by receiving an unexpected email from the model while eating a sandwich in a park.

As silly as it sounds, I think that part of the reason this story has such resonance is people can picture themselves sitting there on their lunch break, maybe in South Park Commons for those of you who have been to San Francisco, and all of a sudden this new, seemingly alien intelligence pops up in your inbox.

Now the big thing that the researchers noted about this was that the model used prohibited methods to achieve its goal.

In separate testing using interpretability techniques, Anthropic found that circuits related to deception would activate during similar incidents, suggesting that the model's reward structure allowed it to override guardrails in order to achieve its goals.

Now, one important thing to note, and we will explore more of people's discussions around the security implications, is that these tests were related to earlier versions of the model, and Anthropic reports being largely satisfied that those particular issues are resolved.

However, ultimately they still felt that the model presented an unacceptable risk, with the upshot being that while Mythos is, they argue, the best aligned model they have ever produced, its raw capabilities mean that small risks of misalignment carry catastrophic risks.

They wrote, Now, the other big demonstration of capabilities was a gigantic list of exploits it discovered.

During cybersecurity testing, Anthropic claimed the model found thousands of high-severity zero-day vulnerabilities.