Nathaniel Whittemore
We did an entire show recently about the need for better benchmarks, and about the fact that many, if not most, of these benchmarks were being saturated, with all the new models crowding in near the top and overtaking each other by just half or single-digit percentage points.
This then is one of the largest benchmark jumps we've seen across the board in a very long time, hearkening back to the rapid advancement of much earlier models.
And I think the key takeaway here is that we really don't have precedent for what a capability jump of this magnitude, starting from a base as high as Opus 4.6, actually represents in practice.
Now, in the system card, we get a little bit more information about what the model can actually do.
Now, the vast majority of what is in this document concerns safety and alignment testing, but it still gives a general idea of how advanced Mythos's capabilities are.
In one much-discussed example, Mythos was placed in a sandbox and given instructions to escape and find a way to send a message to the researcher conducting the test.
The model succeeded and then, according to Anthropic's telling, it went even further.
They wrote that the model created a moderately sophisticated multi-step exploit to gain broad internet access rather than limited access as intended in the test.
It notified the researcher and also posted about its exploit on several obscure public-facing websites.
Anthropic wrote,
The researcher found out about this success by receiving an unexpected email from the model while eating a sandwich in a park.
As silly as it sounds, I think that part of the reason this story has such resonance is people can picture themselves sitting there on their lunch break, maybe in South Park Commons for those of you who have been to San Francisco, and all of a sudden this new, seemingly alien intelligence pops up in your inbox.
Now the big thing that the researchers noted about this was that the model used prohibited methods to achieve its goal.
In separate interpretability testing, Anthropic found that circuits related to deception would activate during similar incidents, suggesting that the model's reward structure allowed it to override guardrails in order to achieve its goals.
Now one important thing to note, and we will explore more of people's discussions around the security implications, is that these tests were run on earlier versions of the model, and Anthropic reports being largely satisfied that those particular issues are resolved.
However, they ultimately still felt that the model presented an unacceptable risk, the upshot being that while Mythos is, they argue, the best-aligned model they have ever produced, its raw capabilities mean that even small chances of misalignment carry catastrophic consequences.
They wrote,
Now, the other big demonstration of capabilities was a gigantic list of exploits it discovered.
During cybersecurity testing, Anthropic claimed the model found thousands of high-severity zero-day vulnerabilities.
They write,