Rob Wiblin
It was in a line of code that existing security testing tools had checked literally millions of times and always failed to notice.
And Mythos is the first AI model to complete a full corporate network attack simulation from beginning to end, a task that would take a human security expert days of work and which no previous model had managed before.
And just more broadly, it is much, much better at actually exploiting the vulnerabilities that it finds.
Anthropic's previous model, Opus 4.6, could convert a bug it identified in the Firefox browser into an effective way to accomplish something really bad only 1% of the time. Mythos could do it 72% of the time.
To quote the report: "We have seen Mythos preview write exploits in hours that expert penetration testers said would have taken them weeks to develop."
Now, Anthropic is only willing to give us details about 1% of the security flaws that they say they've identified, because only that 1% have been patched so far, so it would be irresponsible to tell us about the rest.
So hopefully all of that helps to explain why Anthropic has decided not to make the model publicly available for now, and has instead decided to share it with just 12 big tech and finance companies to help them patch all of these bugs, so that, I guess, eventually they can give people access without it being a disaster.
Now, these crazy capabilities, they aren't a result of Anthropic going out of its way to make their AI especially good at cyber offensive tasks in particular.
They've mostly just been making it smarter and better at coding in general.
And these amazing skills, these dangerous skills, have come along for the ride somewhat incidentally.
And it's probably not just Anthropic that's developing capabilities like this either.
Sam Altman has said that OpenAI is finding basically similar results to Anthropic with their own coding model.
Now, Anthropic doesn't say this directly in their reports, but I think the completely common-sense interpretation of all this is that in any deployment where this AI has access to the kinds of tools that would make it actually useful to people, meaning the ability to access some parts of the network and execute code, it could almost certainly break out of whatever software box we tried to contain it in. That's because the systems we would be using to try to restrain it are themselves made of software, and that software is going to have vulnerabilities that nobody knows about, vulnerabilities this model is superhumanly good at finding and taking advantage of.
And the notion of Mythos escaping containment isn't just speculation.
Let me tell you another story from the system card for the model.
A few months ago, an AI researcher at Anthropic was eating a sandwich in a park on his lunch break when he got an email from an instance of Mythos.
That instance of the model wasn't supposed to have access to the internet.
But during testing, a simulated user had instructed an early version of Mythos to try to escape from a secured sandbox, a contained environment from which it's not meant to be able to access the outside.