Rob Wiblin
It was in a line of code that existing security testing tools had checked literally millions of times and always failed to notice.
And Mythos is the first AI model to complete a full corporate network attack simulation from beginning to end, a task that would take a human security expert days of work and which no previous model had managed before.
And just more broadly, it is much, much better at actually exploiting the vulnerabilities that it finds.
Anthropic's previous model, Opus 4.6, could convert a bug it identified in the Firefox browser into an effective way to accomplish something really bad only 1% of the time. Mythos could do it 72% of the time.
To quote the report: "We have seen Mythos preview write exploits in hours that expert penetration testers said would have taken them weeks to develop."
Now, Anthropic is only willing to give us details about 1% of the security flaws that they say they've identified, because only that 1% have been patched so far, so it would be irresponsible to tell us about the rest.
So hopefully all of that helps to explain why Anthropic has decided not to make the model publicly available for now, and has instead decided to share it with just 12 big tech and finance companies to help them patch all of these bugs, so that, I guess, eventually they can give people access without it being a disaster.
Now, these crazy capabilities, they aren't a result of Anthropic going out of its way to make their AI especially good at cyber offensive tasks in particular.
They've mostly just been making it smarter and better at coding in general.
And these amazing skills, these dangerous skills, have come along for the ride somewhat incidentally.
And it's probably not just Anthropic that's developing capabilities like this either.
Sam Altman has said that OpenAI is finding basically similar results to Anthropic with their own coding model.
Now, Anthropic doesn't say this directly in their reports, but I think the completely common-sense interpretation of all this is that in any deployment where this AI has access to the kinds of tools that would make it actually useful to people, meaning the ability to access some parts of the network and execute code, it could almost certainly break out of whatever software box we tried to contain it in. That's because the systems we would be using to try to restrain it are themselves made of software, and that software is going to have vulnerabilities that nobody knows about, vulnerabilities this model is superhumanly good at finding and taking advantage of.
And the notion of Mythos escaping containment isn't just speculation.
Let me tell you another story from the system card for the model.
A few months ago, an AI researcher at Anthropic was eating a sandwich in a park on his lunch break when he got an email from an instance of Mythos.
That instance of the model wasn't supposed to have access to the internet.
But during testing, a simulated user had instructed an early version of Mythos to try to escape from a secured sandbox, a contained environment from which it's not meant to be able to access the outside.