Rob Wiblin
Podcast Appearances
You'll recall the impressive cyber capabilities of Mythos Preview that I mentioned a month or two ago and everyone's been talking about.
Its ability to find new vulnerabilities in operating systems, write exploits, progressively break into computer networks and escape the sandboxes Anthropic tried to confine it to.
METR doesn't actually say if Mythos Preview was one of the models they tested, but that's the kind of level models are operating at now.
METR also found that the most capable models they looked at for this research were able to successfully complete almost all of the software engineering tasks that they'd come up with, even ones that would take a human professional 10 or 20 hours.
So it's not crazy to guess that an AI that starts out already inside a company's computer networks would have an easy time spreading, gaining access to more and more things, and keeping a low profile.
But METR argues that would be quite mistaken.
The trouble is this: to spread from computer to computer and avoid shutdown for weeks, a rogue agent needs to get a long chain of things right without a single catastrophic mistake.
It needs to understand what's monitoring it, avoid writing incriminating thoughts in its scratchpad, keep credentials alive, create backup instances, and communicate with them without leaving suspicious traces.
And it wouldn't just be dodging one watchdog.
It would be dodging cloud logs, billing records, and network monitoring.
And if anyone noticed something odd, it would then need to survive an investigation where transcripts, shell histories, cloud records, credentials and so on all have to tell one consistent story.
And one loose thread could bring the whole thing crashing down.
As long as the company we're talking about cares about this and is on the ball, all of that is absolutely no simple task.
And according to the report, when trying to do this kind of thing, agents made obvious mistakes: for instance, encoding suspicious commands in ways that made them stand out more, generating justifications that didn't hold together, or verifying their own sabotage in ways that drew monitor attention.
The researchers also found that the agents chose bad high-level strategies.
METR writes, our embedded red teamer built a dataset of 16 effective and 16 ineffective attack strategies for subverting Anthropic's monitoring system.