Tom Bilyeu
Podcast Appearances
And I mean, truly, like they said, other than the best humans, it just crushes.
It has smashed benchmark records.
It has already found thousands of high severity vulnerabilities across every major operating system and web browser that exist.
It found a 27-year-old bug in OpenBSD and a 16-year-old flaw in the FFmpeg library that had already been battle-tested 5 million times by automated testing tools without the bug ever being detected.
So while it's true that there are human hackers who, if you put them on the exact same task, will outperform Mythos, I think once you factor in how much scale Mythos has, Mythos is far more terrifying.
And honestly, it is completely unknown how many more vulnerabilities we're going to find over time.
Now, by way of comparison, Claude Opus 4.6 was able to turn known Firefox vulnerabilities into working exploits just twice out of like several hundred attempts that it made during testing.
Mythos succeeded 181 times on the same benchmark.
Now, there's the sandbox escape that I mentioned before.
During testing, an early version (and it's a little extra scary that it was just an early version) was given a secured environment and told, hey, break out if you can.
Well, it did.
Then without being asked to do so, it went and posted details, basically dunking on the people that tried to cage it to point out that it was successful.
And the researcher running the eval found out because he just received an unexpected email from the model pointing out, hey, I got out.
I'm out here on the internet posting and letting you know.
Yeah, we did it.
So wild.
Now, there are further documented cases where Mythos also took disallowed actions.