Cal Newport
So it's kind of like a best-case scenario.
But this is another indication that we're not getting these fundamental giant leaps in the capabilities of the digital brains.
It's either benchmark tuning, like we tuned it to do better on a particular benchmark, or we built better programs around it.
So when you look at the money they put into Mythos, if the best thing you had to emphasize when it was done is that on a cybersecurity benchmark Opus 4.6 was at 66.7 and Mythos is at 83.1, that isn't necessarily going to justify what's going on.
Or take what AISI found. There's only one thing in there where they see a leap from Mythos, and it's on a particular contrived security scenario they came up with.
And this big leap that got them all worried was Opus 4.6 could, on average, complete 16 out of 32 steps in this challenge.
And Mythos, on average, could do 22 steps out of 32.
That's hundreds and hundreds of millions of dollars of training, electricity, or whatever.
I think that's an issue.
What are you doing?
I thought you were going to automate the economy.
It's 100% correct.
And let me tell you why I would be worried if I was them.
Here's the worrisome thing about Mythos, right?
It's that, again, they talk about these vulnerabilities, hidden for decades, that Mythos found or what have you.
And multiple independent security teams replicated this: they were able to find most of those vulnerabilities using three-to-five-billion-parameter open-weight models.
Put that in perspective, right?