Rob Wiblin
Zooming out, this report is the very first of its kind, the beginning of a new era, really.
It was led by Ajeya Cotra, who I interviewed last year about exactly the gap this research is designed to fill.
In my view, all the companies who participated, Anthropic, Google DeepMind, Meta, OpenAI, they should be commended for helping the world figure out how to assess the risks of internal deployment before those risks are actually serious ones, rather than after something has already gone wrong.
METR will be back to repeat this exercise later in the year with any companies willing to participate.
Why is this different type of testing necessary now?
I see three distinct reasons.
First, up until now, we've relied on model evals, which measure an AI's personality and capabilities, and which are published when the model is released to the general public.
But if we only evaluate the models themselves, we're missing fully a third of the problem here: opportunity.
That's determined not by the AI itself, but by how secure the company's technical setup is.
Second, until this year, we could confidently argue that it didn't matter that much whether an AI might want to run a rogue deployment, or whether it was given an easy opportunity to do it.
AIs were just too unreliable as independent agents to pull that kind of thing off well.
But that argument doesn't hold up too well anymore, or at least it relies on companies implementing active countermeasures to make it kind of hard for them.
Third, the transparency laws relating to AI models in California, New York, the EU and so on only cover AI models at the point they start being sold to the general public.
But with Mythos, we're entering a new world where AI companies may have much more powerful models that they're using a ton internally, but which they aren't selling to anyone else.
Anthropic has opted to tell us quite a lot about Mythos, but they didn't have to.
And obviously, AI models can cause problems, even if they're only deployed within an AI company.
AI companies, they're actually kind of vast piles of compute.
So they're a super attractive target for a rogue misaligned AI.
I'm reminded of the famous question asked of bank robber Willie Sutton.
"Why do you rob banks?" he was asked.