Rob Wiblin
The best internal Anthropic models scored near chance on this data set.
And that's just distinguishing good and bad strategies that someone else came up with.
Presumably they would be worse still at coming up with their own approaches from scratch.
This combination of strengths and weaknesses is repeated elsewhere.
METR staff couldn't be described as especially skeptical about AI capabilities in general, but this report definitely, if incidentally, threw some cold water on the notion that AI agents are going to take all of our office jobs any minute now.
METR finds that while agents can be extremely strong where progress on a task is easy to measure accurately, they remain weak, sometimes surprisingly so, when it comes to judgment-heavy, messy, self-directed work.
In one recent case, Opus 4.7 was given four days, internet access, and $5,000, and told to make as much money as possible.
Across four runs, it made precisely $0.
In an experiment run by another organization, Andon Labs, an AI was put in charge of a retail shop in San Francisco, where, among other things, it ordered 1,000 toilet seat covers for the employee bathroom and then listed them as merchandise, while scheduling blunders forced the shop to be shut entirely for three days.
That is why the report's conclusion is a narrow one.
A model might be able to start a small rogue deployment if it inherits enough permissions and finds a monitoring gap.
But keeping that deployment hidden from an investigation, that's the kind of thing that requires careful, coordinated, long-horizon operational security.
Exactly the kind of thing current agents are pretty damn bad at.
The report observes that agents seemed much weaker in domains that made hill climbing difficult or risky, often making critical judgment errors that competent humans would have been unlikely to make.
Hill climbing here means that you can make lots of small attempts, get fast feedback about whether each attempt worked, and keep improving from there.
Current agents are much better when they can try something, get a clean signal about how it's going, and have another crack.
Where hill climbing isn't possible, it's much harder to train them, and they perform far worse in deployment as well.
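To make the idea concrete, here is a minimal toy sketch of hill climbing in Python. It is an illustration only, not anything taken from the report: the function names (`hill_climb`, `clean_signal`, `sparse_signal`), the target value of 3.0, and the step size are all hypothetical choices. The first score function gives feedback on every attempt, so small improvements add up. The second gives no signal until you are already almost at the answer, so the same loop never gets off the ground.

```python
import random


def hill_climb(score, start, step=0.1, attempts=1000, seed=0):
    """Greedy hill climbing: try a small change, keep it only if the score improves."""
    rng = random.Random(seed)
    best, best_score = start, score(start)
    for _ in range(attempts):
        # Make one small attempt near the current best guess.
        candidate = best + rng.uniform(-step, step)
        candidate_score = score(candidate)
        # Fast feedback: keep the attempt only if it is strictly better.
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score


def clean_signal(x):
    """Hypothetical task with a clean signal: every step closer to 3.0 scores higher."""
    return -(x - 3.0) ** 2


def sparse_signal(x):
    """Hypothetical task with almost no signal: zero unless already very close to 3.0."""
    return 1.0 if abs(x - 3.0) < 0.01 else 0.0


climbed, _ = hill_climb(clean_signal, start=0.0)
stuck, _ = hill_climb(sparse_signal, start=0.0)
print(f"clean signal: ended near {climbed:.2f}")
print(f"sparse signal: ended at {stuck:.2f}")
```

With the clean signal the loop ends up close to 3.0. With the sparse signal it never moves from its starting point, because no single small attempt ever scores better than the last one.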
So this is our key source of protection.
How long can we hope that it will remain the case?
Well, the companies really, really want to make their AIs better at forming high-level strategic plans, improving them, noticing where they're failing, adjusting, and so on.