Andy Halliday
And the first bit of news is that there's a company called H Company that has just set a new desktop computer use benchmark, outperforming GPT-5.4 and Opus 4.6.
And remember, computer use was originated by Claude.
Claude computer use was the initial announcement of this very important agentic capability, which is: you need the agent that's working with you, that AI, to be able to manipulate your computer.
Well, what's tremendous about this piece of news is that it's only a 10 billion active parameter model in a 35 billion parameter architecture.
And it beats GPT-5.4, which scored 75% on the OSWorld-Verified benchmark for computer use.
It beats it by several points: 78.9, call it 79, against 75 for GPT-5.4.
So it's better at that, and it is activating only 10 billion parameters to do that in a 35 billion parameter model. Now, 35 billion is actually practical on a local machine.
You have to have a lot of RAM and a high-end processor to do it, but you can run this computer use model. So you can imagine that being distilled and brought down smaller and smaller until we have extremely proficient computer-using agents that are local to our machines. And it's an open-weight model; it's open source.
That's an incredible improvement there.
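[Editor's note: a rough back-of-the-envelope check on the "35 billion is practical on a local machine" remark. This is only a sketch. It assumes weight storage dominates memory use, ignores the KV cache and activations, and uses common precisions (fp16, int8, int4) that the episode does not mention. It also assumes all 35 billion parameters stay resident in memory, with the 10 billion active parameters mainly reducing compute per token rather than storage.]

```python
# Rough weight-memory estimate for a 35-billion-parameter model.
# Assumption: weights dominate memory; KV cache and activations are ignored.
TOTAL_PARAMS = 35e9

# Bytes per parameter at some common storage precisions (illustrative choices).
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Return approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


estimates = {
    name: weight_memory_gb(TOTAL_PARAMS, size)
    for name, size in BYTES_PER_PARAM.items()
}

for name, gb in estimates.items():
    print(f"{name}: ~{gb:.1f} GB of weights")
```

[Under these assumptions the weights alone come to roughly 70 GB at fp16, 35 GB at int8, and 17.5 GB at int4, which is in line with the "you need a lot of RAM" caveat.]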
Now I want to just jump over on the same subject, agents and agentic systems.
There's a company over in France called Nous Research, N-O-U-S, like "us" in French, that has the Hermes line of AI models.
And we used to talk about them back in 2023.
They haven't really hit the news much in recent times, but they've just delivered a local agent that's patterned after OpenClaw and challenges OpenClaw, and it's the first serious self-improving version of a claw.
Okay, so most local agents have memory and they execute tasks, but this one introduces a genuinely different kind of architecture, which has a do it, learn from it, and improve loop built into the process.