Andy Halliday
score for agentic skills, you saw that Claude Opus 4.6 regular and Claude Opus 4.6 Max were at 64 and 68, while, you know, Gemini 3.1 Pro is only at 59. Now, between that 59 and the 64 for Claude Opus 4.6, or 68 if you go to Max from Anthropic, there's a model there that I don't know that I highlighted, which is GLM-5.
It's a Z.AI open source model, and it scores 63, the same as Sonnet 4.6.
So Sonnet 4.6 and GLM-5 exceed GPT-5.2 extra high.
But what's interesting here is this is agentic reasoning.
And I think that Codex is a model that is more focused on agentic reasoning, but focused on coding as well.
And I just want to point out that GPT-5.2 Codex high only gets 57 on that agentic index.
So this is inconsistent with the analysis and commentary that I've been seeing about how superb and superior Codex is.
But in terms of agentic coding, that may be true just in the coding area.
But when it comes to overall agentic reasoning, it's far behind the leader, which is Claude Opus 4.6 Max.
It's 11 points behind that on the Artificial Analysis index.
So this has all conspired to give me little reason to leave Claude.
I mean, for all the buzz about Codex, I just can't find my way away from Claude Code and Claude Cowork, because they really package things up into a desktop application for me that basically solves everything that I want it to do.
The exception is that I also turn to GenSpark for things that are not so much in my sort of professional pursuits, but rather, oh, here's a complex problem that I'm trying to solve in various ways.
And GenSpark is really satisfying that way with an enormous palette of different features and services that are available to you as a GenSpark.ai user.