
Andy Halliday

Speaker
7827 total appearances

Podcast Appearances

The Daily AI Show
Gemini 3.1 Pro Preview Jumps Ahead

score for agentic skills, you saw that Claude Opus 4.6 regular and Claude Opus 4.6 Max were at 64 and 68, while Gemini 3.1 Pro is only at 59. Now, what's between 59 and the 64 for Claude Opus 4.6, or 68 if you go to Max from Anthropic?

There's a model there that I don't know if I highlighted, which is GLM-5.

It's an open-source model from Z.AI, and it scores 63, the same as Sonnet 4.6.

So Sonnet 4.6 and GLM-5 exceed GPT-5.2 Extra High.

Yeah.

Right?

But what's interesting here is that this is agentic reasoning.

And I think Codex is a model that is focused on agentic reasoning, but on coding as well.

And I just want to point out that GPT-5.2 Codex High only gets 57 on that agentic index.

So this is inconsistent with the analysis and commentary I've been seeing about how superb and superior Codex is.

But in terms of agentic coding, that may be true, specifically in the coding area.

But when it comes to overall agentic reasoning, it's far behind the leader, which is Claude Opus 4.6 Max.

It's 11 points behind it on the Artificial Analysis index.

So this has all conspired to give me little reason to leave Claude.

I mean, for all the buzz about Codex, I just can't pull myself away from Claude Code and Claude Cowork, because they really package things up into a desktop application for me.

That basically solves everything I want it to do. The one exception is that I also turn to GenSpark for things that are not so much my professional pursuits, but more like, "Oh, here's a complex problem that I'm trying to solve in various ways. Go at it."

And GenSpark is really satisfying that way, with an enormous palette of features and services available to you as a GenSpark.ai user.