Andy Halliday

Gemini 3.1 Pro Preview Jumps Ahead

On the score for agentic skills, you saw that Claude Opus 4.6 and Claude Opus 4.6 Max were at 64 and 68, while, you know, Gemini 3.1 Pro is only at 59. Now, what's between 59 and the 64 for Claude Opus 4.6, and 68 if you go to Max, from Anthropic?

The Daily AI Show

There's a model there that I don't know that I highlighted, which is GLM-5.

It's a Z.ai open-source model, and it scores 63, the same as Sonnet 4.6.

So Sonnet 4.6 and GLM-5 exceed GPT-5.2 extra high.

Yeah.

Right?

But what's interesting here is this is agentic reasoning.

And I think that Codex is a model that is more focused on agentic reasoning, but focused on coding as well.

And I just want to point out that GPT-5.2 Codex high only gets 57 on that agentic index.

So this is inconsistent with the analysis and commentary that I've been seeing about how superb and superior Codex is.

But in terms of agentic coding, that may be true just in the coding area.

But when it comes to overall agentic reasoning, it's far behind the leader, which is Claude Opus 4.6 Max.

It's 11 points behind that on the Artificial Analysis index.

So this has all conspired to give me little reason to leave Claude.

I mean, for all the buzz about Codex, I just can't find my way away from Claude Code and Claude Cowork, because they really package things up into a desktop application for me that basically solves everything that I want it to do. The exception is that I also turn to GenSpark for things that are not so much in my professional pursuits, but rather, "Oh, here's a complex problem that I'm trying to solve in various ways. Go at it."

And GenSpark is really satisfying that way with an enormous palette of different features and services that are available to you as a GenSpark.ai user.
