Andy Halliday
pushes it back to $1 per task.
And yet it goes upwards of, you know, 75, 78%, somewhere in there. So very impressive performance on ARC-AGI-2 by Gemini 3.1 Pro, which just came out yesterday.
So is it that, using a single model, you find a lack of reliability because of inconsistency in reproducing results? Or are you talking about inconsistency across models, as in "this model doesn't reliably produce the things that I need," compared to the way it apparently does reliably produce solutions to the difficult tasks in the ARC-AGI prize?
You'd probably pay more attention, then, to the x-axis on that ARC-AGI display, which asks: hey, did it have to repeat that process, burning tokens repeatedly, in order to get to the right solution? As in the new case of Gemini 3.1 Pro, where it only costs $1 worth of tokens to do a task on ARC-AGI-2.
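The "$1 worth of tokens per task" figure being discussed is just tokens consumed times price per token. As a rough sketch of that arithmetic (every price and token count below is a hypothetical placeholder, not actual Gemini 3.1 Pro pricing or ARC-AGI-2 usage):

```python
# Sketch of cost-per-task arithmetic on an ARC-AGI-2 style benchmark.
# All prices and token counts are hypothetical placeholders, not real
# Gemini 3.1 Pro pricing or measured usage.

def cost_per_task(input_tokens: int, output_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of one task: tokens used times the price per million tokens."""
    return (input_tokens / 1_000_000) * input_price_per_m \
         + (output_tokens / 1_000_000) * output_price_per_m

# Example: a task burning 200k input and 50k output tokens at a
# hypothetical $2 / $10 per million tokens:
# 0.2 * 2 + 0.05 * 10 = 0.40 + 0.50 = about $0.90, i.e. the "$1 per task" ballpark.
print(f"${cost_per_task(200_000, 50_000, 2.0, 10.0):.2f}")
```

A model that has to retry a task several times before finding the right solution multiplies this number by the retry count, which is why the cost axis matters alongside the score.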
One thing I wanted to point out is that on that prior graphic I showed from Artificial Analysis, and Artificial Analysis is a company, right? They run these tests against all of these AI models.
On that Artificial Analysis scale, the score for agentic skills, you saw that Claude Opus 4.6 regular and Claude Opus 4.6 Max were at 64 and 68, while, you know, Gemini 3.1 Pro is only at 59. Now, between that 59 and the 64 for Claude Opus 4.6, or the 68 if you go to Max from Anthropic,
there's a model there that I don't know that I highlighted, which is GLM-5. It's a Z.AI open-source model, and it scores 63, the same as Sonnet 4.6. So Sonnet 4.6 and GLM-5 exceed GPT 5.2 Extra High.