Andy Halliday

Speaker
7827 total appearances

Podcast Appearances

The Daily AI Show
Gemini 3.1 Pro Preview Jumps Ahead

and regular Opus 4.6 at 64.

So that's a pretty big spread back to Gemini 3.1 Pro Preview.

What does that mean?

For most of us, probably not a lot, unless you're actually using the entire Google ecosystem to set up a harness for multi-agent coding or multi-agent workflow management.

You might want to... not focus entirely on Gemini 3.1 Pro Preview.

Make Opus available to whatever system you're architecting.

Now, a lot of it depends on memory and the ability to do either recursion, where you process things while creating an intermediate capture of what the results of that are, and then opening up the context window again with new inference based on that new context and so on.

So that's this kind of iterative looping process using a Python REPL storage concept.
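
A minimal sketch of the looping idea described here: process one piece of input, capture an intermediate result, then start the next pass with only that captured result plus the next piece. Everything in it is an assumption made for illustration. `chunk_words`, `iterative_loop`, and `toy_step` are hypothetical names, and `toy_step` is a stand-in for a model call (it just keeps a running word count), not any vendor's API.

```python
from typing import Callable, List, Tuple


def chunk_words(text: str, max_words: int) -> List[str]:
    """Split text into pieces small enough for one (hypothetical) context window."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def iterative_loop(
    text: str,
    step: Callable[[str, str], str],
    max_words: int = 50,
) -> Tuple[str, List[str]]:
    """Run `step` once per chunk, carrying forward only the captured notes."""
    notes = ""                # intermediate capture, analogous to a variable kept in a REPL
    history: List[str] = []   # every intermediate capture, kept for inspection
    for chunk in chunk_words(text, max_words):
        # Each pass sees the previous notes plus one new chunk, never the full input.
        notes = step(notes, chunk)
        history.append(notes)
    return notes, history


def toy_step(notes: str, chunk: str) -> str:
    """Stand-in for a model call: updates a running word count stored in the notes."""
    seen = int(notes) if notes else 0
    return str(seen + len(chunk.split()))


final_notes, history = iterative_loop(
    "one two three four five six seven", toy_step, max_words=3
)
print(final_notes)  # 7
print(history)      # ['3', '6', '7']
```

In a real setup, `step` would wrap an inference call and `notes` would hold a summary or structured state rather than a number; the loop structure is the part this sketch is meant to show.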

Let me share, whenever a new model comes out, there's a couple of different benchmarks I go immediately to.

One is that Artificial Analysis index.

And the other one is ARC-AGI-2.

And this is the thing that's measuring where these players are in the progression towards an effective accomplishment of artificial general intelligence.

And these are really complex reasoning tasks that make it hard for any of the models to really get right.

significantly above 60%, 70%, which you see where the players were just recently.

Although Gemini 3 Deep Think that came out on February 20th, it's still earlier this month, but it's already at a high expense level.

This x-axis down here is cost per AGI...

And you can see that GPT-5.2 was spending upwards of, you know, $70 per task.

And Gemini 3 Deep Think was spending, you know, something in the order of, let's say, $20 a task.