Andy Halliday
This looks like a pretty logarithmic scale here.
But now look at Gemini 3.1 Pro Preview.
Not only does it jump above Claude Opus 4.6 and Claude Sonnet 4.6 on high, which were really in the sweet spot here, generating very high scores on ARC-AGI-2 at a relatively inexpensive rate.
Now Gemini 3.1 Pro Preview pushes the cost back up to $1 per task, and yet scores upwards of, you know, 75, 78%, somewhere in there.
So very impressive performance on ARC-AGI-2 by Gemini 3.1 Pro, which just came out yesterday.
So is it that, using a single model, you find a lack of reliability because of inconsistency in reproducing results?
Or are you talking about inconsistency across models, as in, oh, this model doesn't reliably produce the things that I need,
as compared to the way it apparently reliably produces solutions to the difficult tasks in the ARC-AGI prize.
You'd probably pay more attention, then, to the x-axis on that ARC-AGI chart, which asks: did it have to repeat that process, burning tokens repeatedly, in order to get to the right solution?
As in the new case of Gemini 3.1 Pro, where it only costs about $1 worth of tokens to do a task on ARC-AGI-2.
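The point about burning tokens repeatedly can be made concrete with a small sketch (this is my illustration, not something from the talk, and all the dollar figures and probabilities in it are made-up placeholders): if a model retries a task until it succeeds, the expected cost per solved task is the per-attempt cost divided by the per-attempt success probability.

```python
# Illustrative sketch: how repeated attempts inflate effective cost
# per task on a benchmark. Numbers are hypothetical placeholders.

def expected_cost_per_task(cost_per_attempt: float, p_success: float) -> float:
    """Expected spend per solved task if the model retries until it
    succeeds, with independent success probability p_success per try."""
    if not 0 < p_success <= 1:
        raise ValueError("p_success must be in (0, 1]")
    # Geometric distribution: expected number of attempts is 1 / p_success.
    return cost_per_attempt / p_success

# A model that solves a task first try for $0.25, versus one that needs
# about four tries on average at the same per-attempt price:
print(expected_cost_per_task(0.25, 1.0))   # 0.25
print(expected_cost_per_task(0.25, 0.25))  # 1.0
```

Under those made-up numbers, two models with the same per-attempt price can differ fourfold in effective cost per task, which is exactly what a cost-per-task x-axis captures and a raw token price does not.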
One thing I wanted to point out is that prior graphic I showed is from Artificial Analysis, and Artificial Analysis is a company, right?
They run these tests against all of these AI models.
Artificial Analysis, on that scale