Andy Halliday

Speaker
8038 total appearances

Podcast Appearances

The Daily AI Show
Gemini 3.1 Pro Preview Jumps Ahead

Just in time.

Let's go back in time, right?

ARC-AGI-1 was published in 2019, before GPT-3 came out.

So ARC-AGI-1 didn't last very long.

The models progressed so swiftly in 2022 and 2023 that they surpassed the ability of that benchmark to reasonably differentiate models.

But it's still used out there.

I mean, you can see the ARC-AGI-1 leaderboard here.

Then they came out with ARC-AGI-2, which is tougher, with more difficult visual puzzles and reasoning tasks that make it not possible.

I shouldn't say not possible, but less likely that the training can be tuned to beat that benchmark.

It requires, you know, real reasoning processing.

Now, ARC-AGI-3 is an interactive reasoning benchmark designed to measure an AI agent's ability to generalize in novel, unseen environments.

And Opus 4.6 demonstrates better reasoning and use of memory than Gemini 3.1 Pro on those and solves more levels.

And it says, current models may be able to solve ARC-AGI-3 given access to a harness with a simple memory.

Memory scaffolds, it says, and I'm pulling this from a summary from, I think, The Neuron.

It says, memory scaffolds are likely enough for pseudo-continual learning.

and will push us to some self-improvement behavior by models.

So if you provide a memory to one of these models, it might be able to solve ARC-AGI-3.

And I think at that point, you have to say, okay, with the addition of a memory scaffold, on top of reasoning skills,

Who's going to claim we still haven't achieved AGI?

I think we're there, right?