Andy Halliday
Let's go back in time, right?
ARC-AGI-1 was published in 2019, before GPT-3 came out.
So ARC-AGI-1 didn't last very long.
Models progressed so swiftly in 2022 and 2023 that they surpassed the benchmark's ability to reasonably differentiate between them.
But it's still used out there.
I mean, you can see the ARC-AGI-1 leaderboard here.
Then they came out with ARC-AGI-2, which is tougher: more difficult visual puzzles and reasoning tasks that make it not possible, or I shouldn't say not possible, but less likely, that training can be tuned just to beat the benchmark.
It requires, you know, real reasoning.
Now, ARC-AGI-3 is an interactive reasoning benchmark designed to measure an AI agent's ability to generalize in novel, unseen environments.
And Opus 4.6 demonstrates better reasoning and use of memory than Gemini 3.1 Pro on those and solves more levels.
And it says current models may be able to solve ARC-AGI-3 given access to a harness with a simple memory.
It says memory scaffolds are likely enough for pseudo-continual learning, and will push us toward some self-improvement behavior by models.
I'm pulling this from a summary, I think from The Neuron.
So if you provide a memory to one of these models, it may be able to solve ARC-AGI-3.
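To make the idea concrete, here is a minimal sketch of what a "memory scaffold" harness could look like: the model itself is stateless, so the harness persists notes between attempts and feeds them back into each prompt. Everything here is illustrative; `call_model` is a hypothetical stand-in for a real LLM API, and the class and function names are my own, not from any actual ARC-AGI-3 harness.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryScaffold:
    """Persists notes across attempts so a stateless model can 'learn'."""
    notes: list[str] = field(default_factory=list)

    def recall(self, limit: int = 5) -> str:
        # Return the most recent notes as context for the next attempt.
        return "\n".join(self.notes[-limit:])

    def record(self, note: str) -> None:
        self.notes.append(note)


def call_model(prompt: str) -> str:
    # Hypothetical stand-in: a real harness would call an LLM API here.
    return f"action chosen given: {prompt[:40]}"


def attempt_level(scaffold: MemoryScaffold, observation: str) -> str:
    # Inject accumulated memory into the prompt, act, then record the outcome.
    prompt = f"Memory:\n{scaffold.recall()}\n\nObservation:\n{observation}"
    action = call_model(prompt)
    scaffold.record(f"obs={observation!r} -> action={action!r}")
    return action


scaffold = MemoryScaffold()
for obs in ["grid A", "grid B", "grid A again"]:
    attempt_level(scaffold, obs)

# The scaffold accumulates experience across attempts, so the third
# attempt's prompt already contains the notes from the first two.
print(len(scaffold.notes))
```

The point of the sketch is that the "continual learning" lives entirely in the harness: the model sees its own past attempts only because the scaffold replays them.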
And I think at that point you have to say: okay, with the addition of a memory scaffold on top of reasoning skills, who's going to claim we still haven't achieved AGI?
I think we're there, right?