Andy Halliday
So, DeepThink.
I'm going to share my screen here quickly just to show you what it did.
But on the far left over here, you see the Arc AGI scale and Arc AGI 2 is the one that we're looking at here.
Arc AGI is a really complex set of logic puzzles, basically. Arc AGI 1, you know, was ultimately replaced by Arc AGI 2 just less than a year ago.
And at the time that Arc AGI 2 came out, which is testing for advanced reasoning and knowledge, and is part of the metrics set that is being used to attempt to determine whether AI is getting to artificial general intelligence or not, none of the models could surpass anything like 5% on this thing.
Now, Gemini 3 DeepThink has 85% on the Arc AGI 2 test.
On Humanity's Last Exam, it's outperforming Claude Opus 4.6 Thinking Max by 8.4 percentage points.
And, you know, it's against GPT 5.2.
All these things will be updated, obviously, once GPT 5.3 gets to, you know, get in the game here a little bit more.
But this is a major advance, Gemini 3 DeepThink.
And, you know, OpenAI is struggling to maintain its lead overall while being inherently more expensive and underfunded compared to major players like Gemini.
And that's behind, you know, that's Google behind them, right?
Google has new money to support this.
And they're basically surpassing what OpenAI is doing here, in my view.
It has value to me.
And let me just add something about Google DeepThink.
So on top of Google DeepThink's reasoning capabilities, Google DeepThink has basically...