Andy
And that's because of this compaction technology that OpenAI introduced.
And then yesterday, we mentioned that another new model (I can't keep them straight in my head) also used something like it. Or rather, Gemini actually demonstrated it as part of their new technology: they have a context condensation feature, or they use some other term for it, that's about summarizing the context and passing it on to the next player.
So let me share this chart here. I want to show some things that make a bit more sense of all of that than all the possible interpretations and comparisons we could make here.
So I think someone else has to put that up on the screen for us.
So here we have the AI model performance benchmark comparison across various metrics.
And you can see Gemini 3 in Deep Think mode here at the top left, with a batch of winning scores, largely in the area of intelligence and reasoning.
On Humanity's Last Exam, Gemini 3 Pro in Deep Think mode is significantly above everyone else.
And the next player is way down here at 30.7%, which is GPT-5 Pro.
Look at ARC-AGI-2, which is a kind of pure reasoning test for intelligence.
As recently as just a few months ago, all of the models struggled to get 3% on ARC-AGI-2.
And now Gemini 3 Deep Think scores 45%, Opus 4.5 is at 37.6%, and the other players, including GPT, are way down in the teens.
So I think the takeaway here is that Gemini is the smartest model.
Now look at Opus 4.5: still pretty darn smart, pushing its numbers close to the top scores on these reasoning tests.
But in terms of coding, this is where it shines.
We were all excited about Sonnet 4.5.
Well, Opus 4.5 is now the top-of-the-charts offering from Anthropic, and it's focused on coding to a large degree.
And then for multimodal work, it's not necessary to apply Gemini 3's Deep Think mode; you can see that Gemini 3 Pro achieves the top scores on all of these multimodal tests.
And that's roughly how the players break down here.