Andy
Because I guess Grok 4.1 is also a very competent model and it has some peculiar advantages as well, I'm sure.
But we're going to talk about the big three here.
Okay, so when Gemini 3 came out and then the Gemini 3 Pro model was released, those new models crushed the competition.
And, you know, that includes the MathArena Apex mathematics test, where Gemini 3 Pro scored 23.4%, compared to GPT-5.1's 1%.
That's how big an advantage Google's Gemini 3 Pro had there.
And then in addition to that, Gemini 3 Pro, again, a slightly larger model than just the standard Gemini 3, and I'm not sure how to parse those, because there's also Gemini 3 Deep Think, which is the winner in the ARC-AGI-2 thing.
I'm going to talk about that in a second.
But Gemini 3 Pro set a new high score for AI models on TrackingAI's offline IQ test, meaning it cannot use search or anything else.
It has to use its internal reasoning capabilities.
And it surpassed Grok 4 Expert Mode's 126, achieving a 130-point IQ on a reasoning test, basically. So that's Google Gemini 3.
So then Anthropic's coming back strong with Claude Opus 4.5.
It reclaimed the top spot on key coding benchmarks like SWE-bench Verified, the software engineering benchmark, scoring 81% and edging out Gemini 3 Pro's 76%.
So significantly better than Gemini 3 Pro.
And at the same time, the important takeaway for Claude Opus 4.5 is that they cut their prices.
Opus was their top model before, and it was expensive in the 4.1 version.
The new Opus 4.5 is a third the cost of what Opus 4.1 was before.