
Andy Halliday

Speaker
3893 total appearances

Podcast Appearances

The Daily AI Show
Anthropic Drops a Monster Model

So notice here, this is also compacted, right?

Zero to 70% accuracy is compressed into this little slip in the bottom.

So if you saw these in their full extent, you'd see that they're all pretty close.

They're all pretty close.

The worst, Opus 4.1, is at 75 percent.

And, you know, the next one's at 81 percent.

Which is Opus 4.5.

So look at GPT 5.1 Codex Max.

That was phenomenal.

Which just came out recently, after Gemini 3.0 Pro. You see it goes 76.2, let's round them, 76. It goes to 78 with GPT 5.1 Codex Max, and then Opus 4.5 jumps to 81. Those are big numbers at the margin, right? But again, if you look at the competence of any one of these coding agents, they're all damn good.
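A rough sketch of the arithmetic behind the "compacted" chart and the "big numbers at the margin" remark. The scores are the ones quoted in the transcript (some rounded by the speaker); the axis floor of 70 and all function names are assumptions made purely for illustration, not details taken from the chart itself.

```python
# Scores (percent accuracy) as quoted in the transcript; some are rounded by the speaker.
scores = {
    "Opus 4.1": 75.0,
    "Gemini 3.0 Pro": 76.2,
    "GPT 5.1 Codex Max": 78.0,
    "Opus 4.5": 81.0,
}

# Assumption: the compacted axis visually starts near 70 percent.
AXIS_FLOOR = 70.0


def full_scale_ratio(a: str, b: str) -> float:
    """Ratio of bar heights if the axis ran from 0 to 100."""
    return scores[a] / scores[b]


def compacted_ratio(a: str, b: str, floor: float = AXIS_FLOOR) -> float:
    """Ratio of visible bar heights when the axis starts at `floor`."""
    return (scores[a] - floor) / (scores[b] - floor)


def relative_error_reduction(a: str, b: str) -> float:
    """Fraction of model b's failures that model a eliminates."""
    err_a = 100.0 - scores[a]
    err_b = 100.0 - scores[b]
    return (err_b - err_a) / err_b


if __name__ == "__main__":
    a, b = "Opus 4.5", "Gemini 3.0 Pro"
    print(f"full-scale bar ratio: {full_scale_ratio(a, b):.2f}")
    print(f"compacted bar ratio:  {compacted_ratio(a, b):.2f}")
    print(f"error reduction:      {relative_error_reduction(a, b):.1%}")
```

Under these assumptions the full-scale bars differ by only a few percent, while the compacted bars look far more unequal, which is the visual effect the speaker is pointing at. The error-reduction figure is one way to read "big at the margin": a gain of a few points removes a noticeable share of the remaining failures.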

I'm not going to decide to use Codex Max just because it bested Gemini 3 Pro by a little over a point.

There's other factors that blend into that.

And a lot of them have to do with the personality and the style of the coding.

And Nate Jones has done a really nice comparison of Codex Max.

And GPT-3, I think it was. He compared the two and showed that there are actual differences there in the style, the approach that those models take, not just whether they solve the coding problems on the benchmark.

I have some numbers on that.