Andy Halliday
๐ค SpeakerAppearances Over Time
Podcast Appearances
That's how good it is.
It's a superhuman coding agent, but it has lots of capabilities beyond that.
But it's the first model ever to exceed 80% on the SWE bench, software engineering benchmark.
And here's another interesting point.
Just a few days ago, we talked about the release of OpenAI's Codex Max, Codex 5.1 Max, which was released almost in the aftermath of 3.0.
Gemini 3.0 came out, best coding agent in the world.
Codex Max comes out, best coding agent in the world.
And now Codex Max used a special capability to overcome limitations in the context window and continuity of its effort over a long period of time while working on multiple steps in a complex coding project that
Google had already covered by the success of their very large context windows, right?
They had 1 million and 2 million token context windows.
But what Codex Max demonstrated was a thing they called compaction, which is basically packaging up the context from a first part of a conversation and moving it into the next one in a compressed form, compacted context.
Cloud Opus 4.5 has automatic context compression, enabling unlimited conversation length.
OK, so every one of these players now has really long context windows.
Yeah, really successful things.
And let me just I know you have a table there, but I thought this this image that gives you a comparison.
Can you put that up on the screen there?