Jeremiah
Open mini non-thinking.
Open mini non-thinking.
A few people break open bottles of rubbing alcohol.
You lift the laptop onto your shoulders and everyone sings together.
For he's a jolly good fellow.
For he's a jolly good fellow.
For he's a jolly good fellow.
That nobody can deny.
Here's a chart.
Party Bench V.2.
It shows Claude 4.5 Opus, Claude 4.5 Sonnet, Claude 3.8 Haiku, Open Mini Non-Thinking, Gemini 3, GPT 5.2, Grok 4, and DeepSeek 3.1.
All of them scored 65.
There's an asterisk.
To our surprise, all models got exactly equal scores on the benchmark.
Upon further investigation, we discovered that they had all solved the problem by delegating all tasks to Claude Code.
We're still considering whether or not to disqualify these runs and have delegated the final decision to the instance of Claude Code who is running our benchmarking process.
This is an audio version of Astral Codex Ten, Scott Alexander's Substack.
If you like it, you can subscribe at astralcodexten.substack.com.
Additionally, if you like having this audio version, you can support my work on Patreon at patreon.com/sscpodcast.