Jeremiah
You lift the laptop onto your shoulders and everyone sings together.
For he's a jolly good fellow.
For he's a jolly good fellow.
For he's a jolly good fellow.
That nobody can deny.
Here's a chart.
Party Bench V.2.
It shows Claude 4.5 Opus, Claude 4.5 Sonnet, Claude 3.8 Haiku Open Mini Non-Thinking, Gemini 3, GPT 5.2, Grok 4, and DeepSeek 3.1.
All of them scored 65.
There's an asterisk.
To our surprise, all models got exactly equal scores on the benchmark.
Upon further investigation, we discovered that they had all solved the problem by delegating all tasks to Claude Code.
We're still considering whether or not to disqualify these runs and have delegated the final decision to the instance of Claude Code who is running our benchmarking process.
This is an audio version of Astral Codex Ten, Scott Alexander's Substack.
If you like it, you can subscribe at astralcodexten.substack.com.
Additionally, if you like having this audio version, you can support my work on Patreon at patreon.com/sscpodcast.
To reference this, please link to the original post.
To contact me, please use astralcodexpodcast@protonmail.com.
Thanks for listening, and I'll speak to you next time.
Welcome to the Astral Codex Ten podcast for the 6th of January, 2026.