Nathaniel Whittemore
π€ SpeakerVoice Profile Active
This person's voice can be automatically recognized across podcast episodes using AI voice matching.
Appearances Over Time
Podcast Appearances
Specifically, while Claude Opus 48 is now slightly above GPT-55 in terms of its intelligence index score, Claude achieves that score while using about 80 or 90% more tokens, meaning it's significantly less token efficient and actually placing both Opus 47 and 48 outside of the most attractive quadrant.
The release of Gemini 3.5 Flash also saw a lot of this discourse around it as well.
While the overall intelligence was much higher on Gemini 3.5 Flash than 3 Flash, the cost to run the tests was more than five times as much as 3 Flash, moving 3.5 from just at the edge of the most attractive quadrant to firmly outside of it.
All of this is finding its way into the popular discourse as well.
For example, YouTuber and AI entrepreneur Theo recently tweeted, Meanwhile, perception of token efficiency is also part of why Codex has become so much more popular among developers.
Biniyam wrote,
Codex has gotten noticeably better at token efficiency lately.
Same tasks that used to eat up a ton of tokens now feel way more reasonable.
Fundamental analysis on X wrote, GPT-55 and Opus 48 sit around one point apart on the intelligence index, 60.2 versus 61.4.
Their token pricing is almost a match, $5 input on both, $30 versus $25 output.
So why is there a 40% gap in the cost of running the full index?
And the answer, of course, as we just saw, is that the Opus models burned way more tokens to complete the index.
Fundy writes, That's the whole game now.
Per-token pricing is the rate and tokens to completion is the actual invoice.
A model can win on price per token and lose badly on price per task, because the reasoning trace, the restatement, the overthinking is the multiplier nobody printed on the spec sheet.
This is why the cheapest per-token model is routinely the most expensive per outcome.
Researchers have a name for it called the overthinking task.
Smaller, cheaper models that ramble can cost more in total than a pricier model that's terse and converges fast.
The buyer side implication is the part the market hasn't priced in yet.
A, the flagship layer now competes on token efficiency, not just capability.