Nathaniel Whittemore
40% fewer tokens for the same score is a moat and it doesn't show up in the pricing table.
Enterprises are learning that cheap model and cheap workflow are unrelated numbers.
Price per token was always a proxy, which means the real metric was always tokens times price times attempts to correct.
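The "tokens times price times attempts" metric can be sketched as a small calculation. This is only an illustration: the function name and every number below are hypothetical and do not come from any model card or pricing table.

```python
def workflow_cost(tokens_per_attempt: int, price_per_million_usd: float, attempts: int) -> float:
    """Dollars spent to get one acceptable result: tokens x price x attempts."""
    return tokens_per_attempt * (price_per_million_usd / 1_000_000) * attempts


# Hypothetical numbers, chosen only to illustrate the point.
# Model A: lower list price, more tokens per attempt, needs corrections.
cheap_per_token = workflow_cost(tokens_per_attempt=30_000, price_per_million_usd=1.00, attempts=3)
# Model B: double the list price, 40% fewer tokens, right on the first try.
token_efficient = workflow_cost(tokens_per_attempt=18_000, price_per_million_usd=2.00, attempts=1)

print(f"cheap per token: ${cheap_per_token:.3f}")
print(f"token efficient: ${token_efficient:.3f}")
```

Under these made-up inputs, the model with the lower list price ends up with the higher workflow cost, which is the sense in which cheap model and cheap workflow are unrelated numbers.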
And if the new Microsoft models are any indication, this is very quickly going to cease to be a hidden consideration.
VC Tomasz Tunguz wrote, Microsoft put a new column on its latest model card: Average Token Usage.
It will become a standard.
For example, he writes, MAI Code 1 Flash hits 71.6 on SWE-bench Verified, using a third of the tokens Claude Haiku 4.5 burns.
Benchmarks now ship on two axes, performance and the cost to get there.
Even the most valuable companies cannot afford state-of-the-art intelligence everywhere.
Model companies will compete on intelligence per dollar.
The app layer will compete one level up, on dollars per outcome, a closed ticket, a shipped PR, a resolved support case.
Every layer prices the way the customer thinks, per result, not per token.
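Pricing per result rather than per token comes down to dividing total spend by completed outcomes. A minimal sketch, where the function name and the figures are hypothetical and used only for illustration:

```python
def dollars_per_outcome(total_spend_usd: float, outcomes: int) -> float:
    """Total model spend divided by completed results (closed tickets, shipped PRs, ...)."""
    if outcomes <= 0:
        raise ValueError("need at least one completed outcome")
    return total_spend_usd / outcomes


# Hypothetical month: $120 of model spend that closed 400 support tickets.
per_ticket = dollars_per_outcome(total_spend_usd=120.0, outcomes=400)
print(f"cost per closed ticket: ${per_ticket:.2f}")
```

Token counts and list prices never appear in this number directly. They only matter through the total spend needed to reach each result.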
And so one of the ways that I think you're going to see adaptation is that the labs themselves are going to start to prioritize different things, not just raw intelligence, but token efficiency as well.
Certainly, Microsoft thinks it has an opportunity to compete with their new frontier tuning approach.
In announcing the new models and their frontier tuning program, they gave the example of a collaboration with McKinsey: when the model was tuned for McKinsey's tasks, the Microsoft model delivered the highest win rate, even outperforming GPT 5.5, at one-tenth the cost of GPT 5.5.
And it won't just be the big labs.
You're also going to see the agent labs and even app layer companies experiment with their own models, their own harnesses, and their own routing systems in order to get better token efficiency, which is exactly what I meant when I said that every AI business model is now to some extent a token efficiency play.
We saw this with Cursor's Composer 2.5, which completes coding tasks in the range of the state-of-the-art from both Claude and OpenAI, but with radically higher efficiency.
Interestingly, we also just got something from legal AI firm Harvey along the same lines.
This week, Harvey tweeted, We partnered with Fireworks AI to train open-source models for legal.