Nathaniel Whittemore
And so just in a single week, you have a group of different products all being launched to help solve the problem of token efficiency.
And if you want some evidence that there's demand for this, look no further than the recently released stats from Ramp, whose number one trending software vendor was China's DeepSeek.
Ramp lead economist Ara Kharazian writes, "In probably the biggest sign that companies are looking for cheaper alternatives to OpenAI and Anthropic, some are willing to use cheaper Chinese models, sending U.S. data back and forth from China-hosted servers."
Ara also pointed out that three open-source model service providers made the list this month.
Glean CEO Arvind Jain captured the overall shift in an essay called "Your Token Spend Is an AI Architecture Problem, Not Just a Model Problem."
He argues that four architectural levers determine token efficiency. The first is context quality. The problem there is either that it's too difficult for models to retrieve the right context for the enterprise task at hand, or that they get confused by too many buckets of conflicting context, which can burn tokens before you even get to the actual task.
Arvind also talks about model routing, where, as he puts it, the goal is not to use smaller models everywhere, but to use the right level of intelligence for the job.
A third lever of token efficiency, he argues, is continual learning: basically, building systems that let the exploration phase happen once rather than every time.
He writes, "When someone does useful work or writes something worth reusing, we document it so we do not have to recreate it from scratch every time.
Enterprise AI systems should work the same way.
If it doesn't, the system keeps paying the same exploratory cost again and again.
A system that learns from prior execution can reduce redundant reasoning, skip failed paths, and converge faster on the right workflow.
The result isn't just higher quality, it's lower cost on repeated work."
Lastly, he talks about harness design, which has been another big topic this year.
But to sum up, as I argued yesterday, it's pretty clear at this point that the big theme of the second half of 2026 is going to be how to put all of the exciting things that were uncovered at the beginning of 2026 into practice in a way that's actually cost-efficient and effective.
If you are building something in AI that serves the enterprise, my guess is that in some way, shape, or form, that's part of your job, even if you haven't identified it as such.
For our part, we will continue to track best practices in how companies are adapting.