Nathaniel Whittemore
Podcast Appearances
Today on the AI Daily Brief, this week in AI for ridiculously busy people.
The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI.
All right, friends, doing a quick experiment here.
The AI Daily Brief is obviously quite an information-dense podcast.
Even though it curates the whole world of AI things happening, it can still be a pretty high bar for people who are paying attention more casually or just don't have time to dedicate 20 or 25 minutes a day to AI news.
So for those of you who are looking for something that's closer to five minutes to send your colleagues who need to know exactly what was going on in AI that week, that's what this is for.
Let me know how you like it.
First up, let's talk about the biggest theme of the week, which was, by far, token efficiency.
I made the argument on Twitter that every AI company is now in some way, shape, or form a token efficiency company.
We have officially moved on from the token subsidy era, where the per-seat models of companies like OpenAI and Anthropic were allowing people to consume thousands of dollars' worth of AI tokens for tens or hundreds of dollars.
Now we're in the token shortage era, where all the business models are moving to usage-based models and everyone is having to adapt.
This week, that adaptation looked like Uber putting $1,500 monthly limits on employees' AI usage, Walmart having to cap usage of its tool because demand was too high, and comments from companies like TSMC suggesting that this shortage is not a short-term thing but is going to last years.
Importantly though, the market is responding.
AI software engineering agent company Factory introduced native model routing that can figure out what the right model is for a task, including models that are cheaper or not state-of-the-art, which they say can maintain state-of-the-art performance while cutting costs by a quarter.
Perplexity introduced a new hybrid system that combines local and cloud-based inference, which has benefits for both cost and privacy.
Harvey announced that it had collaborated with Fireworks AI to build a worker-advisor agent, where an open-weight worker can delegate complex tasks to a closed-source frontier advisor powered by one of the state-of-the-art models, and found that it outperformed the state-of-the-art model alone on legal tasks at just a fraction of the cost.
Microsoft, meanwhile, is clearly trying to bring this sort of capability to the rest of the market, saying that when they collaborated with McKinsey to post-train a model on McKinsey tasks, it beat GPT-5.5 performance at a tenth of the cost.
TLDR, the token shortage is here, but the market is responding.
In terms of what you should be playing with this weekend, it is absolutely Codex updates.
Codex announced an expansion of their plugin ecosystem, new annotations, and a new feature called Sites.