Nathaniel Whittemore
Absurdly impressive.
Prime Intellect's Elie Bakouch writes that MAI Thinking 1 uses zero synthetic data or distillation from previous models.
Quote: this means reasoning, agentic behavior, and tool use are all learned fully during post-training, with no cold start.
Bold choice that makes it harder and requires more iterations to reach the state of the art, but you get full control over your model series, and it proves they are serious about being a frontier lab.
Ethan Mollick lamented that no one has really gotten their hands on these models yet, so we're left to squint at the benchmarks, which are themselves confusing.
He writes,
It's difficult to know how good MAI Thinking 1 is from the scores alone, like the weirdly low GPQA and Terminal Bench 2.0 results.
But Microsoft makes it really hard to try its models upon release, so I don't know.
Others, like the leaker I Rule the World, pooh-poohed the releases, saying: In case it's unclear, the Microsoft model isn't competitive, particularly not for anything agentic.
And indeed, MAI Thinking 1's scores on agentic coding tests like Terminal Bench 2.0 and SWE-Bench Pro were meaningfully lower than those of Anthropic and OpenAI models that are already a generation old.
But go one step deeper and it's quite clear that Microsoft is playing a different game.
I believe they have very clearly identified cost optimization as an issue and think their approach can be part of the answer.
In his announcement post, Mustafa Suleyman wrote: All of this is the foundation for Microsoft Frontier Tuning.
It lets you customize our models to create custom, company-specific agents that only you control.
Early adopters are already seeing a difference.
When we tuned our models for McKinsey's tasks, MAI delivered the highest win rate, outperforming GPT-5.5 on quality while being 10x lower on cost.
On stage, Microsoft CEO Satya Nadella called this a pretty significant shift.
He said: We believe the time has come for every company to just move from consuming a frontier model to fully participating at the frontier, in the frontier ecosystem.
In other words, I don't think we should be looking at this series of models purely in raw terms, as something where one of us listeners is going to decide to fire up MAI Thinking 1 instead of GPT-5.5 or Opus 4.8.