
Nathaniel Whittemore

👤 Speaker
24138 total appearances

Podcast Appearances

Absurdly impressive.

1429.455 View full episode →

Prime Intellect's Elie Bakouch writes that MAI Thinking 1 uses zero synthetic data or distillation from previous models.

1431.098 View full episode →

Quote, This means reasoning, agentic behavior, tool use are all learned fully during post-training with no cold start.

1436.945 View full episode →

Bold choice that makes it harder and requires more iterations to reach state-of-the-art, but you get full control over your model series and it proves they are serious about being a frontier lab.

1442.572 View full episode →

Ethan Mollick lamented the fact that no one has really gotten their hands on these things, so we're just left to squint at the benchmarks, which themselves are confusing.

1451.643 View full episode →

He writes,

1458.031 View full episode →

It's difficult to know how good MAI Thinking 1 is from the scores alone, like weirdly low GPQA and Terminal-Bench 2.0.

1458.892 View full episode →

But Microsoft makes it really hard to try its models upon release, so I don't know.

1466.041 View full episode →

Others, like the leaker "I Rule the World," pooh-poohed the releases, saying, "In case it's unclear, the Microsoft model isn't competitive, particularly not for anything agentic."

1470.406 View full episode →

And indeed, Thinking 1's scores on agentic coding tests like Terminal-Bench 2.0 and SWE-Bench Pro were meaningfully lower than those of competitors from Anthropic and OpenAI, even ones a generation old.

1478.756 View full episode →

But go one step deeper and it's quite clear that Microsoft is playing a different game.

1488.288 View full episode →

I believe that they have very clearly identified cost optimization as an issue and believe that their approach can be part of the answer.

1492.476 View full episode →

In his announcement post, Mustafa Suleyman wrote, All of this is the foundation for Microsoft Frontier Tuning.

1498.969 View full episode →

It lets you customize our models to create custom, company-specific agents that only you control.

1504.14 View full episode →

Early adopters are already seeing a difference.

1509.23 View full episode →

When we tuned our models for McKinsey's tasks, MAI delivered the highest win rate, outperforming GPT-5.5 on quality while being 10x lower on cost.

1511.454 View full episode →

On stage, Microsoft CEO Satya Nadella called this "a pretty significant shift."

1520.668 View full episode →

He said, "We believe the time has come for every company to just move from consuming a frontier model to fully participating at the frontier, in the frontier ecosystem."

1524.815 View full episode →

In other words, I don't think that we should be looking at this series of models

1533.128 View full episode →

completely in raw terms as something where one of us as listeners is going to decide to fire up MAI Thinking 1 instead of GPT-5.5 or Opus 4.8.

1536.854 View full episode →