Nathaniel Whittemore
Now, there is a lot that we could say about this.
The cynic in me, of course, sees all of the potential challenges with this program, most of which sort of amount to a question of whether this is too little to move the needle.
But we got to start somewhere.
Governments need to get involved in a way that is actually helpful to people adapting to a new world rather than just trying to pretend that they have control over whether that new world exists.
And so for that reason, I think this is a good thing, and I'm excited to see it hopefully go even further than they're thinking right now.
Now, our main episode today is about a new model out of China and its agent swarm capabilities.
But Alibaba's Qwen team also released a new model earlier this week, specifically called Qwen 3 Max Thinking.
Now, as you can probably tell from the naming convention, this is the big flagship model from the Qwen team.
Their equivalent of GPT-5.2 Pro, Gemini 3 Pro, or Opus 4.5.
The model makes use of an inference technique that the Qwen team are calling heavy mode.
Qwen is doing test-time scaling slightly differently from existing approaches: generating a response, then feeding it back into the model for improvement in a recursive loop.
It appears to be generating some pretty significant gains.
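To make the idea concrete, here's a minimal sketch of that kind of recursive refinement loop. This is an illustration of the general pattern, not Qwen's actual implementation; `call_model` is a hypothetical stand-in for a real LLM API call.

```python
def call_model(prompt: str) -> str:
    # Stub standing in for a real LLM API call; a real implementation
    # would send the prompt to a model endpoint and return its output.
    return f"response to: {prompt[:40]}"

def recursive_refine(question: str, rounds: int = 3) -> str:
    """Generate an answer, then repeatedly feed it back for improvement."""
    answer = call_model(question)
    for _ in range(rounds):
        # Feed the previous answer back in and ask the model to refine it.
        refine_prompt = (
            f"Question: {question}\n"
            f"Previous answer: {answer}\n"
            "Improve this answer, fixing any errors or gaps."
        )
        answer = call_model(refine_prompt)
    return answer
```

The key design choice is that compute scales with the number of refinement rounds rather than with a single long generation, which is one way to spend extra inference-time budget on a hard question.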
Qwen said that this method improved benchmark scores on GPQA, which is a PhD-level science test, from 90.3% to 92.8%.
On LiveCodeBench, scores jumped from 88% to 91.4%.
Overall, the benchmarking looks pretty strong.
Now, the cost is a little beefy for a Chinese open-source model.
Qwen 3 Max Thinking comes in at around the same cost as Claude Haiku 4.5, meaning that it's still much cheaper than models like Gemini 3 Pro or GPT-5.2, but about 10 times more expensive than DeepSeek V3.2.
Now, Qwen 3 is already being used by many American companies.
Airbnb CEO Brian Chesky, for example, recently said that his company was relying on Qwen 3 as a more affordable alternative to U.S. models, meaning that you've got to think that they will be watching this model release closely.