Nathaniel Whittemore
Podcast Appearances
of how AI and agents can not only enhance productivity, but actually fundamentally change outcomes in measurable ways in your business this year, go to besuper.ai.
Now, it would take a couple of years and some significant advances in the models to actually unleash vibe coding in the way that it happened over the course of 2025.
But the idea was there very early.
We've similarly had interest in vast teams of agents that can coordinate amongst themselves to accomplish more things, even if the capability set hasn't fully been there.
Which isn't to say that people haven't been experimenting.
Lindy released their agent swarm tool back in April of 2025.
And the concept is related to something that I've talked about on this show, the Doctor Strange theory of AI agent work.
Now, the specific point that I've made is actually about the difference in how enterprises think agents will play out versus how I think they will play out, with the difference being that I don't think that agents are going to be one-to-one replacements for existing human work.
I think that we're going to be able to deploy lots and lots of agents to scenario and war game different types of work, which, while not exactly the same as agent swarms, which are more about breaking down complex tasks into specific subtasks, is in some ways still part of the same larger conversation about how agents will actually work in the future.
Over the last couple of days, we have started to get the first big model releases of 2026, and maybe the most significant so far is Moonshot's Kimi K2.5.
While it is the agent swarm feature of K2.5 which has the most chatter, it's worth checking out the broader model as a whole.
Artificial Analysis sums up the shift when they write, "Moonshot's Kimi K2.5 is the new leading open weights model, now closer than ever to the frontier, with only OpenAI, Anthropic, and Google models ahead."
And indeed, the benchmarks are impressive.
K2.5, for example, claims 50.2 on Humanity's Last Exam, which would put them ahead of GPT-5.2 running on high settings, Opus 4.5, and Gemini 3.
On a variety of other benchmarks as well, they claim performance that matches or exceeds these premier Western models.
On the overall independent Artificial Analysis index, Kimi jumps from 11th place overall with their K2 Thinking model into 5th, behind only two iterations of GPT-5.2, Opus 4.5, and Gemini 3 Pro.
And of course, the cost is cheaper than any of those models.
In AA's tests, Kimi K2.5 was about four times cheaper than Opus 4.5 or GPT-5.2, but was still much more expensive than, for example, DeepSeek version 3.2.
One of the things that Moonshot has emphasized in their launch is the model's native multimodality.