Nathaniel Whittemore
Welcome back to the AI Daily Brief.
One would be forgiven for thinking that this week has been defined by models that we actually didn't have access to.
A huge part of the discourse throughout the week has of course been about Anthropic's Mythos, a model the company deemed too powerful to release in its normal way, and which right now is only in the hands of about 40 partners for some very limited cybersecurity-focused engagement.
Then just this morning, as you heard in the headlines, we also learned that OpenAI plans its own staggered rollout of its new model for similar reasons: cybersecurity risks.
Now, even among people who understand theoretically why these companies are doing this, there's still, I think, a bit of a sentiment of don't tell me about the new toys if I can't play with them.
But luckily, the rest of the AI industry is not slouching at all.
And in fact, even Anthropic themselves have given us something different that's still pretty powerful to play with.
So let's talk through all of the other models and tools that have been released, starting with the first big model release from the new Meta Superintelligence Lab.
MuseSpark is Meta's first new model release in over a year.
It's also the first model to come from the new Meta Superintelligence Labs division, the collection of superstar, crazily high-paid AI researchers assembled last summer under the leadership of Alexandr Wang, who was brought in through the $14 billion plus partial acquisition of his company, Scale.
MuseSpark will be the first of the Muse family of models, with Meta ditching the Llama name and associated baggage.
The Muse models are natively multimodal reasoning models, similar to Google's Gemini architecture.
Meta noted that they support tool use, visual chain of thought, and multi-agent orchestration.
Now, those features are at this point kind of table stakes for the current generation, but based on fairly low expectations, people were still encouraged to see them present here.
Meta didn't indicate how large the model is or whether it uses a mixture-of-experts architecture.
In fact, we don't really know at all where this model sits in the model family.
Executives referred to it as small and fast, but its benchmark comparisons put it closer to a mid-sized or large model.
On the benchmarks at first glance, MuseSpark looks pretty capable.
It scored 52.4 on SWE-Bench Pro, for example, putting it within a few points of Opus 4.6, Gemini 3.1 Pro, and GPT 5.4 for coding.
On Humanity's Last Exam, it scored 42.8, which is slightly better than Opus, but trailing Gemini and GPT 5.4.