Nathaniel Whittemore
Now, interestingly on that one, with tools enabled, Muse's score only jumped to 50.4, leaving it trailing all three of those major rivals by a few points.
This could suggest the model isn't as good at web search or tool use as the others, but of course this is only a single data point.
The general sense you get from the benchmarks is that Muse is in the mix, but certainly not leading the pack.
And you can certainly tell where Meta is trying to put the emphasis.
Rather than leading with their scores on Humanity's Last Exam or SWE-bench, which are buried fairly deep in the results table, Meta is instead leading with the multimodal benchmarks where Muse Spark excels.
The model scored 86.4 on CharXiv Reasoning, a measure of visual comprehension, which would be a state-of-the-art result, beating Gemini 3.1 Pro by 6 points.
Muse Spark did slightly trail Gemini on an assortment of other visual tests, but the results were strong enough to suggest the model will be highly capable.
Now, these benchmarks also gel with how Meta views the model's purpose.
Unlike the other model companies, where there is increasing focus on coding use cases and enterprise use cases more broadly, Muse Spark is designed primarily to drive personal agents.
In a Threads post, Mark Zuckerberg wrote that Muse Spark is a world-class assistant and particularly strong in areas related to personal superintelligence like visual understanding, health, social content, shopping, games, and more.
And interestingly, in that same note, while Zuckerberg is trying to draw a clear differentiation from the work-focused use cases the other companies are pursuing, there is still broadly, even here in the personal realm, a shift from assistant AI to agentic AI.
Zuckerberg ends his Threads post by saying, we are building products that don't just answer your questions, but act as agents that do things for you.
Giving more examples of where these capabilities will be useful, Meta wrote that they enable interactive experiences like creating fun mini games or troubleshooting your home appliances with dynamic annotations.
The model will immediately go into service driving Meta AI and will presumably arrive across their social media platforms over time.
Muse Spark will function in three modes: instant, with no reasoning; thinking mode, which enables reasoning; and contemplating mode, which performs deep-research-style multi-step reasoning.
Contemplating mode, however, won't be available at launch.
Meta also emphasized the health assistant use case, touting that they collaborated with a thousand physicians to curate training data for factual accuracy.
Now, in this case, there doesn't seem to be a separate interface for health.
It's just functionality that's being encouraged on Meta's existing platforms.