Mark Zuckerberg
And one of the things that we've generally tried to do over the last year is anchor more of our models in our Meta AI product North Star use cases, because the issue with both open source benchmarks and any given thing like the LM Arena stuff is that they're often skewed toward a very specific set of use cases, which are often not actually what any normal person does in your product.
The portfolio of things that they're trying to measure is often weighted differently from what people care about in any given product.
And because of that, we've found that trying to optimize too much for that stuff has often led us astray and has not actually led towards the highest quality products, the most usage, and the best feedback within Meta AI as people use our stuff.
So we're trying to anchor our North Star in basically the product value that people report to us: what they say that they want, what their revealed preferences are, and using the experiences that we have.
So I think sometimes these things don't quite line up.
And I think that a lot of them are quite easily gameable, right?
So, I mean, I think on the Arena, you'll see stuff like Sonnet 3.7.
It's a great model, right?
And it's not near the top.
And it was relatively easy for our team to tune a version of Llama 4 Maverick that basically was way at the top.
Whereas the one that we released, the kind of pure model, actually has no tuning for that at all.
So it's further down.
So I think you just need to be careful with some of the benchmarks.
And we're going to index primarily on the products.