Mark Zuckerberg
There's a whole set of different issues around coding, which is the other verifiable domain. You kind of need to be worried about waking up one day and finding that a model with ties to another government can embed all kinds of different vulnerabilities in code, which the intelligence organizations associated with that government can then go exploit.
So now imagine some future version where you have some model from another country that we're using to secure or build out a lot of our systems.
And then all of a sudden you wake up and everything is vulnerable in a way that that country knows about but you don't, or it turns on a vulnerability at some point.
Those are real issues.
So I'm very interested in studying this, because I think one of the main things that's interesting about open source is the ability to distill models.
For most people, the primary value isn't just taking a model off the shelf and saying, okay, Meta built this version of Llama, I'm going to take it and run it exactly as-is in my application.
Your application isn't doing anything different if you're just running our thing; you're at least going to fine-tune it or try to distill it into a different model.
And when we get to stuff like the Behemoth model, the whole value is being able to take that very high level of intelligence and distill it down into a smaller model that you're actually going to want to run.
That's the beauty of distillation.
It's one of the techniques that has really emerged as very powerful in the last year, since the last time we sat down, and it has worked better than most people would have predicted: you can take a model that is much bigger and capture probably 90 or 95% of its intelligence in something that's 10% the size.
Now, do you get 100% of the intelligence?
No, but 95% of the intelligence at 10% of the cost is pretty good for a lot of things.
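The mechanism behind this, training a small student model to match a larger teacher's output distribution rather than just the hard labels, can be sketched roughly as follows. This is a generic illustration of the standard temperature-scaled distillation loss, not Meta's actual pipeline, and all names and numbers here are made up.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits to a probability distribution, softened by temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's.

    A higher temperature exposes the relative probabilities the teacher
    assigns to wrong answers, which is much of what the student learns from.
    The T^2 factor keeps gradient scale comparable across temperatures.
    """
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student predictions
    return temperature ** 2 * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# A student whose logits roughly track the teacher's incurs a small loss;
# a student that disagrees sharply incurs a much larger one.
teacher = [4.0, 1.0, 0.2]
close_student = [3.8, 1.1, 0.3]
far_student = [0.2, 1.0, 4.0]
assert distillation_loss(teacher, close_student) < distillation_loss(teacher, far_student)
```

In practice this loss is minimized over a large training corpus with gradient descent, usually mixed with the ordinary hard-label loss; the sketch above only shows the soft-target term.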
The other thing that's interesting is that with this more varied open source community, where it's not just Llama and you have other models, you have the ability to distill from multiple sources.
So now you can basically say, okay, Llama is really good at this.
Maybe the architecture is really good because it's fundamentally multimodal, fundamentally more inference-friendly, and more efficient.
But let's say this other model is better at coding.
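One common way to realize this multi-source idea is to keep a single student architecture and blend the soft targets of several teachers, weighting each teacher by how much you trust it on a given kind of example. A minimal sketch, with every teacher name and weight purely hypothetical:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-softened probability distribution over logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def blended_targets(teacher_logits_by_name, weights, temperature=2.0):
    """Weighted mixture of several teachers' softened distributions.

    The weights could depend on the training example's domain, e.g. lean on
    a coding-focused teacher for code and a multimodal teacher elsewhere.
    """
    dists = {name: softmax(logits, temperature)
             for name, logits in teacher_logits_by_name.items()}
    vocab_size = len(next(iter(dists.values())))
    return [sum(weights[name] * dists[name][i] for name in dists)
            for i in range(vocab_size)]

# Hypothetical example: for a coding prompt, weight the code specialist higher.
targets = blended_targets(
    {"general_teacher": [2.0, 1.0, 0.5], "code_teacher": [0.5, 1.0, 3.0]},
    weights={"general_teacher": 0.3, "code_teacher": 0.7},
)
assert abs(sum(targets) - 1.0) < 1e-9  # a mixture of distributions is still a distribution
```

The student is then trained against these blended targets with the same distillation loss as for a single teacher; only the target distribution changes.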