Keri Briske
Yeah, so I mentioned this idea of distillation, the algorithms used to distill models down.
By the way, there are lots of different ways to distill models. New algorithms are always coming out.
But you do need the larger model to teach the smaller model.
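To make that teacher-student relationship concrete, here's a minimal sketch of a standard knowledge-distillation loss in PyTorch. It isn't any specific algorithm mentioned in the conversation; the function name, temperature, and weighting are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend a soft-target KL term (teacher teaching student) with the
    usual hard-label cross-entropy. T softens both distributions so the
    student sees the teacher's full output distribution, not just argmax."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # conventional T^2 scaling keeps gradient magnitudes stable
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Sketch of a training step: the larger model is frozen and only
# supplies soft targets; only the smaller student is updated.
# teacher.eval()
# with torch.no_grad():
#     teacher_logits = teacher(batch)
# loss = distillation_loss(student(batch), teacher_logits, labels)
```

The key point the code makes is the same one in the quote: the loss can't be computed without the larger model's outputs, so the teacher has to exist first.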
You can also take a larger model and, as I mentioned earlier, apply neural architecture search. It's like pulling apart the Legos, figuring out which pieces really matter to the design, and then putting it back together using 30% fewer pieces. That's basically what you do with neural architecture search.
And so you distill it down to a smaller model, but then you retrain it back to the same accuracy. So there is a bit of retraining involved.
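As a rough illustration of that "remove pieces, then retrain" loop, here's a sketch using PyTorch's built-in pruning utilities. This is a crude stand-in, assuming a hypothetical caller-supplied `train_fn` for the fine-tuning step; real architecture search removes whole structural components (layers, heads, channels), not just individual weights by magnitude.

```python
import torch
import torch.nn.utils.prune as prune

def prune_and_retrain(model, train_fn, amount=0.3):
    """Crude version of the Lego analogy: drop the 30% of weights that
    contribute least (by L1 magnitude), then retrain the smaller network
    to recover the accuracy that pruning cost."""
    for module in model.modules():
        if isinstance(module, torch.nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=amount)
            prune.remove(module, "weight")  # bake the pruning in permanently
    train_fn(model)  # fine-tune back toward the original accuracy
    return model
```

The retraining call at the end is the part the quote emphasizes: shrinking the model is only half the job.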
And the thing is, there's been really great success with these small models, or SLMs. You do give up a little bit, though, just a little bit, as I mentioned earlier: a little robustness, a little capacity to learn. But they are great for specializing on particular tasks.
So again, I'm talking about systems of models.
If you have a model that just needs to do routing, where someone asks a question and it has to route to the best tool or the best model to answer it, that model doesn't have to understand the whole world.
It just has to understand its environment and its tasks.
And it can be a really great router.
But with some reasoning behind it, you're making much better routing decisions than you would with rules-based routing.
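Here's a small sketch of what that looks like in practice, contrasted with if/else rules. Everything here is hypothetical: `slm_classify` stands in for whatever serving call wraps your fine-tuned SLM, and the route table is invented for illustration.

```python
# A small distilled model classifies each query and picks the downstream
# tool or model, instead of keyword-matching rules.
ROUTES = {
    "code": "code-assistant-model",
    "math": "calculator-tool",
    "search": "web-search-tool",
    "general": "general-llm",
}

def route(query: str, slm_classify) -> str:
    """slm_classify is an assumed callable wrapping a small language model
    fine-tuned to emit exactly one label from ROUTES; any SLM serving API
    could sit behind it. Falls back to the general model on unknown labels."""
    label = slm_classify(query, labels=list(ROUTES))
    return ROUTES.get(label, ROUTES["general"])
```

The design point is that the SLM only needs to learn the label set for its own environment, which is exactly why a specialized small model is enough here.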
So, yeah, SLMs serve that purpose.
But you do need that larger model to distill down from.
And where are you running that?