Grace Hsiao
And they have exclusivity windows.
And then the Chinese labs kind of wait that out so they can pay, like, maybe a million dollars versus, like, 10 million for the same dataset.
Yeah, I think the compute constraint and the capital constraint is real.
And frankly, like, no one's hiding that or pretending that that's not an issue for them right now.
Like, DeepSeek has openly said they were even struggling, right?
Like, they needed more compute.
I think the distillation allegations, or accusations, are quite interesting.
Like, recently...
I've been thinking about this a lot and thinking about what it means for distillation and what it means for the models to catch up, right?
So there was this one quote from Yao Shunyu, who is a Google DeepMind researcher.
He said, there is smart distillation and dumb distillation.
Dumb distillation is what I think most of us who are, frankly, non-technical think about.
It's like, okay, you take, like, a thousand queries, you take whatever answers Claude gives you, right?
And then you kind of force-copy that into your own model.
And then you basically force it to produce the exact same answers.
Smart distillation is when you use the frontier model almost as a partner, to help you with the judgment in evaluation and even with the data labeling itself.
So you're using it almost as a teacher for your own model.
It guides it a little bit, versus really copy-pasting the answer, if that makes sense.
And that part of it is frankly not that unethical or, like, you know, that frowned upon right now, because that is what enterprises do when they're fine-tuning.
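The contrast between "dumb" and "smart" distillation comes down to what training data each approach produces. Below is a minimal Python sketch that only builds that data. The teacher/student callables, their names, and the toy data are hypothetical stand-ins, not any lab's actual pipeline. Reading "smart" distillation as best-of-n selection judged by the teacher is also an assumption, one possible interpretation of "using the frontier model as a partner for judgment and labeling." The real model calls and the fine-tuning step are omitted.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interfaces: in practice these would wrap calls to a frontier
# "teacher" model and to the model being trained (the "student").
TeacherAnswer = Callable[[str], str]          # prompt -> teacher's answer
TeacherJudge = Callable[[str, str], float]    # (prompt, candidate) -> score
StudentSample = Callable[[str, int], List[str]]  # (prompt, n) -> n drafts


def dumb_distillation_data(
    prompts: List[str], teacher_answer: TeacherAnswer
) -> List[Tuple[str, str]]:
    """Copy the teacher's answers verbatim as fine-tuning targets.

    The student would later be trained to reproduce each answer exactly.
    """
    return [(p, teacher_answer(p)) for p in prompts]


def smart_distillation_data(
    prompts: List[str],
    student_sample: StudentSample,
    teacher_judge: TeacherJudge,
    n_candidates: int = 4,
) -> List[Tuple[str, str]]:
    """Use the teacher only as a judge of the student's own drafts.

    For each prompt the student proposes several candidates, the teacher
    scores them, and the best-scoring draft becomes the training label.
    The labels are the student's own outputs, not copied teacher text.
    """
    data = []
    for p in prompts:
        candidates = student_sample(p, n_candidates)
        best = max(candidates, key=lambda c: teacher_judge(p, c))
        data.append((p, best))
    return data


# --- Toy stand-ins (hypothetical) so the sketch runs without any model ---
PROMPTS = ["2+2?", "Capital of France?"]
TEACHER_ANSWERS: Dict[str, str] = {"2+2?": "4", "Capital of France?": "Paris"}


def toy_teacher_answer(prompt: str) -> str:
    return TEACHER_ANSWERS[prompt]


def toy_student_sample(prompt: str, n: int) -> List[str]:
    # Pretend the student produces n numbered drafts.
    return [f"{prompt} draft {i}" for i in range(n)]


def toy_teacher_judge(prompt: str, candidate: str) -> float:
    # Pretend the teacher prefers draft 2 for every prompt.
    return 1.0 if candidate.endswith("draft 2") else 0.0


if __name__ == "__main__":
    print(dumb_distillation_data(PROMPTS, toy_teacher_answer))
    print(smart_distillation_data(PROMPTS, toy_student_sample, toy_teacher_judge))
```

In the first function the teacher's text becomes the target directly. In the second, the teacher never supplies the answer; it only ranks the student's own drafts. That is the "guides it a little bit versus copy-pasting" distinction.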