Nathan Lambert
Podcast Appearances
You know, a hundred-something billion dollars of revenue for NVIDIA last year, two-hundred-something billion this year, right? And next year it could nearly double again, or more than double, based on what we see with data center footprints being built out all across the US and the rest of the world. It's going to be really hard for China to keep up with these rules in place, right?
Yes, there will always be smuggling, and DeepSeek-level models, GPT-4-level models, o1-level models, will be trainable on what China can get, even the next tier above that. But if we speed-run a couple more jumps, right, to billion-dollar models, ten-billion-dollar models, then it becomes... hey, there is a compute disadvantage for China for training models and serving them.
And the serving part is really critical, right? DeepSeek cannot serve their model today, right? It's completely out of inventory. It's actually already started falling in app store downloads, because you download it, you try to sign up, and they say, we're not taking registrations, because they have no capacity, right?
You open it up, you get less than five tokens per second, if you even get your request approved, right? There's just no capacity, because they just don't have enough GPUs to serve the model, even though it's incredibly efficient.
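To put that serving constraint in rough numbers, here is a back-of-envelope sketch of how decode capacity scales with concurrent users. Every figure in it (per-node throughput, user count, target speed) is an assumption for illustration, not DeepSeek's actual numbers.

```python
# Back-of-envelope serving capacity: how many GPU nodes it takes to hit a target
# per-user decode speed for a given number of concurrent users.
# All numbers below are assumptions for illustration, not DeepSeek's real figures.

node_throughput_tok_s = 10_000   # aggregate decode tokens/sec one 8-GPU node sustains (assumed)
target_per_user_tok_s = 20       # a "responsive" per-user speed, vs. the ~5 tok/s described above
concurrent_users = 1_000_000     # users actively decoding at the same time (assumed)

required_tok_s = concurrent_users * target_per_user_tok_s
nodes_needed = required_tok_s / node_throughput_tok_s

print(f"Aggregate demand: {required_tok_s:,.0f} tokens/sec")
print(f"Nodes needed: {nodes_needed:,.0f} eight-GPU nodes (~{nodes_needed * 8:,.0f} GPUs)")
# With these made-up numbers that is ~2,000 nodes, i.e. ~16,000 GPUs just for decode,
# which is why capacity, not model quality, becomes the bottleneck.
```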
Yeah, I mean, that's incredibly easy, right? Like, OpenAI publicly stated DeepSeek uses their API, and they say they have evidence, right? And this is another element of the training regime: people at OpenAI have claimed that it's a distilled model, i.e., you take OpenAI's model, you generate a lot of output, and then you train your model on that output.
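For context on what that kind of distillation means mechanically, here is a minimal sketch: sample completions from a teacher behind a chat API, then fine-tune a small student on that text with a standard next-token loss. The model names, prompts, and hyperparameters are placeholders, and this is a generic illustration of API distillation, not a claim about DeepSeek's actual pipeline.

```python
# Sketch of API-based distillation: sample completions from a "teacher" model behind
# a chat API, then fine-tune a small "student" model on those samples.
# Model names, prompts, and hyperparameters are illustrative placeholders.
import torch
from openai import OpenAI
from transformers import AutoModelForCausalLM, AutoTokenizer

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

prompts = [
    "Explain why the sky is blue.",
    "Write a Python function that reverses a string.",
]

# 1) Collect teacher outputs (the "generate a lot of output" step).
pairs = []
for p in prompts:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder teacher
        messages=[{"role": "user", "content": p}],
    )
    pairs.append((p, resp.choices[0].message.content))

# 2) Fine-tune a small student on (prompt, teacher completion) text.
tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B")  # placeholder student
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

model.train()
for prompt, completion in pairs:
    text = prompt + "\n" + completion + tok.eos_token
    batch = tok(text, return_tensors="pt", truncation=True, max_length=1024)
    # Standard next-token prediction loss on the teacher's text.
    out = model(**batch, labels=batch["input_ids"])
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```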
And even if that's the case, what DeepSeek did efficiency-wise is still amazing, by the way.
There are also public examples, right? Like, Meta explicitly stated, not necessarily distilling, but they used the 405B as a reward model for the 70B in their Llama 3.2 and 3.3. This is all the same topic.
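One common way a big model serves as a "reward model" for a smaller one is best-of-n sampling: the small model proposes several answers, the large model scores them, and the top-scoring answer is kept as preference or fine-tuning data. Here is a minimal sketch, assuming both models sit behind an OpenAI-compatible endpoint; the endpoint, model names, and scoring prompt are illustrative, not Meta's actual Llama recipe.

```python
# Best-of-n sampling with a large model acting as the reward model for a small one.
# Endpoint, model names, and the scoring prompt are illustrative, not Meta's recipe.
import re
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # e.g. a local vLLM server

SMALL, LARGE = "llama-3.1-70b", "llama-3.1-405b"  # placeholder model names

def score(prompt: str, answer: str) -> float:
    """Ask the large model to rate an answer; parse the first number it returns."""
    judge = client.chat.completions.create(
        model=LARGE,
        messages=[{
            "role": "user",
            "content": f"Rate this answer from 1 to 10 (number only).\n"
                       f"Question: {prompt}\nAnswer: {answer}",
        }],
    )
    match = re.search(r"\d+(\.\d+)?", judge.choices[0].message.content)
    return float(match.group()) if match else 0.0

def best_of_n(prompt: str, n: int = 4) -> str:
    """Sample n answers from the small model, keep the one the large model scores highest."""
    candidates = [
        client.chat.completions.create(
            model=SMALL,
            messages=[{"role": "user", "content": prompt}],
            temperature=1.0,
        ).choices[0].message.content
        for _ in range(n)
    ]
    # The top-scoring answers are what you would keep for preference or fine-tuning data.
    return max(candidates, key=lambda a: score(prompt, a))

print(best_of_n("Explain KV caching in two sentences."))
```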