Nathan Lambert
Podcast Appearances
You know, a hundred-something billion dollars of revenue for NVIDIA last year, two-hundred-something billion this year, right? And next year it could nearly double again, or more than double, based on what we see with data center footprints being built out all across the US and the rest of the world. It's going to be really hard for China to keep up with these rules in place, right?
Yes, there will always be smuggling, and DeepSeek-level models, GPT-4-level models, o1-level models, will be trainable on what China can get, even the next tier above that. But if we speed-run a couple more jumps, right, to billion-dollar models, ten-billion-dollar models, then it becomes... hey, there is a compute disadvantage for China for training models and serving them.
And the serving part is really critical, right? DeepSeek cannot serve their model today, right? It's completely out of inventory. It's actually already started falling in app store downloads, because you download it, you try to sign up, and they say, we're not taking registrations, because they have no capacity, right?
You open it up, you get less than five tokens per second, if you even get your request approved, right? There's just no capacity, because they just don't have enough GPUs to serve the model, even though it's incredibly efficient.
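To put that serving constraint in rough numbers, here is a back-of-envelope sketch of how decode capacity scales with concurrent users. Every figure in it (per-node throughput, user count, target speed) is an assumption for illustration, not DeepSeek's actual numbers.

```python
# Back-of-envelope serving capacity: how many GPU nodes it takes to hit a target
# per-user decode speed for a given number of concurrent users.
# All numbers below are assumptions for illustration, not DeepSeek's real figures.

node_throughput_tok_s = 10_000   # aggregate decode tokens/sec one 8-GPU node sustains (assumed)
target_per_user_tok_s = 20       # a "responsive" per-user speed, vs. the ~5 tok/s described above
concurrent_users = 1_000_000     # users actively decoding at the same time (assumed)

required_tok_s = concurrent_users * target_per_user_tok_s
nodes_needed = required_tok_s / node_throughput_tok_s

print(f"Aggregate demand: {required_tok_s:,.0f} tokens/sec")
print(f"Nodes needed: {nodes_needed:,.0f} eight-GPU nodes (~{nodes_needed * 8:,.0f} GPUs)")
# With these made-up numbers that is ~2,000 nodes, i.e. ~16,000 GPUs just for decode,
# which is why capacity, not model quality, becomes the bottleneck.
```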
Yeah, I mean, that's incredibly easy, right? Like, OpenAI publicly stated DeepSeek uses their API, and they say they have evidence, right? And this is another element of the training regime: people at OpenAI have claimed that it's a distilled model, i.e., you take OpenAI's model, you generate a lot of output, and then you train your model on that output.
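For context on what that kind of distillation means mechanically, here is a minimal sketch: sample completions from a teacher behind a chat API, then fine-tune a small student on that text with a standard next-token loss. The model names, prompts, and hyperparameters are placeholders, and this is a generic illustration of API distillation, not a claim about DeepSeek's actual pipeline.

```python
# Sketch of API-based distillation: sample completions from a "teacher" model behind
# a chat API, then fine-tune a small "student" model on those samples.
# Model names, prompts, and hyperparameters are illustrative placeholders.
import torch
from openai import OpenAI
from transformers import AutoModelForCausalLM, AutoTokenizer

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

prompts = [
    "Explain why the sky is blue.",
    "Write a Python function that reverses a string.",
]

# 1) Collect teacher outputs (the "generate a lot of output" step).
pairs = []
for p in prompts:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder teacher
        messages=[{"role": "user", "content": p}],
    )
    pairs.append((p, resp.choices[0].message.content))

# 2) Fine-tune a small student on (prompt, teacher completion) text.
tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B")  # placeholder student
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

model.train()
for prompt, completion in pairs:
    text = prompt + "\n" + completion + tok.eos_token
    batch = tok(text, return_tensors="pt", truncation=True, max_length=1024)
    # Standard next-token prediction loss on the teacher's text.
    out = model(**batch, labels=batch["input_ids"])
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```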
And even if that's the case, what DeepSeek did efficiency-wise is still amazing, by the way.
There are also public examples, right? Like, Meta explicitly stated, not necessarily distilling, but they used the 405B as a reward model for the 70B in their Llama 3.2 and 3.3. This is all the same topic.
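One common way a big model serves as a "reward model" for a smaller one is best-of-n sampling: the small model proposes several answers, the large model scores them, and the top-scoring answer is kept as preference or fine-tuning data. Here is a minimal sketch, assuming both models sit behind an OpenAI-compatible endpoint; the endpoint, model names, and scoring prompt are illustrative, not Meta's actual Llama recipe.

```python
# Best-of-n sampling with a large model acting as the reward model for a small one.
# Endpoint, model names, and the scoring prompt are illustrative, not Meta's recipe.
import re
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # e.g. a local vLLM server

SMALL, LARGE = "llama-3.1-70b", "llama-3.1-405b"  # placeholder model names

def score(prompt: str, answer: str) -> float:
    """Ask the large model to rate an answer; parse the first number it returns."""
    judge = client.chat.completions.create(
        model=LARGE,
        messages=[{
            "role": "user",
            "content": f"Rate this answer from 1 to 10 (number only).\n"
                       f"Question: {prompt}\nAnswer: {answer}",
        }],
    )
    match = re.search(r"\d+(\.\d+)?", judge.choices[0].message.content)
    return float(match.group()) if match else 0.0

def best_of_n(prompt: str, n: int = 4) -> str:
    """Sample n answers from the small model, keep the one the large model scores highest."""
    candidates = [
        client.chat.completions.create(
            model=SMALL,
            messages=[{"role": "user", "content": prompt}],
            temperature=1.0,
        ).choices[0].message.content
        for _ in range(n)
    ]
    # The top-scoring answers are what you would keep for preference or fine-tuning data.
    return max(candidates, key=lambda a: score(prompt, a))

print(best_of_n("Explain KV caching in two sentences."))
```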