
Nathan Lambert

Speaker
1665 total appearances

Appearances Over Time

Podcast Appearances

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

They only discussed the pre-training for the base model and they did not discuss anything on research and ablations. And they do not talk about any of the resources that are shared in terms of, hey, the fund is using all these GPUs, right? And we know that they're very profitable and that they had 10,000 GPUs in 2021.

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

So some of the research that we've found is that we actually believe they have closer to 50,000 GPUs.

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

Yeah, sorry. We believe they actually have something closer to 50,000 GPUs, right? Now, this is split across many tasks, right? Again, the fund.

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

Right. So like Llama 3, they trained on 16,000 H100s, right? But the company of Meta last year publicly disclosed they bought like 400 something thousand GPUs. Yeah. Right. So of course, tiny percentage on the training. Again, like most of it is like serving me the best Instagram reels, right?
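The "tiny percentage" claim above is easy to sanity-check with the two figures quoted: roughly 16,000 H100s for Llama 3 training against a fleet on the order of 400,000 GPUs. The fleet number is a round placeholder for "400 something thousand", not an exact figure:

```python
# Rough arithmetic on the figures quoted above (illustrative only).
training_gpus = 16_000    # Llama 3 training cluster, per the quote
total_gpus = 400_000      # placeholder for "400 something thousand" GPUs bought

fraction = training_gpus / total_gpus
print(f"Training share of the fleet: {fraction:.1%}")  # -> 4.0%
```

So even a frontier training run occupies only a few percent of the fleet; the rest goes to inference and recommendation workloads like the Instagram reels mentioned.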

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

Yeah, so there's, you know, Ampere was the A100 and then H100 Hopper, right? People use them synonymously in the US because really there's just H100 and now there's H200, right? But same thing, mostly. In China, there've been different salvos of export restrictions. So initially the US government limited on a two-factor scale, right? Which is chip interconnect versus flops, right?

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

So any chip that had interconnect bandwidth above a certain level and floating point operations above a certain level was restricted. Later, the government realized that this was a flaw in the restriction and they cut it down to just floating point operations. And so the H800 had high flops, low communication? Exactly. So the H800 was the same performance as the H100 on flops.
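The two regimes described above can be sketched as a pair of predicates: the initial rule restricted a chip only if both interconnect bandwidth and flops exceeded thresholds, while the revision decided on flops alone. Every threshold and chip number below is a hypothetical placeholder, not an actual regulatory limit or datasheet value:

```python
# Hypothetical thresholds -- placeholders, not the real export-control limits.
INTERCONNECT_LIMIT_GBPS = 600
FLOPS_LIMIT_TFLOPS = 1000

def restricted_initial(interconnect_gbps: float, tflops: float) -> bool:
    """Initial two-factor rule: restricted only if BOTH axes exceed their limits."""
    return interconnect_gbps > INTERCONNECT_LIMIT_GBPS and tflops > FLOPS_LIMIT_TFLOPS

def restricted_revised(interconnect_gbps: float, tflops: float) -> bool:
    """Revised rule: floating point throughput alone decides."""
    return tflops > FLOPS_LIMIT_TFLOPS

# An H800-like part: full flops, cut-down interconnect (made-up numbers).
print(restricted_initial(400, 1500))  # False -- slips under the two-factor rule
print(restricted_revised(400, 1500))  # True  -- caught once the rule is flops-only
```

This is why a high-flops, low-interconnect part like the H800 was legal under the first rule and banned under the second: cutting one axis was enough to escape an AND condition, but not a flops-only one.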

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

But it just had the interconnect bandwidth cut. DeepSeek knew how to utilize this. Hey, even though we're cut back on the interconnect, we can do all this fancy stuff to figure out how to use the GPU fully anyways. And so that was back in October 2022. But later in 2023, end of 2023, implemented in 2024, the US government banned the H800, right? Yeah.
