
Nathan Lambert

Speaker
1665 total appearances

Podcast Appearances

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

I don't know how they're doing the networking, but they're using NVIDIA Spectrum-X Ethernet. The unsung heroes, though, are the cooling and electrical systems, which just get glossed over. But one story that maybe exemplifies how insane this stuff is, is when you're training, right?

In the most simplistic terms, you're running through the model a bunch, and then you're going to exchange everything and synchronize the weights. So you'll do a step; this is a step in model training. And every step your loss goes down, hopefully, though it doesn't always.
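The step loop described above can be sketched as follows. This is a toy, framework-free illustration of data-parallel training, not any real library's API: each worker computes gradients on its own shard, then an all-reduce averages the gradients so every replica applies the same update and the weights stay synchronized.

```python
# Toy sketch of a synchronized data-parallel training step
# (hypothetical model and data; no real framework API implied).

def local_gradient(weight, data):
    # Gradient of a squared-error loss 0.5*(w*x - y)^2, averaged over this worker's shard.
    return sum((weight * x - y) * x for x, y in data) / len(data)

def all_reduce_mean(values):
    # Stand-in for the collective that exchanges gradients between workers.
    return sum(values) / len(values)

def train_step(weight, shards, lr=0.1):
    grads = [local_gradient(weight, shard) for shard in shards]  # compute phase
    g = all_reduce_mean(grads)                                   # exchange phase
    return weight - lr * g                                       # synchronized update

# Two workers, each with its own data shard; the data follows y = 2*x.
shards = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (4.0, 8.0)]]
w = 0.0
for _ in range(200):
    w = train_step(w, shards)
print(round(w, 3))  # converges toward 2.0
```

Each iteration of the loop is one "step" in the speaker's sense: compute, exchange, update, and (hopefully) a lower loss.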

But in the simplest terms, you'll be computing a lot and then you'll exchange, right? The interesting thing is that GPU power is most of it; networking power is some, but a lot less. So while you're computing, the power for your GPUs is way up here.

But then when you're exchanging weights, if you're not able to overlap communications and compute perfectly, there may be a period where your GPUs are just idle while you're exchanging weights and the model's updating. So you exchange the gradients, you do the model update, and then you start training again. So the power drops, and it's super spiky.
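The spiky power profile being described can be illustrated with a toy trace (the wattages and tick counts here are invented for illustration, not measured figures): when communication is not overlapped with compute, every step swings between a busy draw and a near-idle draw.

```python
# Toy illustration (invented numbers) of the per-step power swing described:
# GPUs draw full power during compute, then drop to near-idle during a
# non-overlapped gradient exchange.

GPU_BUSY_W, GPU_IDLE_W, NET_W = 700, 70, 50  # hypothetical per-GPU watts

def power_trace(steps, compute_ticks=8, exchange_ticks=2, overlap=False):
    trace = []
    for _ in range(steps):
        trace += [GPU_BUSY_W + NET_W] * compute_ticks       # compute phase: GPUs busy
        if overlap:
            trace += [GPU_BUSY_W + NET_W] * exchange_ticks  # comms hidden behind compute
        else:
            trace += [GPU_IDLE_W + NET_W] * exchange_ticks  # GPUs idle while syncing
    return trace

spiky = power_trace(3)
flat = power_trace(3, overlap=True)
print(max(spiky) - min(spiky))  # 630 W swing every step
print(max(flat) - min(flat))    # 0 W swing when fully overlapped
```

The contrast between the two traces is the whole problem: at cluster scale, that per-GPU swing multiplied across tens of thousands of GPUs becomes a grid-visible transient.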

And so funnily enough, when you talk about the scale of data center power, you can blow stuff up so easily. Meta actually accidentally upstreamed something to PyTorch where they added an operator. And I kid you not, whoever made this, I want to hug the guy, because it's basically PyTorch "power plant no blowup", equals zero or equals one.

And what it does is amazing: when you're exchanging the weights, the GPU will just compute fake numbers so the power doesn't spike too much, and then the power plants don't blow up, because the transient spikes screw stuff up.
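The idea can be sketched like so. Note the transcript only paraphrases the real PyTorch toggle's name and semantics, so this is a toy reimplementation of the concept with an invented flag name, not the actual operator: during the exchange phase, run throwaway math so the GPUs' draw never dips.

```python
# Hedged sketch of the "no blowup" idea: burn cycles on dummy compute during
# gradient exchange so power draw stays flat. Flag name and wattages are
# invented for illustration; the real PyTorch toggle is only paraphrased above.

import os

GPU_BUSY_W, GPU_IDLE_W = 700, 70  # hypothetical per-GPU watts

def exchange_phase_power(no_blowup: bool) -> int:
    if no_blowup:
        _ = sum(i * i for i in range(1000))  # throwaway "fake numbers" work
        return GPU_BUSY_W                    # draw stays at the busy level
    return GPU_IDLE_W                        # otherwise GPUs idle during the sync

# Mimic the 0/1 toggle the speaker describes (hypothetical variable name).
flag = os.environ.get("NO_POWERPLANT_BLOWUP", "1") == "1"
print(exchange_phase_power(flag))
```

The trade-off is deliberate waste: the dummy work buys a flat power profile at the cost of burning energy that does nothing for training.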

And Elon's solution was, let me throw in a bunch of Tesla Megapacks and a few other things. Everyone has different solutions, but Meta's at least was publicly and openly known: just set this operator. And what this operator does is make the GPUs compute nothing so that the power doesn't spike.
