Nathan Lambert
Podcast Appearances
I think there is one aspect to note, though, right? It's whether that ability transfers across different types of runs, right? You may make really, really high-quality code for one specific model architecture at one size, and then that's not transferable to, hey, when I make this architecture tweak, everything's broken again, right?
Like, that's something that could be, you know, their specific low-level coding of, like, scheduling SMs is specific to this model architecture and size, right? Whereas NVIDIA's collectives library is more like, hey, it'll work for anything, right? You want to do an all-reduce? Great. I don't care what your model architecture is. It'll work.
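(To make that contrast concrete, here's a minimal sketch, my illustration rather than anything from the conversation: a generic all-reduce over gradients via PyTorch's NCCL-backed torch.distributed, which behaves the same regardless of model architecture.)

```python
# Minimal sketch: NCCL's all-reduce, reached here through torch.distributed,
# just sums tensors across ranks -- it doesn't care what architecture
# produced the gradients.
import torch
import torch.distributed as dist

def average_gradients(model):
    """Average gradients across all workers with a generic all-reduce."""
    world_size = dist.get_world_size()
    for param in model.parameters():
        if param.grad is not None:
            dist.all_reduce(param.grad, op=dist.ReduceOp.SUM)
            param.grad /= world_size

# Usage (assumes the process group was initialized with the NCCL backend):
# dist.init_process_group(backend="nccl")
# loss.backward(); average_gradients(model); optimizer.step()
```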
And you're giving up a lot of performance when you do that in many cases. But it's worthwhile for them to do the specific optimization for the specific run, given the constraints that they have regarding compute.
When people are training, they have all these various dashboards, but, like, the most simple one is your loss, right? And it continues to go down. But in reality, especially with more complicated stuff like MoE, or FP8 training, which is another innovation, you know, going to a lower-precision number format, i.e. less accurate, the biggest problem is that you end up with loss spikes.
And no one knows why the loss spike happened.
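(A quick illustration of the precision point, not from the conversation: casting values to FP8 and back shows how much detail an 8-bit float throws away. This assumes a recent PyTorch build that exposes the experimental float8 dtypes.)

```python
# Sketch of why lower precision is "less accurate": an FP8 round-trip
# loses information relative to the original float32 values.
import torch

x = torch.randn(4, dtype=torch.float32)
x_fp8 = x.to(torch.float8_e4m3fn)        # quantize to 8-bit floating point
roundtrip = x_fp8.to(torch.float32)      # cast back up for comparison

print("original :", x)
print("after fp8:", roundtrip)
print("abs error:", (x - roundtrip).abs())
```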
Yeah. These people are, like, you know, you'll go out to dinner with a friend who works at one of these labs, and they'll just be looking at their phone every, like, 10 minutes. And it's one thing if they're texting, but they're just, like, checking the loss. Yeah.
And some level of spikes is normal, right? It'll recover and be back. Sometimes, a lot of the old strategy was, like, you just stop the run, restart from the old version, and then, like, change the data mix. And then it keeps going.
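(Here's a hedged sketch of that old stop-and-restart strategy as a loop, purely illustrative: the helpers load_checkpoint, save_checkpoint, and reshuffle_data_mix are hypothetical placeholders, not any lab's real API.)

```python
# Watch the loss; if it spikes well above its recent trend, roll back to the
# last good checkpoint and resume with a tweaked data mix.
from collections import deque

def run_with_spike_guard(train_step, load_checkpoint, save_checkpoint,
                         reshuffle_data_mix, num_steps,
                         spike_factor=3.0, window=100):
    recent = deque(maxlen=window)   # rolling window of recent losses
    step = 0
    while step < num_steps:
        loss = train_step(step)
        if len(recent) == window and loss > spike_factor * (sum(recent) / window):
            # Loss spiked far above the rolling average: restart from the
            # last checkpoint and change the data mix before resuming.
            step = load_checkpoint()
            reshuffle_data_mix()
            recent.clear()
            continue
        recent.append(loss)
        if step % 1000 == 0:
            save_checkpoint(step)
        step += 1
```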
So it's like there's a distribution. The whole idea of grokking also comes in, right? Just because the loss slowed down from improving doesn't mean the model's not learning, because all of a sudden it could just spike down in loss again because it truly learned something, right? And it took some time for it to learn that.