
Nathan Lambert

👤 Speaker
1665 total appearances

Appearances Over Time

Podcast Appearances

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

And so these are just a bunch of data centers. And the point here is that Google has a very advanced infrastructure, very tightly connected in a small region. So Elon will always have the biggest cluster fully connected, right? Because it's all in one building, right? And he's completely right on that, right?

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

Google has the biggest cluster, and by a significant margin, but you have to spread it over three sites; you have to go across multiple sites.

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

I think there's a couple of problems with it. One, the TPU has been a way of making search really freaking cheap and building models for that, right? And so a big chunk of Google's TPU purchases and usage, all of it, is for internal workloads, right? Whether it be search, now Gemini, right?

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

YouTube, all these different applications that they have, you know, ads. This is where all their TPUs are being spent, and that's what they're hyper-focused on, right? And so there are certain aspects of the architecture that are optimized for their use case that are not optimized elsewhere, right? One simple one is, they've open-sourced a Gemma model and they called it Gemma 7B, right?

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

But then it's actually eight billion parameters, because the vocabulary is so large. And the reason they made the vocabulary so large is because the TPU's matrix multiply unit is massive, because that's what they've sort of optimized for.
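The arithmetic behind this point can be sketched quickly: the token-embedding matrix alone is `vocab_size × hidden_dim` parameters, so a very large vocabulary adds a substantial fraction of a billion parameters before any transformer layers are counted. The numbers below only approximate Gemma 7B's and a typical Llama config; treat them as illustrative assumptions, not exact published figures.

```python
# Back-of-the-envelope: how a large vocabulary inflates parameter count.
# Configs are approximations (Gemma-like: ~256k vocab, hidden 3072;
# Llama-like: 32k vocab, hidden 4096), used purely for illustration.

def embedding_params(vocab_size: int, hidden_dim: int) -> int:
    """Parameters in the token-embedding matrix alone."""
    return vocab_size * hidden_dim

gemma_like = embedding_params(256_000, 3072)  # ~0.79B just for embeddings
llama_like = embedding_params(32_000, 4096)   # ~0.13B for a 32k vocab

print(f"large-vocab embeddings: {gemma_like / 1e9:.2f}B params")
print(f"small-vocab embeddings: {llama_like / 1e9:.2f}B params")
```

With untied output projections the cost roughly doubles, which is how a model marketed by its "7B" of non-embedding parameters ends up closer to eight billion in total.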

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

And so they decided, oh well, I'll just make the vocabulary large too, even though it makes no sense to do so on such a small model, because that fits on their hardware. So Gemma doesn't run as efficiently on a GPU as a Llama does, right? But vice versa, Llama doesn't run as efficiently on a TPU as a Gemma does. And so there are certain aspects of hardware-software co-design.
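The hardware-fit logic being described can be sketched as rounding the vocabulary up to a multiple of the accelerator's matrix-multiply tile width, so the final output projection fills the unit. A tile width of 128 is a common choice in TPU-style systolic arrays, and the helper name here is ours, not from any library.

```python
# Sketch of padding a vocabulary to fit a matmul tile (assumed tile
# width 128). Frameworks do this in practice: e.g. GPT-2's 50,257-token
# vocab is often padded to 50,304 for efficiency.

def pad_vocab_to_tile(vocab_size: int, tile: int = 128) -> int:
    """Smallest multiple of `tile` that is >= vocab_size."""
    return ((vocab_size + tile - 1) // tile) * tile

print(pad_vocab_to_tile(32_000))  # already a multiple of 128 -> 32000
print(pad_vocab_to_tile(50_257))  # GPT-2-style vocab -> 50304
```

The co-design trade-off follows directly: a shape that tiles perfectly on one vendor's matmul unit can leave another vendor's unit partially idle, so the same checkpoint runs at different efficiencies on TPUs versus GPUs.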

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

So all their search models, their ranking and recommendation models, all these different models that are AI but not, like, gen AI, have been hyper-optimized with TPUs forever. The software stack is super optimized, but all of this software stack has not been released publicly at all, right? Very small portions of it, JAX and XLA, have been.

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

But, like, the experience when you're inside of Google and you're training on TPUs as a researcher, you don't need to know anything about the hardware in many cases, right? It's, like, pretty beautiful. But as soon as you step outside, that all goes away. A lot of them go back.