And so these are just a bunch of data centers. And the point here is that Google has very advanced infrastructure, very tightly connected in a small region. So Elon will always have the biggest fully connected cluster, right? Because it's all in one building. And he's completely right on that.
Google has the biggest cluster, and by a significant margin, but you have to spread it over three sites; you have to go across multiple sites.
I think there are a couple of problems with it. One, the TPU has been a way of making search really freaking cheap and of building models for that. And so a big chunk of Google's TPU purchases and usage, all of it, is for internal workloads, right? Whether it be search, now Gemini,
YouTube, all these different applications that they have, ads. That's where all their TPUs are being spent, and that's what they're hyper-focused on. And so there are certain aspects of the architecture that are optimized for their use case but not optimized elsewhere. One simple example: they open-sourced a Gemma model and called it Gemma 7B.
But then it's actually eight billion parameters, because the vocabulary is so large. And the reason they made the vocabulary so large is that the TPU's matrix multiply unit is massive, because that's what they've optimized for.
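To make that arithmetic concrete, here is a back-of-the-envelope sketch. The shapes come from the published model configs, roughly a 256k vocabulary and 3072 hidden size for Gemma 7B versus 32k and 4096 for Llama 2 7B; exact totals depend on details like tied embeddings, so treat the figures as illustrative.

```python
# Back-of-the-envelope: how a large vocabulary inflates parameter count.
# Config numbers are approximate, from the published Gemma 7B and
# Llama 2 7B configs; illustrative, not authoritative.

def embedding_params(vocab_size: int, hidden_dim: int) -> int:
    """Parameters in the token-embedding matrix (vocab_size x hidden_dim)."""
    return vocab_size * hidden_dim

gemma = embedding_params(vocab_size=256_000, hidden_dim=3072)  # ~786M
llama = embedding_params(vocab_size=32_000, hidden_dim=4096)   # ~131M

print(f"Gemma 7B embedding table: {gemma / 1e6:.0f}M parameters")
print(f"Llama 2 7B embedding table: {llama / 1e6:.0f}M parameters")
```

Those several hundred million extra embedding parameters are most of why a nominal "7B" model ends up closer to eight billion parameters in total.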
And so they decided, oh well, we'll just make the vocabulary large too, even though it makes no sense to do so on such a small model, because that fits their hardware. So Gemma doesn't run as efficiently on a GPU as a Llama does, and vice versa: Llama doesn't run as efficiently on a TPU as a Gemma does. And so there are certain aspects of hardware-software co-design at play.
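A rough sketch of where that co-design bites: the final unembedding layer is one dense matmul whose width is the vocabulary size, so a 256k vocabulary turns it into exactly the kind of huge matrix multiply a massive matrix unit is built to chew through. The shapes below reuse the config numbers above and the standard 2*m*k*n FLOP estimate; they are illustrative, not measured.

```python
# Illustrative FLOP count for the final hidden-states -> logits matmul.
# A matmul of shape (m, k) x (k, n) costs about 2*m*k*n FLOPs.

def unembed_flops(tokens: int, hidden: int, vocab: int) -> float:
    """Approximate FLOPs for the hidden -> logits projection."""
    return 2.0 * tokens * hidden * vocab

# Per one million tokens, Gemma-7B-like vs Llama-2-7B-like shapes:
gemma_like = unembed_flops(1_000_000, hidden=3072, vocab=256_000)
llama_like = unembed_flops(1_000_000, hidden=4096, vocab=32_000)

print(f"Gemma-like logits matmul: {gemma_like:.2e} FLOPs")
print(f"Llama-like logits matmul: {llama_like:.2e} FLOPs")
print(f"Ratio: {gemma_like / llama_like:.1f}x")  # ~6x more work in that layer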
So all their search models, their ranking and recommendation models, all these different models that are AI but not gen AI, have been hyper-optimized with TPUs forever. The software stack is super optimized, but that software stack has barely been released publicly at all; only very small portions of it, JAX and XLA, have been.
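For a sense of what that small public slice looks like, here is a minimal JAX sketch: you write an ordinary Python function, jax.jit traces it, and XLA compiles it for whatever backend is attached (TPU, GPU, or CPU). The toy attention-scores function is a hypothetical example, not anything from Google's internal stack.

```python
# Minimal JAX example: XLA compiles this once per input shape and runs
# the same source on TPU, GPU, or CPU without hardware-specific code.

import jax
import jax.numpy as jnp

@jax.jit  # trace the function, compile via XLA, cache the executable
def attention_scores(q, k):
    # Scaled dot-product scores; a hypothetical toy workload.
    return jax.nn.softmax(q @ k.T / jnp.sqrt(q.shape[-1]))

q = jnp.ones((8, 64))
k = jnp.ones((8, 64))
print(attention_scores(q, k).shape)  # (8, 8)
```

The same script runs unchanged on a TPU pod slice or a laptop, which is a taste of the inside-Google experience described next.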
But the experience when you're inside of Google, training on TPUs as a researcher, is that you don't need to know anything about the hardware in many cases. It's pretty beautiful. But as soon as you step outside, all of that goes away, and a lot of them go back.