And so these are just a bunch of data centers. And the point here is that Google has very advanced infrastructure, very tightly connected in a small region. So Elon will always have the biggest fully connected cluster, right? Because it's all in one building. And he's completely right on that.
Google has the biggest cluster, and by a significant margin, but you have to spread it over three sites; you have to go across multiple sites.
I think there are a couple of problems with it. One, the TPU has been a way of making search really freaking cheap and of building models for that. And so a big chunk of Google's TPU purchases and usage, all of it, is for internal workloads, right? Whether it be search, now Gemini,
YouTube, all these different applications that they have, ads. That's where all their TPUs are being spent, and that's what they're hyper-focused on. And so there are certain aspects of the architecture that are optimized for their use case but not optimized elsewhere. One simple example: they open-sourced a Gemma model and called it Gemma 7B.
But then it's actually eight billion parameters, because the vocabulary is so large. And the reason they made the vocabulary so large is that the TPU's matrix multiply unit is massive, because that's what they've optimized for.
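To make that arithmetic concrete, here is a back-of-the-envelope sketch. The shapes come from the published model configs, roughly a 256k vocabulary and 3072 hidden size for Gemma 7B versus 32k and 4096 for Llama 2 7B; exact totals depend on details like tied embeddings, so treat the figures as illustrative.

```python
# Back-of-the-envelope: how a large vocabulary inflates parameter count.
# Config numbers are approximate, from the published Gemma 7B and
# Llama 2 7B configs; illustrative, not authoritative.

def embedding_params(vocab_size: int, hidden_dim: int) -> int:
    """Parameters in the token-embedding matrix (vocab_size x hidden_dim)."""
    return vocab_size * hidden_dim

gemma = embedding_params(vocab_size=256_000, hidden_dim=3072)  # ~786M
llama = embedding_params(vocab_size=32_000, hidden_dim=4096)   # ~131M

print(f"Gemma 7B embedding table: {gemma / 1e6:.0f}M parameters")
print(f"Llama 2 7B embedding table: {llama / 1e6:.0f}M parameters")
```

Those several hundred million extra embedding parameters are most of why a nominal "7B" model ends up closer to eight billion parameters in total.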
And so they decided, oh well, we'll just make the vocabulary large too, even though it makes no sense to do so on such a small model, because that fits their hardware. So Gemma doesn't run as efficiently on a GPU as a Llama does, and vice versa: Llama doesn't run as efficiently on a TPU as a Gemma does. And so there are certain aspects of hardware-software co-design at play.
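A rough sketch of where that co-design bites: the final unembedding layer is one dense matmul whose width is the vocabulary size, so a 256k vocabulary turns it into exactly the kind of huge matrix multiply a massive matrix unit is built to chew through. The shapes below reuse the config numbers above and the standard 2*m*k*n FLOP estimate; they are illustrative, not measured.

```python
# Illustrative FLOP count for the final hidden-states -> logits matmul.
# A matmul of shape (m, k) x (k, n) costs about 2*m*k*n FLOPs.

def unembed_flops(tokens: int, hidden: int, vocab: int) -> float:
    """Approximate FLOPs for the hidden -> logits projection."""
    return 2.0 * tokens * hidden * vocab

# Per one million tokens, Gemma-7B-like vs Llama-2-7B-like shapes:
gemma_like = unembed_flops(1_000_000, hidden=3072, vocab=256_000)
llama_like = unembed_flops(1_000_000, hidden=4096, vocab=32_000)

print(f"Gemma-like logits matmul: {gemma_like:.2e} FLOPs")
print(f"Llama-like logits matmul: {llama_like:.2e} FLOPs")
print(f"Ratio: {gemma_like / llama_like:.1f}x")  # ~6x more work in that layer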
So all their search models, their ranking and recommendation models, all these different models that are AI but not gen AI, have been hyper-optimized with TPUs forever. The software stack is super optimized, but that software stack has barely been released publicly at all; only very small portions of it, JAX and XLA, have been.
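For a sense of what that small public slice looks like, here is a minimal JAX sketch: you write an ordinary Python function, jax.jit traces it, and XLA compiles it for whatever backend is attached (TPU, GPU, or CPU). The toy attention-scores function is a hypothetical example, not anything from Google's internal stack.

```python
# Minimal JAX example: XLA compiles this once per input shape and runs
# the same source on TPU, GPU, or CPU without hardware-specific code.

import jax
import jax.numpy as jnp

@jax.jit  # trace the function, compile via XLA, cache the executable
def attention_scores(q, k):
    # Scaled dot-product scores; a hypothetical toy workload.
    return jax.nn.softmax(q @ k.T / jnp.sqrt(q.shape[-1]))

q = jnp.ones((8, 64))
k = jnp.ones((8, 64))
print(attention_scores(q, k).shape)  # (8, 8)
```

The same script runs unchanged on a TPU pod slice or a laptop, which is a taste of the inside-Google experience described next.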
But the experience when you're inside of Google, training on TPUs as a researcher, is that you don't need to know anything about the hardware in many cases. It's pretty beautiful. But as soon as you step outside, all of that goes away, and a lot of them go back.