Nathan Lambert
Today, the largest individual cluster is Elon's, right?
Elon's cluster in Memphis, 200,000 GPUs, right? Meta has like 128,000. OpenAI has 100,000. Now, to be clear, other companies have more GPUs than Elon. They just don't have them in one place, right? And for training, you want them tightly connected. There are some techniques people are researching and working on that let you train across multiple regions.
But for the most part, you want them all in one area, right? So you can connect them with high-speed networking, right? And so Elon today has 200,000 GPUs: a hundred thousand H100s and a hundred thousand H200s, right? Meta, OpenAI, and Amazon all have on the scale of a hundred thousand, a little bit less.
But this year, right, this year people are building much more. Anthropic and Amazon are building a cluster of 400,000 Trainium 2, which is Amazon's custom chip, trying to get away from Nvidia, right? Meta and OpenAI are at scales of hundreds of thousands. But by next year, you'll have like 500,000 to 700,000 GPU clusters.
And note those GPUs are much higher power consumption than existing ones, right? Hopper is 700 watts, Blackwell goes to 1,200 watts, right? So the power per chip is growing and the number of chips is growing, right?
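The back-of-the-envelope here can be sketched in a few lines. This is a rough estimate from the per-chip wattages quoted above (700 W per Hopper, 1,200 W per Blackwell); the 1.3 PUE overhead factor for cooling and networking is my assumption, not a number from the conversation.

```python
# Rough facility power estimate for a GPU cluster.
# Per-chip wattages are from the discussion; PUE is an assumed overhead.

def cluster_power_mw(num_chips: int, watts_per_chip: float, pue: float = 1.3) -> float:
    """Total facility power in megawatts: chips * watts, plus overhead."""
    return num_chips * watts_per_chip * pue / 1e6

# 100k Hoppers vs. 100k Blackwells at the quoted chip powers:
hopper_mw = cluster_power_mw(100_000, 700)      # ~91 MW
blackwell_mw = cluster_power_mw(100_000, 1200)  # ~156 MW
print(f"{hopper_mw:.0f} MW vs {blackwell_mw:.0f} MW")
```

Same chip count, roughly 70% more power just from the per-chip jump, which is why the multiplied growth (more chips times hotter chips) drives these gigawatt-scale buildouts.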
I mean, I don't doubt Elon, right? The filings he has for the power plant and the Tesla battery packs (permits and stuff are public record) make it clear he has some crazy plans for Memphis. But it's not quite clear what those plans are or what the timescales are. I just never doubt Elon, right? He's gonna surprise us.
So these mega clusters make no sense for inference, right? You could route inference there and just not train. Yeah. But most of the inference capacity is more like: hey, I've got a 30-megawatt data center here, I've got 50 megawatts here, I've got 100 here, whatever. I'll just throw inference in all of those, because the mega clusters are multi-gigawatt data centers.
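The point about scattering inference across many small sites can be sketched numerically. Unlike training, inference requests don't need the sites to talk to each other, so capacity just adds up. The watts-per-GPU figure and the site sizes below are illustrative assumptions (the site sizes echo the examples in the conversation).

```python
# Sketch: inference capacity sums across independent small data centers.
# 1,000 W per GPU all-in is an assumed figure for illustration.

WATTS_PER_GPU = 1_000

def gpus_per_site(site_mw: float) -> int:
    """How many GPUs a site's power budget can host, at the assumed wattage."""
    return int(site_mw * 1e6 // WATTS_PER_GPU)

sites_mw = [30, 50, 100]  # the example site sizes: 30 MW, 50 MW, 100 MW
fleet = sum(gpus_per_site(mw) for mw in sites_mw)
print(fleet)  # 180000 GPUs of inference capacity across three sites
```

No high-speed interconnect between sites is needed; a load balancer routing requests to whichever site has headroom is enough, which is exactly why inference doesn't demand the co-located mega cluster.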
I want to train there because that's where all of my GPUs are co-located, where I can connect them together at super high networking speed, right? Because that's what you need for training. Now, with pre-training, this is the old scale, right? You'd increase parameters, you'd increase data, and the model gets better, right?
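The "increase parameters, increase data" pre-training scaling can be made concrete with the standard rule of thumb that training compute is roughly 6 × parameters × tokens. That formula is a common approximation from the scaling-laws literature, not something stated in the conversation, and the model sizes below are illustrative.

```python
# Standard rule of thumb: training compute ~ 6 * N (params) * D (tokens).
# Both knobs from the discussion (parameters, data) multiply into compute.

def train_flops(params: float, tokens: float) -> float:
    """Approximate total training FLOPs for a dense transformer."""
    return 6.0 * params * tokens

small = train_flops(7e9, 2e12)    # 7B params on 2T tokens
large = train_flops(70e9, 15e12)  # 70B params on 15T tokens
print(f"{small:.1e} vs {large:.1e} FLOPs")
```

Scaling both axes at once, as in the example, multiplies the compute bill, which is the direct link between "model gets better" and the ever-larger clusters discussed above.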