Nathan Lambert
And those are the bulk of what's being built. That's what's really reshaping things, and that's what's getting millions of GPUs. But the scale of the largest cluster is also really important. When we look back at history,
through the age of AI, it was a really big deal when they trained AlexNet on, I think, two or four GPUs, I don't remember exactly. It was a big deal because they used GPUs at all, and a big deal that they used multiple of them. But over time, the scale has just been compounding. Skip forward to GPT-3, and then GPT-4: GPT-4 was trained on 20,000
A100 GPUs, an unprecedented run in terms of size and cost, a couple hundred million dollars on a YOLO run for GPT-4. And it yielded this magical improvement that was perfectly in line with what had been experimented at smaller scale, just extended along a log scale. Oh yeah, they have that plot from the paper, the scaling of the technical performance.
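(As an aside, the idea behind that plot, predicting the big run's performance from much smaller runs, can be sketched in a few lines. This is a toy illustration with made-up numbers, not OpenAI's actual methodology: fit a power law loss(C) = a * C^b on small runs, then extrapolate in log-log space.)

```python
import numpy as np

# Made-up (compute, loss) points from hypothetical small-scale runs.
compute = np.array([1e18, 1e19, 1e20, 1e21])  # training FLOPs
loss = np.array([3.9, 3.1, 2.5, 2.0])         # final training loss

# A power law L(C) = a * C**b is a straight line in log-log space:
# log L = log a + b * log C, so a least-squares fit recovers (a, b).
b, log_a = np.polyfit(np.log(compute), np.log(loss), 1)
a = np.exp(log_a)

# Extrapolate three orders of magnitude past the largest small run.
big_c = 1e24
print(f"fit: L(C) = {a:.2f} * C^{b:.3f}")
print(f"predicted loss at C = {big_c:.0e}: {a * big_c ** b:.2f}")
```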
The scaling laws held perfectly. And 20,000 A100s is not a crazy number. Each GPU consumes roughly 400 watts, and when you add in the whole server around it, everything, it's something like 15 to 20 megawatts of power for the cluster. For reference, a human runs on roughly 100 watts, so the numbers are going to get silly.
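(To sanity-check that 15 to 20 megawatt figure, here's a back-of-the-envelope sketch. The GPU count and 400 W per-GPU draw come from the conversation above; the server-overhead multiplier and the data center PUE are assumed values for illustration.)

```python
# Rough power estimate for a 20,000-GPU A100 cluster.
NUM_GPUS = 20_000
A100_WATTS = 400        # per-GPU draw cited above

SERVER_OVERHEAD = 1.6   # assumed: CPUs, memory, networking, fans
PUE = 1.2               # assumed: cooling and power-delivery overhead

gpus_mw = NUM_GPUS * A100_WATTS / 1e6
total_mw = gpus_mw * SERVER_OVERHEAD * PUE

print(f"GPUs alone:      {gpus_mw:.1f} MW")   # 8.0 MW
print(f"All-in estimate: {total_mw:.1f} MW")  # ~15.4 MW, inside the 15-20 MW range
```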
But 15 to 20 megawatts was a standard data center size. What was unprecedented was that it was all GPUs running one task. How many watts is a toaster? A toaster is actually similar in power consumption to an A100. Then the H100 comes around, and they increase the power from about 400 to 700 watts per GPU, before counting all the associated hardware around it.
So once you count all of that, networking, CPUs, memory, and so on, it's roughly 1,200 to 1,400 watts per GPU for everything.
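(Running the same arithmetic per GPU generation shows why facility power, not GPU count, becomes the limit. The 1,200 to 1,400 W all-in H100 figure is from the conversation; the A100 all-in number and the 20 MW facility size are assumptions based on the "standard data center" figure above.)

```python
# How many GPUs, all-in, fit in a fixed-power facility?
FACILITY_MW = 20                # assumed: the "standard" size cited above
ALL_IN_WATTS = {
    "A100": 800,                # assumed: ~400 W GPU plus server overhead
    "H100": 1300,               # midpoint of the 1,200-1,400 W figure cited
}

for gpu, watts in ALL_IN_WATTS.items():
    max_gpus = FACILITY_MW * 1e6 / watts
    print(f"{gpu}: ~{max_gpus:,.0f} GPUs in a {FACILITY_MW} MW facility")
# A100: ~25,000. H100: ~15,385. Same building, far fewer next-gen GPUs.
```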
Yeah, sorry for skipping past that. And the data center itself is complicated. But at GPT-4 scale, these were still standardized data centers. Now step forward to the clusters people built last year, and the scale ranges widely.