Nathan Lambert
And so this is sort of the quote-unquote visionary behind the company, right? The hedge fund still exists, this quantitative firm. And slowly he turned to this full view of AI, everything about it, and at some point that maneuvering led him to create DeepSeek. And DeepSeek has released multiple models since then.
They've acquired more and more GPUs, and they share infrastructure with the fund. So there is no exact public number for the GPU resources they have, beyond the 10,000 GPUs they bought in 2021. And they were fantastically profitable.
And then this paper claims they used only 2,000 H800 GPUs, which is a restricted GPU that was previously allowed for export to China but no longer is; there's a new version now. It's basically NVIDIA's H100 for China.
Right. And there are some restrictions on it, specifically around the communication speed, the interconnect speed, which is why they had to do this crazy SM scheduling stuff.
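To put the interconnect restriction in perspective, here is a minimal back-of-the-envelope sketch. The bandwidth figures are publicly reported specs, used here as assumptions rather than numbers from this conversation: the H100's NVLink is roughly 900 GB/s aggregate, while the H800's is capped at roughly 400 GB/s.

```python
# Back-of-the-envelope: how much longer cross-GPU transfers take on the
# bandwidth-capped H800 versus the H100. Bandwidth figures are publicly
# reported specs (assumptions for illustration, not from this transcript).
NVLINK_BW_GBPS = {"H100": 900, "H800": 400}  # GB/s, aggregate NVLink

def transfer_ms(num_bytes: int, gpu: str) -> float:
    """Idealized time to move num_bytes over NVLink, ignoring latency."""
    return num_bytes / (NVLINK_BW_GBPS[gpu] * 1e9) * 1e3

payload = 2 * 1024**3  # a hypothetical 2 GiB activation/gradient shard
for gpu in ("H100", "H800"):
    print(f"{gpu}: {transfer_ms(payload, gpu):.2f} ms")
# The ~2.25x gap is why overlapping communication with compute, e.g. by
# dedicating some SMs to comms, matters so much on the capped part.
```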
So going back to that: this is obviously not true in terms of their total GPU count, their obviously available GPUs. But for this training run, do you think 2,000 is the correct number, or no?

So this is where it takes a significant amount of zoning in, right?
What do you count as your training run, right? Do you count all of the research and ablations that you ran picking all this stuff? Because yes, you can do a YOLO run, but at some level you have to run tests at small scale, and then some tests at medium scale, before you go to large scale.
Yeah, and research begets the new ideas that let you get huge efficiency gains.
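As a toy illustration of that accounting question, here is a sketch where every number is invented purely to show the bookkeeping; none of these figures come from DeepSeek:

```python
# Toy GPU-hour accounting. Every number here is invented purely to
# illustrate the "what counts as the training run" question.
ladder = [
    # (stage, runs, gpus per run, hours per run)
    ("small-scale ablations", 200, 8, 24),
    ("medium-scale tests", 20, 256, 72),
    ("final large-scale run", 1, 2048, 1400),
]
total = 0
for stage, runs, gpus, hours in ladder:
    gpu_hours = runs * gpus * hours
    total += gpu_hours
    print(f"{stage:>24}: {gpu_hours:>12,} GPU-hours")
print(f"{'total':>24}: {total:>12,} GPU-hours")
# A headline cost that counts only the final run silently drops all
# the research compute that made that run possible.
```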
So the numbers that DeepSeek has specifically stated publicly are just the 10,000 GPUs in 2021 and then the 2,000 GPUs for only the pre-training of V3. They did not discuss the cost of R1. They did not discuss the cost of all the other RL for the instruct model that they made.
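For reference, the arithmetic behind the widely quoted V3 pre-training figure, using numbers the V3 technical report itself states (total H800 GPU-hours and an assumed $2/GPU-hour rental rate), works out roughly like this:

```python
# Cost arithmetic using figures stated in the DeepSeek-V3 technical
# report: ~2.788M H800 GPU-hours for pre-training, priced at an
# assumed $2/GPU-hour rental rate. The report itself notes this
# excludes research, ablations, and prior experiments.
gpu_hours = 2.788e6
rate_per_hour = 2.0
print(f"~${gpu_hours * rate_per_hour / 1e6:.2f}M")  # ~$5.58M

num_gpus = 2048  # cluster size reported in the paper
days = gpu_hours / num_gpus / 24
print(f"~{days:.0f} days on {num_gpus} H800s")  # roughly two months
```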