Nathan Lambert
Podcast Appearances
And the model will learn which expert to route to for different tasks. And so this is a humongous innovation in terms of, hey, I can continue to grow the total embedding space of parameters. And so DeepSeek's model is, you know, 600-something billion parameters, right? Relative to Llama 405B, which is 405 billion parameters, or Llama 70B, which is 70 billion parameters, right?
So this model technically has more embedding space for information, right? To compress all of the world's knowledge that's on the internet down. But at the same time, it is only activating around 37 billion of the parameters. So only 37 billion of these parameters actually need to be computed every single time you're training on data or running inference.
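To make the routing idea concrete, here is a minimal sketch of top-k expert gating in PyTorch. The sizes, the 8 routed experts, and the top-2 selection are illustrative assumptions, not DeepSeek's actual configuration; the point is just that the router learns which experts each token goes to, and only those experts' parameters are computed.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    """Minimal mixture-of-experts layer: a learned router picks k experts per token,
    so only a fraction of the total parameters is computed for any given token."""
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, k=2):  # illustrative sizes
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)   # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x):                              # x: [tokens, d_model]
        scores = self.router(x)                        # [tokens, n_experts]
        weights, idx = torch.topk(F.softmax(scores, dim=-1), self.k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):                     # run only the chosen experts
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out
```

The total parameter count grows with the number of experts, but the per-token compute only scales with the k experts the router selects, which is the distinction the passage is drawing.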
And so versus, again, the Llama models, where all 70 billion or all 405 billion parameters must be activated, you've dramatically reduced your compute cost when you're doing training and inference with this mixture-of-experts architecture.
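As a rough back-of-the-envelope check of that saving, a common approximation is that a transformer's forward pass costs about 2 FLOPs per active parameter per token. The snippet below just plugs in the parameter counts mentioned above; it is an illustration, not a measured benchmark.

```python
# Rough forward-pass cost per token ≈ 2 * (active parameters), a standard approximation.
def forward_flops_per_token(active_params):
    return 2 * active_params

llama_405b_dense = forward_flops_per_token(405e9)   # every parameter is activated
deepseek_moe     = forward_flops_per_token(37e9)    # only ~37B of the 600B+ are activated

print(f"Dense 405B:       {llama_405b_dense:.2e} FLOPs/token")
print(f"MoE, 37B active:  {deepseek_moe:.2e} FLOPs/token")
print(f"Ratio:            {llama_405b_dense / deepseek_moe:.1f}x fewer FLOPs per token")
```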
Effectively, NVIDIA builds this library called NCCL, pronounced "nickel," right? In which, you know, when you're training a model, you have all these communications between every single layer of the model, and you may have over 100 layers. What does NCCL stand for? NVIDIA Collective Communications Library. Nice. And so...
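For reference, this is roughly how NCCL gets used in practice through PyTorch's distributed API; the launcher and environment details are assumptions here, and most training frameworks hide this behind higher-level wrappers.

```python
import torch
import torch.distributed as dist

# Each GPU runs one process; a launcher like torchrun sets RANK, WORLD_SIZE, etc.
dist.init_process_group(backend="nccl")   # NCCL handles the GPU-to-GPU collectives
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)
```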
When you're training a model, you're going to have all these all-reduces and all-gathers. Between each layer, between the multi-layer perceptron or feed-forward network and the attention mechanism, you'll basically have the model synchronize: you'll have an all-reduce or an all-gather. And this is a communication between all the GPUs in the network, whether it's in training or inference.
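As a concrete illustration of those two collectives, here is a minimal sketch using torch.distributed on top of the NCCL backend. The tensor contents are made up, and real training code usually issues these calls implicitly through data- or tensor-parallel wrappers rather than by hand.

```python
import torch
import torch.distributed as dist

# Assumes dist.init_process_group(backend="nccl") has already run on every GPU process.
rank = dist.get_rank()
world = dist.get_world_size()

# All-reduce: every GPU contributes a tensor, every GPU gets back the sum
# (e.g. summing gradients or partial activations across GPUs).
grad = torch.full((4,), float(rank), device="cuda")
dist.all_reduce(grad, op=dist.ReduceOp.SUM)

# All-gather: every GPU contributes a shard, every GPU gets back all shards
# (e.g. reassembling activations that were split across GPUs).
shard = torch.full((4,), float(rank), device="cuda")
gathered = [torch.empty_like(shard) for _ in range(world)]
dist.all_gather(gathered, shard)
```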
So NVIDIA has a standard library. This is one of the reasons why it's really difficult to use anyone else's hardware for training: because no one's really built a standard communications library. And NVIDIA has done this at sort of a higher level, right?
DeepSeek, because they have certain limitations around the GPUs that they have access to, the interconnects are limited to some extent by the restrictions on the GPUs that were shipped into China legally, not the ones that are smuggled, but the legally shipped ones that they used to train this model. They had to figure out how to get efficiencies.
And one of those things is that instead of just calling the NVIDIA library, NCCL, they instead scheduled their own communications, which some of the labs do. Meta talked about in Llama 3 how they made their own custom version of NCCL. They didn't talk about the implementation details. This is some of what they did.
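A hedged sketch of what "scheduling your own communications" can look like in practice: launching a collective asynchronously and overlapping it with computation rather than letting the library call block. This is a generic illustration of the comm/compute overlap idea, not DeepSeek's or Meta's actual implementation.

```python
import torch
import torch.distributed as dist

def overlapped_step(compute_fn, tensor_to_sync):
    """Launch the all-reduce asynchronously, do useful compute while the
    network transfer is in flight, then wait only when the result is needed."""
    work = dist.all_reduce(tensor_to_sync, op=dist.ReduceOp.SUM, async_op=True)
    result = compute_fn()      # compute that doesn't depend on the synced tensor
    work.wait()                # block only once the synced tensor is actually needed
    return result, tensor_to_sync
```

Hand-scheduling where these collectives fall relative to the compute lets you hide communication latency, which matters more when your interconnect bandwidth is constrained.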