
Nathan Lambert

👤 Speaker
1665 total appearances

Appearances Over Time

Podcast Appearances

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

And so as we move forward, this is an incredibly important... Flop is the vector that the government has cared about historically, but the other two vectors are arguably just as important, right? And especially when we come to this new paradigm, which the world is only just learning about over the last six months, right? Reasoning.


Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

We're going to get into technical stuff real fast. There's two articles in this one that I could show, maybe graphics that might be interesting for you to pull up.


Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

You want to explain KVCache before we talk about this? I think it's better to... Okay, yeah.


Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

Because it's incredibly important because this changes how models work. But I think resetting, right? Why is memory... so important. It's because so far we've talked about parameter counts, right? And mixture of experts, you can change how many active parameters versus total parameters to embed more data, but have less flops.
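The active-versus-total-parameters trade-off mentioned above can be sketched with a toy mixture-of-experts layer. This is a minimal illustration, not any particular model's implementation; the expert count, dimensions, and routing are made-up assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, n_experts, top_k = 8, 4, 2  # hypothetical sizes for the sketch
# Each "expert" is just a weight matrix; only top_k of them run per token.
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts))

def moe_forward(x):
    """Route one token vector through its top-k experts only."""
    logits = x @ router
    chosen = np.argsort(logits)[-top_k:]          # indices of the top-k experts
    weights = np.exp(logits[chosen])
    weights /= weights.sum()                      # softmax over the chosen experts
    out = sum(w * (x @ experts[i]) for w, i in zip(weights, chosen))
    return out, chosen

x = rng.standard_normal(d_model)
y, used = moe_forward(x)

# Total parameters embed more knowledge; active parameters set the FLOP cost.
total_params = n_experts * d_model * d_model   # 256
active_params = top_k * d_model * d_model      # 128
```

Here only half the expert parameters participate in any one forward pass, which is the sense in which a mixture-of-experts model can "embed more data, but have less flops."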


Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

But more important, you know, another aspect of, you know, what's part of this humongous revolution in the last handful of years is the transformer, right? And the attention mechanism. Attention mechanism is that the model understands the relationships between all the words in its context, right? And that is separate from the parameters themselves. right?
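The attention mechanism described above, relating every word in the context to every other, can be sketched as standard scaled dot-product attention. This is a generic textbook form, not any specific model's code; the token count and head dimension are arbitrary.

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention over a sequence of T tokens."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # pairwise token-to-token scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
T, d = 5, 4  # 5 tokens in context, head dimension 4 (illustrative sizes)
Q = rng.standard_normal((T, d))
K = rng.standard_normal((T, d))
V = rng.standard_normal((T, d))
out, w = attention(Q, K, V)
# w[i, j] is how strongly token i attends to token j.
```

Note that `w` is a function of the context, recomputed per input, which is the sense in which attention is "separate from the parameters themselves."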


Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

And that is something that you must calculate, right? How each token, right, each word in the context length is relatively connected to each other, right? And I think, Nathan, you should explain KVCache better.


Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

I can explain that. So today, if you use a model, like you look at an API, OpenAI charges a certain price per million tokens, right? And that price for input and output tokens is different, right? And the reason is that when you're inputting a query into the model, right? Let's say you have a book, right? That book, you must now calculate the entire KV cache for it, right? This key value cache.
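The key-value cache being described can be sketched in a few lines: each token's keys and values are computed once, stored, and reused on every later decoding step, so generating a new token only requires one fresh query. This is a hypothetical single-head illustration with made-up weight names, not any production inference stack.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # illustrative head dimension
Wq = rng.standard_normal((d, d))
Wk = rng.standard_normal((d, d))
Wv = rng.standard_normal((d, d))

k_cache, v_cache = [], []  # grows by one entry per token processed

def decode_step(x):
    """Process one new token, attending over all cached keys/values."""
    k_cache.append(x @ Wk)   # computed once per token ...
    v_cache.append(x @ Wv)   # ... and never recomputed on later steps
    q = x @ Wq
    K, V = np.stack(k_cache), np.stack(v_cache)
    scores = K @ q / np.sqrt(d)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

# "Prefilling" a 3-token prompt fills the cache; each generated token adds one entry.
for _ in range(3):
    out = decode_step(rng.standard_normal(d))
```

Prefilling a long input (the "book") means computing and storing all of its keys and values up front, which is part of why APIs price input and output tokens differently.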

Lex Fridman Podcast
#459 โ€“ DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

I can explain that. So today, if you use a model, like you look at an API, OpenAI charges a certain price per million tokens, right? And that price for input and output tokens is different, right? And the reason is that when you're inputting a query into the model, right? Let's say you have a book, right? That book, you must now calculate the entire KV cache for it, right? This key value cache.