Memory, right?
We need to go through a lot of the specific technical details of transformers to make this easy for people.
Yeah, so the attention operator has three core things: queries, keys, and values. QKV is what goes into it. If you look at the equation, you see that these matrices are multiplied together.
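A minimal NumPy sketch of that equation, softmax(QKᵀ/√d)·V, with toy shapes made up for illustration (single head, no masking):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)  # [n_q, n_k]: similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # each row sums to 1
    return weights @ V  # reweighted sum of the values

# Toy example: 4 tokens, 8-dim vectors (sizes chosen only for illustration).
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
out = attention(Q, K, V)  # shape [4, 8]: one output vector per token
```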
These words, query, key, and value, come from information retrieval, where the query is the thing you're trying to get values for: you match it against the keys, and the values get reweighted accordingly. My background's not in information retrieval or things like that. It's just fun to have those backlinks.
What effectively happens is that when you're doing these matrix multiplications, you're working with matrices sized by the context length, the number of tokens you put into the model. The KV cache is effectively a compressed representation of all the previous tokens the model has seen. This is where autoregressive models come in.
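Because the cache holds one key vector and one value vector per token, per layer, its size grows linearly with context length. A back-of-the-envelope sketch, with model dimensions invented purely for illustration:

```python
# Hypothetical model dimensions, chosen only for illustration.
n_layers, n_heads, d_head = 32, 32, 128
seq_len = 4096          # tokens currently in context
bytes_per_value = 2     # fp16

# Per layer the cache holds one K and one V matrix of shape
# [seq_len, n_heads * d_head] -- linear in the context length.
kv_bytes = n_layers * 2 * seq_len * n_heads * d_head * bytes_per_value
print(f"KV cache: {kv_bytes / 2**30:.2f} GiB")  # 2.00 GiB for these numbers
```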
You predict one token at a time. You start with whatever your prompt was. You ask a question like, who was the president in 1825? The model then generates its first token. For each of these tokens, you're doing the same attention operator, multiplying these query, key, and value matrices.
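The generation loop itself is simple. Here is a runnable sketch with a stand-in "model" (just an embedding lookup and an output projection, no real transformer), which shows the structure: each step reruns the model over the whole sequence so far and appends one token.

```python
import numpy as np

VOCAB, D = 100, 8
rng = np.random.default_rng(0)
E = rng.standard_normal((VOCAB, D))      # toy embedding table
W_out = rng.standard_normal((D, VOCAB))  # toy output projection

def toy_model(ids):
    """Stand-in for a transformer: embed tokens, then project to logits.
    (The attention layers would sit between these two steps.)"""
    h = E[ids]          # [seq_len, D]
    return h @ W_out    # [seq_len, VOCAB]: logits for every position

def generate(prompt_ids, n_new):
    """Greedy autoregressive decoding: one token per step, full recompute."""
    ids = list(prompt_ids)
    for _ in range(n_new):
        logits = toy_model(ids)                 # rerun over the whole sequence
        ids.append(int(np.argmax(logits[-1])))  # take the most likely next token
    return ids

print(generate([1, 2, 3], 5))
```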
But the math works out very nicely, so that when you're doing this repeatedly, you can keep appending the new keys and values to the KV cache. You keep track of the previous values you're inferring over in this autoregressive chain, and you keep them in memory the whole time. This is a really crucial thing to manage when serving inference at scale.
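A sketch of that incremental step, with hypothetical projection matrices Wq, Wk, Wv standing in for a single attention head: each new token computes its own q, k, v, the k and v get appended to the cache, and attention runs against everything cached so far, with no recomputation for old tokens.

```python
import numpy as np

d = 8
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

K_cache = np.zeros((0, d))  # grows by one row per generated token
V_cache = np.zeros((0, d))

def decode_step(x, K_cache, V_cache):
    """Attention for ONE new token: compute its q/k/v, append k and v to the
    cache, then attend over everything cached so far."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    K_cache = np.vstack([K_cache, k])
    V_cache = np.vstack([V_cache, v])
    scores = q @ K_cache.T / np.sqrt(d)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V_cache, K_cache, V_cache

for _ in range(4):                  # four decode steps
    x = rng.standard_normal(d)      # stand-in for the new token's hidden state
    out, K_cache, V_cache = decode_step(x, K_cache, V_cache)

print(K_cache.shape)  # (4, 8): one cached key per token seen so far
```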