Nathan Lambert
And so as we move forward, this is an incredibly important... FLOPs are the vector that the government has cared about historically, but the other two vectors are arguably just as important, right? And especially when we come to this new paradigm, which the world is only just learning about over the last six months, right? Reasoning.
We're going to get into technical stuff real fast. There's two articles in this one that I could show, maybe graphics that might be interesting for you to pull up.
You want to explain KV cache before we talk about this? I think it's better to... Okay, yeah.
Because it's incredibly important, because this changes how models work. But I think resetting, right? Why is memory so important? It's because so far we've talked about parameter counts, right? And with mixture of experts, you can change how many active parameters versus total parameters to embed more data, but use fewer FLOPs.
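The total-versus-active split being described can be shown with some quick arithmetic. This is a minimal sketch with made-up numbers, not any real model's configuration: the model stores every expert's parameters, but each token only routes through a few of them, so compute per token scales with the active count, not the total.

```python
# Hypothetical mixture-of-experts configuration (illustrative numbers only).
n_experts = 64                 # total experts stored per MoE layer
top_k = 4                      # experts each token actually routes through
expert_params = 100_000_000    # parameters per expert
shared_params = 2_000_000_000  # attention, embeddings, etc., always active

# All experts live in memory...
total_params = shared_params + n_experts * expert_params
# ...but each token only does math with its top-k experts.
active_params = shared_params + top_k * expert_params

print(f"total stored: {total_params / 1e9:.1f}B parameters")
print(f"active per token: {active_params / 1e9:.1f}B parameters")
```

With these toy numbers the model stores 8.4B parameters but only spends FLOPs on 2.4B per token, which is exactly the "more data, fewer FLOPs" trade being described.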
But more important, you know, another aspect of this humongous revolution in the last handful of years is the transformer, right? And the attention mechanism. The attention mechanism is how the model understands the relationships between all the words in its context, right? And that is separate from the parameters themselves, right?
And that is something that you must calculate, right? How each token, each word in the context, relates to every other one, right? And I think, Nathan, you should explain KV cache better.
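The all-pairs calculation being described is a minimal sketch of scaled dot-product attention: every token's query is scored against every token's key, so the score matrix grows with the square of the context length. The shapes and random data here are illustrative, not from any particular model.

```python
import numpy as np

# Toy self-attention over 5 tokens with an 8-dimensional head.
rng = np.random.default_rng(0)
seq_len, d = 5, 8
Q = rng.normal(size=(seq_len, d))  # queries, one per token
K = rng.normal(size=(seq_len, d))  # keys, one per token
V = rng.normal(size=(seq_len, d))  # values, one per token

# Pairwise scores: how strongly each token attends to every other token.
scores = Q @ K.T / np.sqrt(d)               # shape (seq_len, seq_len)
weights = np.exp(scores)
weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys

out = weights @ V  # each output is a weighted mix of all tokens' values

print(weights.shape)  # (5, 5): every token is scored against every token
```

The (5, 5) weight matrix is the "relationships between all the words" part, and it exists only at inference time; the learned parameters are just Q/K/V projections.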
I can explain that. So today, if you use a model, like you look at an API, OpenAI charges a certain price per million tokens, right? And that price is different for input and output tokens, right? And the reason is that when you're inputting a query into the model, let's say you input a book, right? For that book, you must now calculate the entire KV cache, this key-value cache, right?
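The prefill-versus-decode asymmetry being described can be sketched in a few lines. This is an illustrative toy, not a real serving stack: the whole prompt's keys and values are computed once up front (the expensive input side), then each generated token reuses that cache and only appends its own K and V instead of recomputing the whole context.

```python
import numpy as np

# Toy KV cache: one head, 8-dim, random projection weights (illustrative).
rng = np.random.default_rng(0)
d = 8
W_k = rng.normal(size=(d, d))  # key projection
W_v = rng.normal(size=(d, d))  # value projection

# Prefill: the 100-token prompt ("the book") gets its K and V computed
# in one batched pass and stored.
prompt = rng.normal(size=(100, d))
k_cache = prompt @ W_k
v_cache = prompt @ W_v

# Decode: each new output token computes only its OWN key and value
# and appends them; nothing earlier is recomputed.
for _ in range(3):
    new_tok = rng.normal(size=(1, d))
    k_cache = np.concatenate([k_cache, new_tok @ W_k])
    v_cache = np.concatenate([v_cache, new_tok @ W_v])

print(k_cache.shape)  # cache grew from (100, 8) to (103, 8)
```

Prefill is compute-heavy but parallel across the whole prompt; decode is one token at a time against a growing cache, which is part of why providers price input and output tokens differently.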