Dylan Patel
But the math is very nice so that when you're doing this repeatedly with this KV cache, this key-value matrix operation, you can keep appending the new values to it. So you keep track of the previous values you're inferring over in this autoregressive chain, and you keep it in memory the whole time. And this is a really crucial thing to manage when you're serving inference at scale.
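To make that concrete, here is a minimal single-head sketch in numpy of one decode step with a KV cache. The shapes and function names are illustrative assumptions, not any particular framework's API:

```python
import numpy as np

def decode_step(q_new, k_new, v_new, k_cache, v_cache):
    """One autoregressive step with a KV cache (single-head, illustrative).

    q_new, k_new, v_new: (d,) projections of the newest token.
    k_cache, v_cache:    (t, d) keys/values for all previous tokens.
    """
    # Append the new key/value instead of recomputing them for the whole prefix.
    k_cache = np.vstack([k_cache, k_new])           # (t + 1, d)
    v_cache = np.vstack([v_cache, v_new])           # (t + 1, d)

    # Attend the new query over everything cached so far.
    scores = k_cache @ q_new / np.sqrt(len(q_new))  # (t + 1,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                        # softmax over all cached positions
    return weights @ v_cache, k_cache, v_cache      # attention output is (d,)
```

The cache grows by one row per generated token, which is exactly the state you have to keep in memory for the whole generation.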
There are far bigger experts in this, and there are so many levels of detail that you can go into. Essentially, one of the key quote-unquote drawbacks of the attention operator and the transformer is that the memory cost grows quadratically with the context length.
So as you put in longer questions, the memory used to make that computation goes up quadratically. You'll hear about a lot of other language model architectures that are sub-quadratic or linear attention forms, like state-space models. We don't need to go down all of these now.
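For a sense of where the quadratic comes from, here is a toy numpy illustration with made-up sizes; optimized kernels avoid materializing this full matrix, but the naive formulation shows the scaling:

```python
import numpy as np

n, d = 4096, 128                        # context length, per-head dimension (made up)
Q = np.random.randn(n, d).astype(np.float32)
K = np.random.randn(n, d).astype(np.float32)

scores = Q @ K.T                        # naive attention scores: an (n, n) matrix
print(scores.shape)                     # (4096, 4096)
print(f"{scores.nbytes / 1e6:.0f} MB")  # ~67 MB for one head at fp32; doubling n quadruples it
```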
And then there are innovations on attention to improve this memory usage and to make attending over long contexts much more accurate and performant.
They help with memory constraints and performance. So if you put a book into... I think Gemini is the model with the longest context length that people are using. Gemini is known for a 1 million and now 2 million token context length. You put a whole book into Gemini and... sometimes it'll draw facts out of it. It's not perfect. They're getting better. So there are two things.
One is to be able to serve this at the memory level. Google has magic with their TPU stack where they can serve really long contexts. And then there are also many decisions along the way to actually make long-context performance work. This involves the data. There are subtle changes to these attention computations. And it changes the architecture.
But serving long context is extremely memory-constrained, especially when you're making a lot of predictions. I actually don't know the full details of why output tokens are more expensive than input tokens, but I think essentially with output tokens you have to do more computation because you have to sample from the model.
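A rough sketch of that asymmetry, with hypothetical `prefill`, `decode_step`, and `sample` names standing in for whatever a real serving stack calls them:

```python
def generate(model, prompt_tokens, max_new_tokens, sample):
    """Why output tokens cost more to serve than input tokens (sketch)."""
    # Prefill: every prompt (input) token goes through the model in one
    # parallel forward pass, producing the KV cache.
    kv_cache, logits = model.prefill(prompt_tokens)

    outputs = []
    for _ in range(max_new_tokens):
        # Decode: each output token needs its own sampling step and its own
        # forward pass over the growing cache, so generation is sequential.
        token = sample(logits)
        outputs.append(token)
        logits, kv_cache = model.decode_step(token, kv_cache)
    return outputs
```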
So these are features that APIs are shipping, like prompt caching and pre-filling, because you can drive prices down and make APIs much faster. If you run a business and you know you're going to keep passing the same initial content to Claude's API, you can load that into the Anthropic API and always keep it there.
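A sketch of what that looks like with the Anthropic Python SDK's prompt caching; the `cache_control` field follows Anthropic's published docs, but the model name and file path here are illustrative, so check the current documentation:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Placeholder for the large block of content you reuse on every request.
LONG_REFERENCE_DOCUMENT = open("company_handbook.txt").read()

response = client.messages.create(
    model="claude-3-5-sonnet-latest",   # illustrative model alias
    max_tokens=512,
    system=[
        {
            "type": "text",
            "text": LONG_REFERENCE_DOCUMENT,
            # Mark this prefix as cacheable so repeated calls can reuse the
            # server-side cache instead of re-processing the whole document.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Summarize chapter 3 for a new hire."}],
)
print(response.content[0].text)
```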