
Dylan Patel

👤 Speaker
3551 total appearances

Appearances Over Time

Podcast Appearances

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

So these are features that APIs are shipping, like prompt caching and pre-filling, because you can drive prices down and you can make APIs much faster. If you run a business and you're going to keep passing the same initial content to Claude's API, you can load that into the Anthropic API and always keep it there.
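The caching idea in this quote can be sketched as a toy, not as Anthropic's actual implementation: real prompt caching stores precomputed attention (KV) state server-side, keyed on a shared prompt prefix, so repeated requests skip the expensive prefill step. All names below are hypothetical.

```python
import hashlib

class PrefixCache:
    """Toy sketch of server-side prompt caching. Real APIs cache attention
    KV states for a shared prefix; this only illustrates the bookkeeping."""

    def __init__(self):
        self._store = {}          # prefix hash -> precomputed "KV state"
        self.prefill_calls = 0    # counts expensive prefill computations

    def _key(self, prefix: str) -> str:
        return hashlib.sha256(prefix.encode()).hexdigest()

    def _prefill(self, prefix: str):
        # Stand-in for running the model over the prefix tokens.
        self.prefill_calls += 1
        return f"kv-state-for-{len(prefix)}-chars"

    def generate(self, shared_prefix: str, user_message: str) -> str:
        key = self._key(shared_prefix)
        if key not in self._store:            # cache miss: pay full prefill cost
            self._store[key] = self._prefill(shared_prefix)
        kv = self._store[key]                 # cache hit: prefill is skipped
        return f"response({kv}, {user_message!r})"

cache = PrefixCache()
system_prompt = "You are a support bot for ExampleCo. Policies: ..."
cache.generate(system_prompt, "Where is my order?")
cache.generate(system_prompt, "How do I return an item?")
# Only the first request paid the prefill cost for the shared prefix.
```

The second request reuses the cached state, which is why keeping the same initial content in every request makes it both cheaper and faster to serve.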

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

But it's very different when we're leading into the reasoning models, which we talked about; we showed this example earlier and read some of this kind of mumbling stuff. And what happens is that the output context length is so much higher.

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

And I mean, I learned a lot about this from Dylan's work, which is essentially: as the output length gets higher, you're getting this quadratic growth in terms of memory used. And then on the GPUs that we have, effectively you're going to run out of memory, and they're all trying to serve multiple requests at once.
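A rough way to see the scaling this quote gestures at: with causal attention, each new output token reads the entire KV cache built so far, so the total attention work over a generation grows roughly quadratically in output length (while the cache itself grows linearly per token). A minimal sketch, with illustrative token counts:

```python
def attention_reads(output_len: int, prompt_len: int = 0) -> int:
    """Total past-token reads across a full generation: the t-th output
    token attends to prompt_len + (t - 1) earlier tokens (causal attention)."""
    return sum(prompt_len + t for t in range(output_len))

# Doubling the output length roughly quadruples total attention work.
short = attention_reads(1_000)   # 499,500 reads
long = attention_reads(2_000)    # 1,999,000 reads
ratio = long / short             # ~4x
```

This is why long reasoning traces hit the serving stack so much harder than short chat replies of the same model size.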

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

So they're doing this batch processing where not all of the prompts are exactly the same, really complex handling. And then as context length gets longer, there's this, I think you call it critical batch size, where your ability to serve, so how much you can parallelize your inference, plummets because of this long context.
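The batch-size squeeze can be sketched with back-of-envelope numbers (all of them illustrative, not any real provider's figures): after model weights, the remaining GPU memory is split among concurrent requests' KV caches, so longer contexts mean fewer requests can be served in parallel.

```python
def max_batch_size(hbm_bytes: int, weight_bytes: int,
                   kv_bytes_per_token: int, context_len: int) -> int:
    """Rough sketch: memory left after weights is divided among per-request
    KV caches. Ignores activations, fragmentation, and scheduling overhead."""
    free = hbm_bytes - weight_bytes
    per_request = kv_bytes_per_token * context_len
    return max(free // per_request, 0)

HBM = 8 * 80 * 1024**3       # hypothetical 8x80 GiB node
WEIGHTS = 140 * 1024**3      # ~70B params at fp16 (illustrative)
KV_PER_TOKEN = 320 * 1024    # illustrative KV-cache bytes per token

b_short = max_batch_size(HBM, WEIGHTS, KV_PER_TOKEN, 4_096)    # 400 requests
b_long = max_batch_size(HBM, WEIGHTS, KV_PER_TOKEN, 65_536)    # 25 requests
```

A 16x longer context cuts the servable batch by 16x in this toy model, which is the "ability to serve plummets" effect described in the quote.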

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

So your memory usage is going way up with these reasoning models, and you still have a lot of users. So effectively, the cost to serve multiplies by a ton.

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

for your KV cache, you end up not being able to run a certain number of users, or your sequence length is capped. So this is showing, for a 405B model, Llama 3.1 405B, at batch size 64. And batch size is crucial, essentially: you want to have a higher batch size to parallelize your throughput.
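The KV-cache figure being described can be approximated from Llama 3.1 405B's published configuration (126 layers, 8 KV heads via grouped-query attention, head dimension 128); the sequence length below is an illustrative assumption, not a number from the episode.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, batch: int, dtype_bytes: int = 2) -> int:
    """KV cache size: keys and values (factor of 2) stored for every layer,
    KV head, and token position, at the given precision (2 bytes = fp16)."""
    return 2 * n_layers * n_kv_heads * head_dim * dtype_bytes * seq_len * batch

# Llama 3.1 405B: 126 layers, 8 KV heads (GQA), head dim 128.
per_token = kv_cache_bytes(126, 8, 128, seq_len=1, batch=1)      # ~0.5 MiB/token
total = kv_cache_bytes(126, 8, 128, seq_len=32_768, batch=64)
gib = total / 1024**3                                            # ~1008 GiB
```

At batch size 64 and a 32K context, the cache alone is around a terabyte, on top of roughly 810 GB of fp16 weights, which is why either the sequence length or the number of concurrent users ends up capped.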

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

Let's go into DeepSeek again. So we're in the post-DeepSeek R1 time, I think. And there are two sides of this market watching how hard it is to serve it. On one side, we're going to talk about DeepSeek themselves. They now have a chat app that got to number one on the App Store. Disclaimer: number one on the App Store is measured by velocity.
