Dwarkesh Patel

Speaker
15267 total appearances

Podcast Appearances

Dwarkesh Podcast
Reiner Pope – The math behind how LLMs are trained and served

So if you can, I would really recommend watching it on a video platform like YouTube.

Okay, full disclosure, I am an angel investor in MatX, but that's unrelated to this podcast.

Reiner, maybe to kick us off, I'll ask this question.

So, we have a couple of products, like Claude, Codex, and Cursor, offering something like a Fast Mode, where for 6x the price they'll stream you tokens at 2.5x the speed.
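Taking the quoted multipliers at face value (6x the price per token for 2.5x the tokens per second), a few lines of arithmetic show what the trade means for a user. The function name is hypothetical, and nothing here reflects any vendor's actual pricing.

```python
# Multipliers quoted in the question; treated as illustrative ratios, not published pricing.
PRICE_MULTIPLIER = 6.0  # price per token in Fast Mode relative to normal mode
SPEED_MULTIPLIER = 2.5  # tokens per second in Fast Mode relative to normal mode


def fast_mode_tradeoff(price_mult: float, speed_mult: float) -> dict:
    """Hypothetical helper: derive ratios implied by a price/speed multiplier pair."""
    return {
        # Spend per second of streaming = (price per token) x (tokens per second).
        "spend_per_second": price_mult * speed_mult,
        # Wall-clock time for a fixed-length response shrinks by the speed multiplier.
        "time_per_response": 1.0 / speed_mult,
        # Price premium paid per unit of speedup.
        "price_per_unit_speedup": price_mult / speed_mult,
    }


print(fast_mode_tradeoff(PRICE_MULTIPLIER, SPEED_MULTIPLIER))
```

With these inputs, spend per second of streaming rises 15x, a fixed-length response takes 0.4x as long, and the user pays 2.4x per unit of speedup.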

Mechanically, I'm curious what's going on here.

Why is it that you can pay more to get lower latency?

And two, could you keep going?

Could you pay 100x more and somehow get even faster speeds or much, much faster speeds?

And three, could you go the other way?

Could you have something like a Claude Code slow mode, where if you are willing to wait for minutes on end, you could get even cheaper prices?

So maybe this will help motivate the kind of analysis that you'll be doing through the lecture.

Maybe I'll just interrupt from time to time to ask some very naive questions or to clarify some basic points, but...

just for the audience, you're not serving one user at a time.

The batch refers to the fact that you're serving many different users at the same time.
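A minimal sketch of why serving a batch helps, under a deliberately simplified model that is an assumption here rather than anything stated in the conversation: during each decode step the hardware streams every model weight from memory once, however many users are in the batch, so that weight-loading time is split across all of them. The parameter count, weight precision, and bandwidth are hypothetical round numbers, and compute and KV cache traffic are ignored.

```python
# Hypothetical round numbers, for illustration only.
NUM_PARAMS = 70e9       # model parameters
BYTES_PER_PARAM = 2     # e.g. 16-bit weights
MEM_BANDWIDTH = 3e12    # bytes per second of memory bandwidth


def weight_load_time_per_token(batch_size: int) -> float:
    """Seconds of weight-loading time attributable to each generated token.

    Simplified model: one decode step reads all weights once and emits one
    token per sequence in the batch, so the step's cost is divided by the
    batch size. Compute time and KV cache reads are ignored.
    """
    step_time = NUM_PARAMS * BYTES_PER_PARAM / MEM_BANDWIDTH
    return step_time / batch_size


for b in (1, 8, 64):
    print(b, weight_load_time_per_token(b))
```

Going from a batch of 1 to a batch of 64 cuts the weight-loading time per token by 64x in this model, which hints at why cost per token falls as more users share each step.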

Yeah.

And that's a whole batch.

And maybe, just backing up, let's quickly explain what the KV cache is.
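For reference, a sketch of the standard KV cache size calculation for a decoder-only transformer: during generation, each attention layer keeps the key and value vectors of every earlier token so they are not recomputed at each step. The model shape below (layers, KV heads, head dimension) is hypothetical.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int, bytes_per_elem: int = 2) -> int:
    """Bytes needed to hold cached keys and values for a whole batch.

    The leading factor of 2 counts one key tensor and one value tensor per layer.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem


# Hypothetical model shape: 80 layers, 8 KV heads of dimension 128, 16-bit cache entries.
size = kv_cache_bytes(n_layers=80, n_kv_heads=8, head_dim=128,
                      seq_len=8192, batch_size=16)
print(f"{size / 2**30:.1f} GiB")
```

With this shape, each cached token costs 320 KiB, so a single 8,192-token sequence holds 2.5 GiB and a batch of 16 holds 40 GiB; unlike the weights, this memory grows with both context length and batch size.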

It seems like the way you've drawn the slopes for...

compute time, how the KV cache grows, and what implication the KV cache has on memory time... What if this were above or below? Or is that necessarily the case? Because if this is always true, then as batch size grows, compute always dominates the KV cache, which suggests that if you have a big enough batch size, maybe memory is never an issue.
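One way to make the question concrete is a simple roofline-style model, which is an assumption here and not necessarily the one drawn in the episode: a decode step takes roughly max(compute time, memory time). Compute time grows linearly with batch size, and so does the KV cache part of memory time, so whether compute eventually dominates depends on which slope is steeper. All hardware and model numbers, and the 2-FLOPs-per-parameter approximation, are hypothetical.

```python
from typing import Optional

# Hypothetical hardware and model numbers, for illustration only.
PEAK_FLOPS = 1e15              # floating-point operations per second
MEM_BANDWIDTH = 3e12           # bytes per second
NUM_PARAMS = 70e9
BYTES_PER_PARAM = 2
KV_BYTES_PER_TOKEN = 327_680   # per sequence, per cached token (assumed model shape)


def step_times(batch_size: float, context_len: int) -> tuple[float, float]:
    """Return (compute_time, memory_time) in seconds for one decode step."""
    # Roughly 2 FLOPs per parameter per generated token (a multiply and an add).
    compute = 2 * NUM_PARAMS * batch_size / PEAK_FLOPS
    # Weights are read once per step; each sequence also reads its own KV cache.
    weight_bytes = NUM_PARAMS * BYTES_PER_PARAM
    kv_bytes = batch_size * context_len * KV_BYTES_PER_TOKEN
    memory = (weight_bytes + kv_bytes) / MEM_BANDWIDTH
    return compute, memory


def crossover_batch(context_len: int) -> Optional[float]:
    """Batch size above which compute time exceeds memory time, if one exists."""
    compute_slope = 2 * NUM_PARAMS / PEAK_FLOPS
    kv_slope = context_len * KV_BYTES_PER_TOKEN / MEM_BANDWIDTH
    if compute_slope <= kv_slope:
        return None  # memory time grows at least as fast, so compute never overtakes it
    weight_time = NUM_PARAMS * BYTES_PER_PARAM / MEM_BANDWIDTH
    return weight_time / (compute_slope - kv_slope)


print(crossover_batch(512))    # short context: a finite crossover exists
print(crossover_batch(2048))   # long context: None, memory always dominates
```

With these numbers, a 512-token context gives a finite crossover batch size (roughly 555), while at 2,048 tokens the KV slope is steeper and memory time stays ahead at every batch size.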

And is there something especially significant about the slope being exactly the slope of the...