Reiner Pope

👤 Speaker
1157 total appearances

Podcast Appearances

Dwarkesh Podcast
Reiner Pope – The math behind how LLMs are trained and served

So this actually gives you a ballpark, which is remarkably accurate in practice.

Generally, people will go a little bit larger than this.

They don't really want to be exactly at the balance point, because real-world efficiencies aren't as good as a roofline analysis would say.

But like take this and maybe double it or triple it.

So we solve for the point where compute time equals memory time.
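That balance-point algebra can be sketched in a few lines of Python. This is my own back-of-envelope formulation with hypothetical hardware numbers, not Reiner's exact figures: per decode step, compute time is roughly 2 × params × batch / FLOP rate, while weight-load time is params × bytes-per-param / bandwidth (the weights are read once per step regardless of batch size). Setting the two equal, the params term cancels:

```python
def balance_batch(flops: float, mem_bw: float, bytes_per_param: float = 2.0) -> float:
    """Batch size at which per-step compute time (2 * params * batch / flops)
    equals per-step weight-load time (params * bytes_per_param / mem_bw).
    The params term cancels, so only the hardware ratio matters."""
    return flops * bytes_per_param / (2 * mem_bw)

# Hypothetical accelerator: ~1e15 FLOP/s at bf16, ~3.35e12 B/s of HBM bandwidth.
print(balance_batch(flops=1e15, mem_bw=3.35e12))  # ~298.5 sequences
```

Consistent with the "double it or triple it" remark above, a real deployment would sit a factor of two or three above this number rather than exactly at it.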

If I add in something that consumes more memory bandwidth, then I have less available for the weight loads, so the memory time grows, and therefore the batch size needs to grow.
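One way to see this effect is to extend the same balance equation with an extra per-sequence memory term. This is a sketch with made-up numbers; `kv_bytes_per_seq` here is my hypothetical stand-in for whatever additional traffic (e.g. KV-cache reads) each sequence streams per step:

```python
def balance_batch_with_kv(flops: float, mem_bw: float, params: float,
                          bytes_per_param: float = 2.0,
                          kv_bytes_per_seq: float = 0.0) -> float:
    """Solve 2*params*B/flops == (params*bytes_per_param + B*kv_bytes_per_seq)/mem_bw
    for B. With kv_bytes_per_seq == 0 this reduces to the plain weight-load balance."""
    denom = 2 * params * mem_bw - flops * kv_bytes_per_seq
    if denom <= 0:
        return float("inf")  # the extra traffic alone saturates bandwidth
    return flops * params * bytes_per_param / denom

# Hypothetical 70B-parameter model on the same ~1e15 FLOP/s, ~3.35e12 B/s chip:
print(balance_batch_with_kv(1e15, 3.35e12, 70e9))                        # ~298.5
print(balance_batch_with_kv(1e15, 3.35e12, 70e9, kv_bytes_per_seq=1e8))  # ~379, larger
```

The second call shows the point being made: diverting bandwidth away from weight loads pushes the compute/memory crossover out to a larger batch size.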

Yeah, okay.

So I guess this is, keep in mind that I'm talking about the number of sequences that I'm generating one more token for.

So it's actually 2,000 unique sequences.

Okay, we're just talking about...

Yeah.

The way to think about this, I guess, is as a "when does the train depart?" model.

So let's say I've picked a batch size that I'm going to run at.

Maybe I pick this batch size.

And by the way, this intersection point is the same intersection point here.

So I picked this batch size.

I know that it's going to take, for example, something like 20 milliseconds, which is a common place to end up landing.

What I'm going to produce is a timeline of what is running on the GPU.

It's going to start a new batch every 20 milliseconds, regardless.

And so each of these is 20 milliseconds, this is 40.
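The "train departs on schedule" picture can be sketched as a toy simulation. This is an illustration I'm adding, not production serving code: requests arrive at arbitrary times, and whatever has queued up departs together at the next 20 ms tick.

```python
def batch_departures(arrival_times_ms, interval_ms=20.0):
    """Group arrival times into batches departing at fixed ticks (20, 40, 60, ...).
    A request arriving at time t joins the next departure strictly after t."""
    batches = {}
    for t in sorted(arrival_times_ms):
        tick = (int(t // interval_ms) + 1) * interval_ms
        batches.setdefault(tick, []).append(t)
    return batches

# Six requests; the GPU starts a new batch on every tick regardless of queue depth.
print(batch_departures([1, 5, 19, 21, 38, 41]))
# {20.0: [1, 5, 19], 40.0: [21, 38], 60.0: [41]}
```

The point of the model: a request's queueing delay depends only on where it lands relative to the next departure, never more than one interval, and the GPU timeline stays a fixed cadence of batches.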