Reiner Pope
So this actually gives you a ballpark that is remarkably accurate in practice.
Generally, people will go a little bit larger than this.
They don't really want to be exactly at the balance point, because real-world efficiencies aren't as good as a roofline analysis would say.
But like take this and maybe double it or triple it.
So we solve for the equivalence between when compute time is equal to memory time.
If I add in more memory bandwidth, like something that consumes more memory bandwidth, then I have less available for the weight loads, and so I need to grow the memory bandwidth more, and therefore the batch size more.
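The balance point described above can be sketched numerically. This is a minimal sketch, not the speaker's actual calculation: all the hardware and model numbers below are illustrative assumptions, and the 2-FLOPs-per-parameter-per-token rule of thumb is the standard approximation for a transformer forward pass.

```python
# Hedged sketch: solve for the batch size where per-step compute time
# equals weight-load time. All constants are illustrative assumptions.

PEAK_FLOPS = 1.0e15      # assumed peak compute, FLOP/s
MEM_BW = 3.3e12          # assumed HBM bandwidth, bytes/s
BYTES_PER_PARAM = 2      # fp16/bf16 weights

def critical_batch_size(peak_flops, mem_bw, bytes_per_param):
    """Batch size at which compute time equals weight-load time.

    Compute time per step:  B * 2 * P / peak_flops   (2 FLOPs/param/token)
    Memory time per step:   P * bytes_per_param / mem_bw
    Setting them equal, the parameter count P cancels out.
    """
    return (peak_flops / mem_bw) * bytes_per_param / 2

b_star = critical_batch_size(PEAK_FLOPS, MEM_BW, BYTES_PER_PARAM)
print(f"balance-point batch size ~ {b_star:.0f} sequences")
```

With these assumed numbers the balance point lands around 300 sequences; doubling or tripling it, as suggested above, gives a practical operating batch size. Note that adding KV-cache traffic, as just described, raises the effective memory time and pushes the balance point higher still.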
Yeah, okay.
So keep in mind that I'm talking about the number of sequences that I'm generating one more token for.
So it's actually 2,000 unique sequences.
Okay, we're just talking about...
Yeah.
The way to think about this: we think of it as a "when does the train depart?" model.
So let's say I've picked a batch size that I'm going to run at.
Maybe I pick this batch size.
And by the way, this intersection point is the same intersection point here.
So I picked this batch size.
I know how long it's going to take; for example, something like 20 milliseconds is a common place to end up landing.
What I'm going to produce is a timeline of what is running on the GPU.
It's going to start a new batch every 20 milliseconds, regardless.
And so each tick here is 20 milliseconds: this is 20, this is 40.
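The fixed-tick scheduling just described can be sketched as a tiny simulation. This is an assumed formalization, not the speaker's code: the 20 ms tick matches the example above, and the function name is hypothetical.

```python
# Hedged sketch of the "when does the train depart?" model:
# the GPU starts a new batch on a fixed tick (e.g. every 20 ms),
# regardless of how many requests have arrived by then.

TICK_MS = 20  # assumed per-step time at the chosen batch size

def departure_time_ms(arrival_ms, tick_ms=TICK_MS):
    """A request that arrives at arrival_ms boards the next train:
    the first batch boundary at or after its arrival."""
    ticks_waited = -(-arrival_ms // tick_ms)  # ceiling division
    return ticks_waited * tick_ms

# Requests arriving mid-tick wait for the next boundary.
for t in [5, 19, 20, 33]:
    print(f"arrives at {t} ms -> departs at {departure_time_ms(t)} ms")
```

A request arriving at 5 ms or 19 ms both depart at the 20 ms boundary, while one arriving at 33 ms waits for the 40 ms train, which is why the timeline is just evenly spaced batch starts at 20, 40, and so on.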