
Reiner Pope

Dwarkesh Podcast
Reiner Pope – The math behind how LLMs are trained and served

Okay, so why is it correct to divide it this way?

Well, we're saying we knew that the parameters were perfectly divided amongst all the GPUs in a rack.

The layers are perfectly divided amongst the different racks.

So that works here.

And somehow, I'll hand-wave exactly how, we can arrange the same perfect sharding: the contexts across the GPUs in a rack, and by layer across racks.

And sorry, four is the number of racks.

Yeah, for example.
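
To make that arithmetic concrete, here's a minimal Python sketch. Apart from the four racks used as the example, every number below (model size, layer count, GPUs per rack) is an assumption picked for round figures, not something from the episode.

```python
# A rough sketch of the sharding arithmetic described above.
# All concrete numbers are illustrative assumptions.

PARAM_COUNT = 70e9    # hypothetical 70B-parameter model
BYTES_PER_PARAM = 2   # bf16 weights
NUM_LAYERS = 80       # hypothetical layer count
GPUS_PER_RACK = 8     # hypothetical rack size
NUM_RACKS = 4         # "four is the number of racks"

# Layers are divided perfectly across racks (pipeline parallelism)...
layers_per_rack = NUM_LAYERS / NUM_RACKS                 # 20 layers

# ...and within a rack, that slice of the parameters is divided
# perfectly across the GPUs.
params_per_rack = PARAM_COUNT / NUM_RACKS
params_per_gpu = params_per_rack / GPUS_PER_RACK
weight_bytes_per_gpu = params_per_gpu * BYTES_PER_PARAM  # ~4.4 GB

print(f"layers per rack: {layers_per_rack:.0f}")
print(f"weights per GPU: {weight_bytes_per_gpu / 1e9:.1f} GB")
```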

So...

This is the place where we actually need to go back and analyze this batch size B. And you were making this comment that there's micro-batching versus global batching.
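
As a minimal sketch of that distinction: the global batch of size B is split into smaller micro-batches that enter the pipeline one after another, so different stages can work on different micro-batches at once. The sizes below are hypothetical.

```python
# Split a global batch into micro-batches for pipelining.
B = 256                # hypothetical global batch size
MICRO_BATCH_SIZE = 32  # hypothetical per-step batch fed to the pipeline

num_micro_batches = B // MICRO_BATCH_SIZE  # 8 micro-batches in flight

global_batch = list(range(B))  # stand-in for B sequences
micro_batches = [
    global_batch[i * MICRO_BATCH_SIZE:(i + 1) * MICRO_BATCH_SIZE]
    for i in range(num_micro_batches)
]
# Each micro-batch enters stage 0 as soon as the previous one has
# moved on to stage 1, keeping all stages busy once the pipeline fills.
```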

So let's come back to this pipelining diagram here.

We've got one batch going forward here.

And then as I drew it, it kind of just like disappeared.

That's not really correct.

If you think about how decode is working, I have a bunch of tokens that I have generated already.

I do one forwards pass where I generate a new token.

And then I write that to my KV cache, and then I do another forwards pass that generates the next token.

So I'm actually going to be running this batch zero in a loop.
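
That loop, sketched in Python; `model.prefill` and `model.decode_step` are hypothetical stand-ins for whatever the serving stack actually exposes:

```python
def decode(model, prompt_tokens, max_new_tokens):
    """Decode loop: one forwards pass per new token."""
    tokens = list(prompt_tokens)
    kv_cache = model.prefill(tokens)  # cache keys/values for the prompt
    for _ in range(max_new_tokens):
        # One forwards pass over just the newest token; it attends to
        # everything already stored in the KV cache, and its own
        # keys/values are written back into the cache.
        next_token, kv_cache = model.decode_step(tokens[-1], kv_cache)
        tokens.append(next_token)  # then loop and generate the next one
    return tokens
```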

So in fact, I go forwards.

Once I finish, I can start the next iteration of the loop up here.
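
A toy schedule capturing that wrap-around behavior might look like this; the stage and batch counts are assumptions:

```python
# With as many in-flight batches as pipeline stages, batch 0 finishes
# the last stage and immediately re-enters stage 0 for its next decode
# step, so no stage idles once the pipeline is full.
NUM_STAGES = 4   # e.g. one pipeline stage per rack
NUM_BATCHES = 4  # one in-flight batch per stage

for step in range(8):
    for batch in range(NUM_BATCHES):
        if step >= batch:  # batch b enters the pipeline at step b
            stage = (step - batch) % NUM_STAGES
            print(f"step {step}: batch {batch} on stage {stage}")
# At step 4, batch 0 is back on stage 0, starting its next token.
```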

Yeah.