Okay, so why is it correct to divide it this way?
Well, we're saying that the parameters are perfectly divided among all the GPUs in a rack.
The layers are perfectly divided amongst the different racks.
So that works here.
And somehow, I'll hand-wave exactly how, we can arrange the same perfect sharding of the contexts: across GPUs within a rack, and by layer across racks.
And sorry, four is the number of racks.
Yeah, for example.
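To make the arithmetic concrete, here's a minimal sketch of the two-axis sharding being described, matching the four-racks example. All names and sizes here are hypothetical, chosen only for illustration:

```python
# Hypothetical sketch of the two-axis sharding described above.
# All names and sizes are illustrative, not from the actual system.

NUM_RACKS = 4            # layers are divided across racks (pipeline axis)
GPUS_PER_RACK = 8        # parameters are divided within a rack (tensor axis)

NUM_LAYERS = 32                        # assumed divisible by NUM_RACKS
PARAMS_PER_LAYER = 1_000_000           # illustrative per-layer parameter count
KV_BYTES_PER_TOKEN_PER_LAYER = 4096    # illustrative KV-cache footprint

# Layers are perfectly divided among the racks...
layers_per_rack = NUM_LAYERS // NUM_RACKS

# ...and each rack's parameters are perfectly divided among its GPUs.
params_per_gpu = layers_per_rack * PARAMS_PER_LAYER // GPUS_PER_RACK

# The claim is that the contexts (the KV cache) shard along the same
# two axes: a GPU holds only the cache for its rack's layers, and only
# its 1/GPUS_PER_RACK shard of those layers.
def kv_bytes_per_gpu(context_tokens: int) -> int:
    per_layer = context_tokens * KV_BYTES_PER_TOKEN_PER_LAYER
    return layers_per_rack * per_layer // GPUS_PER_RACK

print(params_per_gpu)                         # per-GPU parameter shard
print(kv_bytes_per_gpu(context_tokens=2048))  # per-GPU KV-cache shard
```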
So...
This is the place where we actually need to go back and analyze this batch size B. And you were making this comment that there's micro-batching versus global batching.
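For the micro-batching side of that comment, here is a minimal sketch, again with illustrative sizes that are not from the transcript: the global batch B is split into one micro-batch per pipeline stage, so that every stage can be working on a different micro-batch at once.

```python
# Hypothetical sketch of global batching vs. micro-batching for the
# pipeline. All sizes are illustrative, not from the transcript.

GLOBAL_BATCH_B = 64      # the batch size B being analyzed
NUM_STAGES = 4           # one pipeline stage per rack, as above

# One micro-batch per stage keeps every stage busy instead of letting
# it sit idle while a single big batch moves through the pipeline.
micro_batch_size = GLOBAL_BATCH_B // NUM_STAGES

def microbatches(batch):
    """Yield consecutive micro-batches of a global batch (a list)."""
    for i in range(0, len(batch), micro_batch_size):
        yield batch[i:i + micro_batch_size]

batch = list(range(GLOBAL_BATCH_B))   # stand-in for 64 sequences
for m, mb in enumerate(microbatches(batch)):
    print(f"micro-batch {m}: {len(mb)} sequences")
```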
So let's come back to this pipelining diagram here.
We've got one batch going forward here.
And then as I drew it, it kind of just like disappeared.
That's not really correct.
If you think about how decode is working, I have a bunch of tokens that I have generated already.
I do one forward pass where I generate a new token.
And then I write that to my KV cache, and then I do another forward pass that generates the next token.
So I'm actually going to be running this batch zero in a loop.
So in fact, I go forward.
Once I finish, I can start the next iteration of the loop up here.
Yeah.
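That decode loop is simple enough to write down directly. In this sketch, model_forward and the cache layout are hypothetical stand-ins, not a real API:

```python
# Minimal sketch of the decode loop described above. model_forward and
# the cache layout are hypothetical stand-ins, not a real API.

def decode(model_forward, prompt_tokens, kv_cache, num_new_tokens):
    """Autoregressive decode: one forward pass per generated token."""
    tokens = list(prompt_tokens)
    for _ in range(num_new_tokens):
        # One forward pass over the latest token, attending to all
        # previously cached keys/values...
        next_token, new_kv = model_forward(tokens[-1], kv_cache)
        # ...then write the new key/value entries to the KV cache so
        # the next iteration of the loop can use them.
        kv_cache.append(new_kv)
        tokens.append(next_token)
    return tokens
```

In the pipelined picture, this loop is the dependency the diagram shows: batch zero's next iteration can only re-enter the first rack once its previous forward pass has come out the other end of the pipeline.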