Reiner Pope

Speaker · 1157 total appearances

Podcast Appearances

Dwarkesh Podcast
Reiner Pope – The math behind how LLMs are trained and served

Okay.

So why do we do... What is this micro-batching that shows up in pipeline parallelism?

So...

I'll focus on inference first.

It's a slightly simpler problem.

And I'm going to draw this out: this axis is time, and this axis is which rack we're on.

And so the idea is that maybe I'll have four racks.

So I've got an inference that is going to step through these four racks over time, like this.

So great, this is inference number zero.

It runs at a certain batch size, and it steps through all the pipeline stages like this.

Now, if we were to say, well, we're going to run inference number one here, this is clearly a massive waste, right?

Like three quarters of the time, each of the racks is doing nothing.

So we don't actually run inference one here.

We run it as soon as we can, which is immediately after inference zero finishes like this.

And then we keep going.

So if we hadn't filled this in, we would call this the pipeline bubble.
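The schedule being described can be sketched with a tiny simulation. This is an illustrative sketch, not anything from the episode: it assumes four pipeline stages (one per rack) with an equal per-stage time of one unit, and the `schedule` function is made up for the example.

```python
def schedule(num_stages, num_inferences, stage_time=1.0):
    """Start time of each inference at each pipeline stage (rack)."""
    start = [[0.0] * num_stages for _ in range(num_inferences)]
    stage_free = [0.0] * num_stages  # when each rack next becomes idle
    for i in range(num_inferences):
        t = 0.0  # inference i could enter stage 0 at time 0 at the earliest
        for s in range(num_stages):
            # Wait for both this inference's previous stage and the rack itself.
            t = max(t, stage_free[s])
            start[i][s] = t
            t += stage_time
            stage_free[s] = t
    return start

# Pipelined: each inference enters rack 0 as soon as the previous one leaves it.
starts = schedule(num_stages=4, num_inferences=3)
makespan = starts[-1][-1] + 1.0          # last inference leaves the last rack
work = 3 * 4                             # 12 stage-slots of useful compute
pipelined_util = work / (makespan * 4)   # 12 / 24 = 0.5
# Naive: run each inference only after the previous one fully finishes.
naive_util = work / ((3 * 4) * 4)        # 12 / 48 = 0.25
```

With M inferences and S stages, this pipelined schedule finishes at time M + S - 1, so per-rack utilization is M / (M + S - 1); it approaches 1 as more inferences are packed in back to back, which is exactly the bubble being filled.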

When I've drawn it in this inference context, where we're only doing a forward pass, it's obvious: why would you do this stupid thing?

But in a training context, it's maybe less obvious.

But in the inference context, it's sort of really natural to make this change.

Yeah, let's do that.