Reiner Pope
Okay.
So what is this micro-batching that shows up in pipeline parallelism?
So, I'll focus on inference first.
It's a slightly simpler problem.
And I'm going to draw this out: this axis is time, and this axis is which rack we're on.
And so the idea is that maybe I'll have four racks.
So I've got an inference that is going to step through these four racks over time, like this.
So great, this is inference number zero.
It runs at a certain batch size, and it steps through all the pipeline stages like this.
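(To make the picture being drawn concrete, here's a minimal sketch of that timeline in Python; the four-stage count and unit-length time steps are illustrative assumptions, not details from the recording.)

```python
# A minimal sketch of the diagram described above: one inference, batch 0,
# stepping through four pipeline stages -- one rack per stage, one stage per
# time step. The four-stage count and unit time steps are assumptions.

NUM_STAGES = 4  # four racks, one pipeline stage each

def single_inference_schedule(num_stages):
    """grid[stage][t] holds which inference a stage runs at time t ('.' = idle)."""
    grid = [["." for _ in range(num_stages)] for _ in range(num_stages)]
    for t in range(num_stages):
        grid[t][t] = "0"  # stage t is busy with inference 0 only at time t
    return grid

for stage, row in enumerate(single_inference_schedule(NUM_STAGES)):
    print(f"rack {stage}: " + " ".join(row))
# rack 0: 0 . . .
# rack 1: . 0 . .
# rack 2: . . 0 .
# rack 3: . . . 0
```

Each rack is busy for just one of the four steps, which is the three-quarters idle time mentioned next.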
Now, if we were to say, well, we're going to run inference number one here, this is clearly a massive waste, right?
Like three quarters of the time, each of the racks is doing nothing.
So we don't actually run inference one here.
We run it as soon as we can, which is immediately after inference zero finishes like this.
And then we keep going.
So if we hadn't filled this in, we would call this the pipeline bubble.
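(Again as a sketch, not the speaker's own code: micro-batch m reaches stage s at step m + s, and with S stages and M micro-batches the run takes M + S - 1 steps, so the leftover bubble fraction is (S - 1)/(M + S - 1). The specific stage and micro-batch counts below are illustrative assumptions.)

```python
# A sketch of the filled-in schedule: micro-batch m enters stage s at time
# m + s, so every stage stays busy once the pipeline is full. The remaining
# idle (bubble) fraction is (S - 1) / (M + S - 1), shrinking as M grows.

NUM_STAGES = 4        # S: racks / pipeline stages
NUM_MICROBATCHES = 6  # M: micro-batches fed back-to-back

def pipelined_schedule(num_stages, num_microbatches):
    num_steps = num_microbatches + num_stages - 1
    grid = [["." for _ in range(num_steps)] for _ in range(num_stages)]
    for m in range(num_microbatches):
        for s in range(num_stages):
            grid[s][m + s] = str(m)  # micro-batch m reaches stage s at time m + s
    return grid

for stage, row in enumerate(pipelined_schedule(NUM_STAGES, NUM_MICROBATCHES)):
    print(f"rack {stage}: " + " ".join(row))

busy = NUM_MICROBATCHES * NUM_STAGES
total = (NUM_MICROBATCHES + NUM_STAGES - 1) * NUM_STAGES
print(f"bubble fraction: {1 - busy / total:.1%}")  # (S-1)/(M+S-1) = 33.3%
```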
When I've drawn it in this inference context, where we're only doing a forward pass, it's obvious: why would you do this stupid thing?
But in a training context, it's maybe less obvious.
But in the inference context, it's sort of really natural to make this change.
Yeah, let's do that.