Reiner Pope
And if those numbers are big enough, it eventually becomes profitable to cut along there.
And we have selected two of them.
The other two, in the way models are typically sized, are not profitable.
Yeah.
And I guess we didn't really fully articulate what benefit we're getting from pipelining.
Yeah.
And so these complexities are real.
Pipelining is a massive hassle, but it does give you some benefits.
And then you can decide whether those benefits are worth the costs.
It has some benefits in inference, and maybe bigger benefits in training.
In inference, what are we saving on?
Are we saving on memory time or compute time?
Not really.
We're just moving the memory time from one chip to another chip or one rack to a different rack.
There's no actual benefit in runtime.
However, what we are saving on is memory capacity: the amount of memory used per rack.
If we think that the memory in a rack is a bottleneck, then there's a constraint on how fast we can go.
Pipelining allows us to massively reduce that bottleneck.
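To make the capacity point concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not something from the conversation: the class name, the layer count, and the bytes per layer are all hypothetical, and it counts only weight memory, split evenly by layer across pipeline stages (one stage per rack).

```python
from dataclasses import dataclass


@dataclass
class PipelineSplit:
    """Hypothetical model split by layer across pipeline stages (one stage per rack)."""

    num_layers: int       # assumed layer count, illustrative only
    bytes_per_layer: int  # assumed weight bytes per layer, illustrative only
    num_stages: int       # number of pipeline stages (racks)

    def weight_bytes_per_rack(self) -> int:
        # Each rack stores only the layers of its own stage.
        # Ceiling division covers layer counts that do not divide evenly.
        layers_per_stage = -(-self.num_layers // self.num_stages)
        return layers_per_stage * self.bytes_per_layer

    def total_weight_bytes_read(self) -> int:
        # Every layer's weights are still read once per forward pass,
        # so the total memory traffic does not depend on the number of stages.
        return self.num_layers * self.bytes_per_layer


unpipelined = PipelineSplit(num_layers=64, bytes_per_layer=10**9, num_stages=1)
pipelined = PipelineSplit(num_layers=64, bytes_per_layer=10**9, num_stages=4)

print(unpipelined.weight_bytes_per_rack(), pipelined.weight_bytes_per_rack())
print(unpipelined.total_weight_bytes_read(), pipelined.total_weight_bytes_read())
```

With these made-up numbers, the weight memory held per rack drops by the number of stages, while the total bytes of weights read stays the same: the memory time is moved between racks rather than removed.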
So we draw the pipeline bubble.
Yeah.