
Reiner Pope

👤 Speaker
1157 total appearances

Appearances Over Time

Podcast Appearances

Dwarkesh Podcast
Reiner Pope – The math behind how LLMs are trained and served

And if those numbers are big enough, it eventually becomes profitable to cut along there.

And we have selected two of them.

The other two, in the way models are typically sized, are not profitable.

Yeah.

And I guess we didn't really fully articulate even what is the benefit that we're getting from pipelining.

Yeah.

And so these complexities are real.

Pipelining is a massive hassle, but it does give you some benefits.

And then you can then decide whether those benefits are worth the costs.

The biggest benefit that shows up: it has some benefits in inference, and maybe bigger benefits in training.

In inference, what are we saving on?

Are we saving on memory time or compute time?

Not really.

We're just moving the memory time from one chip to another chip or one rack to a different rack.

There's no actual benefit in runtime.

However, what we are saving on is memory capacity: the amount of memory used per rack.

If we think that the memory in a rack is a bottleneck, then there's a constraint on how fast we can go.

Pipelining allows us to massively reduce that bottleneck.
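The capacity argument above can be sketched with some back-of-the-envelope arithmetic. This is an illustrative assumption, not numbers from the conversation: the model's layers are split evenly across pipeline stages, so each rack only has to hold its own slice of the weights.

```python
# Hypothetical sketch: how pipeline parallelism reduces per-rack memory
# capacity requirements. All numbers below are illustrative assumptions.

def per_rack_weight_memory_gb(total_params_b: float,
                              bytes_per_param: int,
                              pipeline_stages: int) -> float:
    """Weight memory each rack must hold when the model's layers are
    split evenly across `pipeline_stages` racks.

    1B params at N bytes/param is roughly N GB of weights.
    """
    total_gb = total_params_b * bytes_per_param
    return total_gb / pipeline_stages

# A 500B-parameter model in bf16 (2 bytes/param) needs ~1000 GB of weights.
no_pipelining = per_rack_weight_memory_gb(500, 2, pipeline_stages=1)  # 1000.0 GB
four_stages = per_rack_weight_memory_gb(500, 2, pipeline_stages=4)    # 250.0 GB per rack
```

As the transcript notes, this doesn't reduce total memory traffic at runtime; it only divides the capacity each rack must provide, which matters when a single rack's memory is the binding constraint.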

So we draw the pipeline bubble.
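The pipeline bubble drawn here can be quantified. The speakers don't state a formula, but the standard GPipe-style expression (an assumption on my part) is that with p stages and m microbatches, the idle fraction is (p − 1) / (m + p − 1):

```python
# Hypothetical sketch of the standard GPipe-style pipeline-bubble formula.
# Not taken from the conversation; a common simplification of the schedule.

def bubble_fraction(stages: int, microbatches: int) -> float:
    """Fraction of time pipeline stages sit idle ("the bubble") in a
    simple fill-and-drain schedule: (p - 1) / (m + p - 1)."""
    return (stages - 1) / (microbatches + stages - 1)

# With one microbatch, 4 stages are idle 75% of the time; with 12
# microbatches, the bubble shrinks to 20%.
print(bubble_fraction(4, 1))   # 0.75
print(bubble_fraction(4, 12))  # 0.2
```

The usual takeaway is that more microbatches amortize the fill-and-drain phases, which is why pipelining tends to pay off at larger batch sizes.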

Yeah.