Reiner Pope
We have to imagine dividing each of these three curves by B, i.e., multiplying by the reciprocal 1/B.
And so what we end up with there is: the compute curve was linear, we divide by B, and that makes it a constant here.
This is T compute.
The KV fetch was linear, now it becomes a constant as well.
And then the weight fetch was constant, and now we're dividing by B, and so it becomes this hyperbola.
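The per-token accounting described here can be sketched numerically. The constants below are illustrative placeholders, not real hardware numbers; the point is only the shape of each curve after dividing by B:

```python
# Hypothetical per-batch times as a function of batch size B.
# Constants are made up for illustration, not measured from hardware.
def t_compute(B):   return 2.0 * B   # compute: linear in B
def t_kv_fetch(B):  return 1.0 * B   # KV fetch: linear in B (one KV cache per batch element)
def t_weights(B):   return 8.0       # weight fetch: constant (weights read once per batch)

# Per-token times: divide each per-batch time by B.
for B in [1, 2, 4, 8, 16]:
    per_tok_compute = t_compute(B) / B    # stays constant at 2.0
    per_tok_kv      = t_kv_fetch(B) / B   # stays constant at 1.0
    per_tok_weights = t_weights(B) / B    # hyperbola: 8.0 / B
    print(B, per_tok_compute, per_tok_kv, per_tok_weights)
```

Only the weight-fetch term changes shape: linear terms divided by B become flat, while the constant term becomes the 1/B hyperbola.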
And so again, we're going to compute the max of the sum.
So the sum of these two terms shifts the curve up.
The sum of the KV fetch and the weight fetch gives us a higher hyperbola that's like this.
And then we're going to take the max with the compute here.
So we end up with this being the overall shape that we care about.
So again, we see some limiting behavior.
The cost initially starts very high at a batch size of one; it almost goes to infinity.
That's because we've got so many weight fetches which are not amortized over a large batch size.
But then as we increase the batch size, the weight fetches become amortized over so many different batch elements that their cost grows very small.
And eventually, the compute time ends up driving the cost.
So there is a limiting lower bound on cost,
which is the compute time here.
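A minimal numeric sketch of this limiting behavior, assuming illustrative per-token times of 2.0 for compute, 1.0 for KV fetch, and 8.0/B for weight fetch (made-up constants, not real measurements):

```python
def per_token_cost(B):
    # Illustrative constants, not real hardware numbers.
    compute  = 2.0         # per-token compute time: constant in B
    kv_fetch = 1.0         # per-token KV fetch: constant in B
    weights  = 8.0 / B     # per-token weight fetch: amortized over the batch
    # Memory traffic (KV + weights) can overlap with compute, so the
    # per-token time is the max of the two, not their sum.
    return max(compute, kv_fetch + weights)

for B in [1, 2, 8, 64, 1024]:
    print(B, per_token_cost(B))
# At B = 1 the unamortized weight fetches dominate (cost 9.0); as B grows,
# the cost falls toward the compute floor of 2.0.
```

With these constants the crossover happens at B = 8, where the memory term (1.0 + 8.0/8 = 2.0) drops to the compute floor; beyond that, compute drives the cost.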
Yeah, they're unique per batch.
The compute is also unique per batch.