Reiner Pope
is equal to scale-up size times memory bandwidth per GPU. The per-GPU bandwidth term doesn't increase a lot, maybe 1.5 or 2x per generation, but the scale-up size increased by something like a factor of eight. So the reason a bigger scale-up matters is not the memory capacity of the whole scale-up domain, but really the memory bandwidth. Pipelining totally solves the capacity problem, but scale-up size is what helps solve the bandwidth problem.
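The arithmetic here can be sketched in a few lines. All the numbers below are illustrative assumptions, not vendor specs; the point is only that the scale-up-size factor dominates the per-GPU improvement.

```python
# Relation from above: total scale-up bandwidth = scale-up size x per-GPU bandwidth.
# Numbers are hypothetical, for illustration only.

def aggregate_bandwidth_tbps(scale_up_size: int, per_gpu_bw_tbps: float) -> float:
    """Total HBM bandwidth reachable within one scale-up domain, in TB/s."""
    return scale_up_size * per_gpu_bw_tbps

# Per-GPU bandwidth grows slowly (roughly 1.5-2x per generation),
# while scale-up size can jump by ~8x between systems.
old_gen = aggregate_bandwidth_tbps(8, 3.0)    # e.g. an 8-GPU box, hypothetical 3 TB/s per GPU
new_gen = aggregate_bandwidth_tbps(64, 5.0)   # e.g. a 64-GPU domain, hypothetical 5 TB/s per GPU
print(new_gen / old_gen)  # ~13x aggregate, most of it from the 8x in scale-up size
```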
Yeah.
The first thing it does is let you run the model at lower latency.
If I just do a very sparse model and it's on a little H100 box, the latency will be really high.
Yeah.
This is a place where we have to do a bit of guesswork, because the updated scaling laws and the model traffic numbers are not reported.
And so we have to guess there.
But one way to look at it, let me first just make a sort of a general heuristic claim.
Suppose I've got a total cost, which is a sum of cost A and cost B; maybe A is the training cost and B is the inference cost, and I want to minimize this sum.
For many curves that tend to come up, the minimum tends to be where the two costs are equalized.
That's something of a heuristic claim, but there are many examples where it's true, like where one cost is 1/x and the other is x.
They tend to be minimized at the point where they equal each other.
It's also true for e to the x and e to the minus x and all kinds of other things.
So basically, I've got some curve that's going down, some other curve that's going up, and they tend to be minimized at this equal point.
Heuristically, I'll conjecture that it's true for the setup you described as well. Actually showing it would require looking at the scaling laws and fitting these weird exponents.
But things that do follow power laws tend to have this property.
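The equal-cost heuristic can be checked numerically with the 1/x plus x example mentioned above. The constants a and b here are arbitrary choices for illustration.

```python
import math

def total_cost(x: float, a: float = 4.0, b: float = 1.0) -> float:
    """Sum of a decreasing cost a/x (think training) and an increasing cost b*x (think inference)."""
    return a / x + b * x

# Calculus: the minimizer is x* = sqrt(a/b), and there both terms equal sqrt(a*b).
a, b = 4.0, 1.0
x_star = math.sqrt(a / b)                      # 2.0
assert math.isclose(a / x_star, b * x_star)    # the two costs equalize at the minimum

# Confirm numerically on a coarse grid.
grid = [0.1 * i for i in range(1, 100)]
best = min(grid, key=total_cost)
print(best)  # close to x* = 2.0
```

The same equalization happens for exp(x) + exp(-x), whose minimum sits at x = 0, where both terms equal 1.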