Reiner Pope

Reiner Pope – The math behind how LLMs are trained and served

Thank you.

So what we would like is for the scale-up time to be greater than the scale-out time, because the scale-up time is the more important and precious resource.

Dwarkesh Podcast

And so we would like this number to be greater than or equal to 1.

And this really doesn't seem hard.

There's just a factor of 8 that we need to overcome.

So we need the product of these three things to be bigger than 8.

Typically, we have a fairly large number of activated experts.

It could be eight by itself.

And then we can increase the number of layers per stage a lot until we satisfy this.
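The condition above can be sketched numerically. This is a minimal illustration, not the speaker's own calculation: the excerpt names only two of the three factors (activated experts and layers per stage), so `other_factor` is a hypothetical placeholder for the unnamed third one, and the threshold of 8 is taken from the "factor of 8" mentioned above. The function name is also an assumption.

```python
import math


def min_layers_per_stage(activated_experts: int,
                         other_factor: float = 1.0,
                         target: float = 8.0) -> int:
    """Smallest whole number of layers per stage such that
    activated_experts * layers_per_stage * other_factor >= target.

    `other_factor` stands in for the third factor, which this
    excerpt does not name (assumption).
    """
    if activated_experts <= 0 or other_factor <= 0 or target <= 0:
        raise ValueError("all inputs must be positive")
    needed = target / (activated_experts * other_factor)
    return max(1, math.ceil(needed))


if __name__ == "__main__":
    # With 8 activated experts the factor of 8 is already covered,
    # so a single layer per stage is enough under these assumptions.
    print(min_layers_per_stage(activated_experts=8))
    # With fewer activated experts, more layers per stage are needed.
    print(min_layers_per_stage(activated_experts=2))
```

Under these assumptions, eight activated experts already meet the threshold with one layer per stage, which matches the one-rack-per-layer picture described next.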

I see.

So what this ends up looking like is that I can, in fact, have an entire pipeline of racks, where one rack does one layer, then I move on to the next rack and do another layer, and then I move on to the next rack and do another layer.
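The rack pipeline described here can be sketched as a simple assignment of consecutive layers to racks. This is an illustrative sketch only: the function name and the example sizes are assumptions, and real pipeline schedulers handle much more (micro-batching, uneven stages, and so on).

```python
def assign_layers_to_racks(num_layers: int, layers_per_stage: int) -> list[range]:
    """Split layers 0..num_layers-1 into consecutive pipeline stages.

    Stage i (one rack in the picture above) holds the layers in the
    i-th returned range. The last stage may hold fewer layers.
    """
    if num_layers <= 0 or layers_per_stage <= 0:
        raise ValueError("num_layers and layers_per_stage must be positive")
    return [
        range(start, min(start + layers_per_stage, num_layers))
        for start in range(0, num_layers, layers_per_stage)
    ]


if __name__ == "__main__":
    # One layer per rack: activations flow rack 0 -> rack 1 -> rack 2 -> ...
    for rack, layers in enumerate(assign_layers_to_racks(4, 1)):
        print(f"rack {rack}: layers {list(layers)}")
```

Raising `layers_per_stage` puts several consecutive layers on each rack, so fewer racks are needed for the same model.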

I feel that's interesting, that the physical layout and the model architecture match. Like, the cutting matches the model architecture.

Yeah, exactly.

Yeah.

Yeah.

So, I think the galaxy-brain way to think of it is: what are all the different dimensions in which a model is scaled up?

And so it is scaled up by layers, it is scaled up by the d_model dimension, it is scaled up by the d_ff dimension, it is scaled up by the number of experts.

Every single one of those numbers you can choose to cut along.
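To make the idea of cutting along any of these dimensions concrete, here is a minimal sketch that divides each dimension by the number of ways it is cut and reports what a single device would hold. The class and function names and the example sizes are assumptions chosen for illustration; they do not describe any specific model or system discussed in the episode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelDims:
    layers: int
    d_model: int
    d_ff: int
    experts: int


def per_device_dims(dims: ModelDims, cuts: dict[str, int]) -> ModelDims:
    """Return the slice of each dimension held by one device.

    `cuts` maps a dimension name to the number of ways it is split;
    dimensions not listed are left uncut (split 1 way).
    """
    shard = {}
    for name in ("layers", "d_model", "d_ff", "experts"):
        ways = cuts.get(name, 1)
        size = getattr(dims, name)
        if ways <= 0 or size % ways != 0:
            raise ValueError(f"cannot cut {name}={size} evenly {ways} ways")
        shard[name] = size // ways
    return ModelDims(**shard)


if __name__ == "__main__":
    # Illustrative sizes only (assumed, not from the source).
    model = ModelDims(layers=64, d_model=8192, d_ff=32768, experts=16)
    # Cut layers 8 ways (pipeline stages) and experts 4 ways.
    print(per_device_dims(model, {"layers": 8, "experts": 4}))
```

Cutting along `layers` corresponds to the rack pipeline described earlier; cutting along `experts`, `d_model`, or `d_ff` splits the work within a layer instead.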
