1.3B dense, yeah. Across these you end up seeing that, for example, in this case the 64-expert, 370-million-activated-parameter model is as good as a dense 1.3-billion model. So in some sense it's actually not amazing returns, where you need to increase total parameters a hundredfold to get the equivalent of...
Yeah, I mean, actually, even more so, yeah.
It's a huge increase in parameter count for a modest increase in... Yeah, so in this case, actually, what is it, 4x?
64x for 4x.
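(To make the "64x for 4x" exchange concrete, here is a rough back-of-the-envelope calculation using the numbers mentioned above. Treating all parameters as expert parameters is a simplifying assumption for illustration; in practice attention and embedding weights are shared across experts.)

```python
# Rough arithmetic for the MoE-vs-dense trade-off discussed above.
# Assumption: (almost) all parameters live in the experts, so total params
# scale linearly with the expert count.

activated_params = 0.37e9   # parameters touched per token (one expert's worth)
num_experts = 64
dense_equivalent = 1.3e9    # dense model of roughly the same quality

total_params = activated_params * num_experts            # everything resident
total_blowup = total_params / activated_params           # ~64x total vs. activated
quality_gain = dense_equivalent / activated_params       # ~3.5x ("4x") effective size

print(f"total params  ~= {total_params / 1e9:.1f}B")
print(f"~{total_blowup:.0f}x total parameters per activated parameter")
print(f"~{quality_gain:.1f}x dense-equivalent quality per activated parameter")
```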
So is that good or bad, actually?
Even from a memory point of view, keep in mind you are doubling this portion of the memory fetches, which is amortized by batch.
And so you just keep running at a larger batch size.
From the point of view of the analysis we've done here, this is pure win.
Keep doing it.
Keep doing it until you run out of available users, basically.
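(A minimal sketch of the amortization argument: the expert weights have to be streamed from memory once per forward pass regardless of how many tokens are in flight, so the per-token cost of those fetches shrinks as the batch grows. The bf16 weights and the batch sizes below are illustrative assumptions, not numbers from the conversation.)

```python
# Per-token weight traffic for the expert weights, as a function of batch size.
# Assumption: at large batch every expert receives at least one token, so all
# expert weights are fetched once per forward pass.

bytes_per_param = 2          # bf16 weights (assumption)
expert_params = 0.37e9       # one expert's worth of parameters
num_experts = 64

weight_bytes = num_experts * expert_params * bytes_per_param  # fetched once per pass

for batch_tokens in (1, 64, 1024, 16384):
    per_token_mb = weight_bytes / batch_tokens / 1e6
    print(f"batch = {batch_tokens:6d} tokens -> "
          f"{per_token_mb:10.1f} MB of expert-weight traffic per token")
```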
So there's actually this equivalence between...
If I want to go sparser, I need a lot of users; or, if I have a lot of users, I can go to a much sparser model.
So from that point of view, it's a reasonable trade-off.
The other trade-off that shows up here is memory capacity: so far we've only reasoned about being memory-bandwidth-bound, but it also consumes memory capacity.
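(A similarly rough sketch of the capacity point: only one expert's parameters are read per token, but all of them have to be resident in memory. The bf16 weights and the 80 GB of HBM per GPU are illustrative assumptions.)

```python
# Memory capacity needed just to hold the weights, MoE vs. the equivalent dense model.

bytes_per_param = 2          # bf16 weights (assumption)
hbm_per_gpu_gb = 80          # e.g. an 80 GB accelerator (assumption)

moe_total_params = 64 * 0.37e9   # all experts must be resident
dense_equiv_params = 1.3e9

moe_gb = moe_total_params * bytes_per_param / 1e9
dense_gb = dense_equiv_params * bytes_per_param / 1e9

print(f"MoE weights:   {moe_gb:5.1f} GB (~{moe_gb / hbm_per_gpu_gb:.2f} GPUs' worth of HBM)")
print(f"Dense weights: {dense_gb:5.1f} GB")
```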
Yeah, so, I mean, maybe this would be a good point to actually talk about how a mixture-of-experts layer is typically laid out on a rack of GPUs or something like that.
Yeah, yeah, makes sense.
Yeah, where were we?
Sparse mixture of experts.
Yes.
Maybe how we lay that out on a GPU.