Dwarkesh Patel

Dwarkesh Podcast

Reiner Pope – The math behind how LLMs are trained and served

When you're operating within a single scale-up domain, is that a consideration specifically for either the forward or the backward pass?

Or specifically for prefill versus decode?

Or is it preferred to always be within a scale-up domain, whatever kind of workload you have, whether you're doing a pre-training run, RL generation, or inference for users?

Can I try to guess?

Just out of curiosity, to see if I'm actually understanding.

It seems like you're sending batch size into the rack.

In here?

Yes.

But the communication within a rack is sort of batch size times number of GPUs.

And you need to multiply the whole thing by two for the up and the down.

And there's a factor of two.
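
Taking the guess above at face value, the arithmetic can be written as a rough model. This is a minimal sketch under stated assumptions: `rack_traffic_bytes` and `bytes_per_token` are hypothetical names, `batch_size` is used exactly as stated in the conversation, and the factor of two is read as the "up" and "down" directions. It illustrates the scaling being guessed at, not a measured or confirmed formula.

```python
def rack_traffic_bytes(batch_size: int, num_gpus: int, bytes_per_token: int) -> int:
    """Rough bytes moved inside one rack, following the guess above.

    Hypothetical model: one-way traffic is
    batch_size * num_gpus * bytes_per_token, and the total is
    doubled for the "up" and "down" directions.
    """
    if min(batch_size, num_gpus, bytes_per_token) < 0:
        raise ValueError("sizes must be non-negative")
    one_way = batch_size * num_gpus * bytes_per_token
    return 2 * one_way  # the factor of two: up plus down


# Illustrative numbers only (assumed, not from the episode):
# batch size 64, 72 GPUs in the rack, 16 KiB per token
# (e.g. a hidden size of 8192 stored as 2-byte bf16).
print(rack_traffic_bytes(64, 72, 16384))  # 150994944
```

Because the model is linear in every argument, doubling the batch size or the number of GPUs doubles the traffic. That is the scaling the guess implies.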

It's interesting to me that the best parallelism strategy in practice ends up being one which physically resembles the actual architecture.

It's not some galaxy brain thing.

You know, it's like, oh, we have experts.

We're going to put them on different GPUs.

Oh, we have different layers.

We're going to put them on different racks.

I mean, it could have been something wackier with tensor parallelism and whatever.
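
The placement described here (experts on different GPUs, layers on different racks) can be sketched as a simple mapping. This is a minimal sketch under stated assumptions: contiguous blocks of layers go to successive racks (pipeline parallelism), and each layer's experts are spread evenly over the GPUs of its rack (expert parallelism). The function `place`, the `Placement` type and all sizes are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    rack: int  # which rack (pipeline stage) holds this layer
    gpu: int   # which GPU within that rack holds this expert


def place(layer: int, expert: int, num_layers: int, num_racks: int,
          num_experts: int, gpus_per_rack: int) -> Placement:
    """Hypothetical mapping of (layer, expert) to (rack, GPU)."""
    if not (0 <= layer < num_layers and 0 <= expert < num_experts):
        raise ValueError("layer or expert index out of range")
    # Pipeline parallelism: contiguous blocks of layers per rack.
    layers_per_rack = -(-num_layers // num_racks)  # ceiling division
    rack = layer // layers_per_rack
    # Expert parallelism: a layer's experts are spread across the rack's GPUs.
    experts_per_gpu = -(-num_experts // gpus_per_rack)  # ceiling division
    gpu = expert // experts_per_gpu
    return Placement(rack=rack, gpu=gpu)


# Illustrative sizes only: 64 layers, 4 racks, 256 experts, 64 GPUs per rack.
print(place(17, 130, 64, 4, 256, 64))  # Placement(rack=1, gpu=32)
```

Under these assumptions, the traffic that spans racks is only the activations passed between pipeline stages. The expert routing stays inside a single rack's scale-up domain.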