Reiner Pope
Yeah, there's sort of two scenarios.
Why don't we pick a latency that is bigger than 15 milliseconds?
And if I think about what that means, it means I actually have time to read the HBM like twice.
Yep.
By the way, most HBM accesses are reads, not writes.
It's like, almost all reads, because the weight matrices are read-only, and then almost all of the KV cache accesses are reads.
So, let's say I run at 30 milliseconds, I can read all of HBM twice.
But what's the point of that?
Like, I don't want to read the weight matrices twice.
I don't want to read the KVs twice.
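To make that concrete, here is a minimal back-of-the-envelope sketch in Python. The HBM capacity and bandwidth values are illustrative assumptions chosen so that one full read of HBM takes roughly 15 milliseconds, matching the figure in the conversation; they are not measured hardware specs.

```python
# Back-of-the-envelope: how many full HBM sweeps fit in one decode step?
# Capacity and bandwidth are assumptions picked so one full read ~= 15 ms.

hbm_capacity_bytes = 96e9            # assumed HBM capacity (~96 GB)
hbm_bandwidth_bytes_per_s = 6.4e12   # assumed HBM bandwidth (~6.4 TB/s)

def hbm_sweeps_per_step(step_latency_s: float) -> float:
    """How many times the full HBM contents can be read in one step."""
    bytes_readable = hbm_bandwidth_bytes_per_s * step_latency_s
    return bytes_readable / hbm_capacity_bytes

for latency_ms in (15, 30):
    sweeps = hbm_sweeps_per_step(latency_ms / 1000)
    print(f"{latency_ms} ms step -> ~{sweeps:.1f} full HBM reads")
```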
I mean, sparsity shows up in model size, but beyond that, it only depends on sparsity, not on scale.
Yeah.
We can do a bit of analysis on this. You can think of it in terms of number of users, but maybe a more productive way to think of it is in terms of number of tokens per second.
So what does this batch size mean in terms of tokens per second of the system?
So...
Tokens per second is going to be equal to the batch size divided by the step time: we run a batch-many tokens, and then we do that every step, where the step time is this 15 to 20 millisecond number.
So this ends up being the batch size times about 60, like 64 times B. And so this ends up being around 2,000 times 64.
So like 128K tokens per second.
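As a quick sanity check on that arithmetic, here is a minimal sketch; the step time and batch size are the round numbers from the discussion, not measured values.

```python
# Throughput arithmetic from above: tokens/s = batch size / step time.
# Step time and batch size are the speaker's round numbers, not measurements.

step_time_s = 1 / 64   # ~15-20 ms per decode step, i.e. ~64 steps per second
batch_size = 2000      # tokens decoded per step (one per concurrent sequence)

tokens_per_second = batch_size / step_time_s        # = 2000 * 64
print(f"~{tokens_per_second:,.0f} tokens per second")  # ~128,000
```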
So this is sort of in more digestible units.
Like, it's hard to reason about concurrent users, but what is the global traffic for a system?