Dwarkesh Patel

Dwarkesh Podcast

Reiner Pope – The math behind how LLMs are trained and served

It gets me confused about this.

6359.079

Length pass is the... It seems like this should be higher when you're doing pre-fill.

6360.341

Pre-fill has a bigger length pass, yeah.

6365.831

Right.

6367.694

Okay, yeah, let me think about this then.

6389.316

Okay, so let's do one line for... Basically, we'll have four different lines.

6391.358

Let's do the... Let's do pre-fill first, and so... Actually, let's do decode first.

6397.285

That makes sense.

6415.73

Okay, getting back to it.

6416.771

So t-compute, if you have basically just this divided by length pass, so just this amount.

6417.692

So this actually does not vary based on t, so it'll just be some flat value like this.

6423.858

And this is t-compute.
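
One way to read the flat t-compute line, as a rough sketch only: the whiteboard expression is not in the transcript, so this assumes the compute time for one pass is about 2 FLOPs per parameter per token divided by peak throughput, and ignores attention FLOPs. The names `n_params`, `peak_flops`, and `len_pass`, and the numbers used, are illustrative assumptions rather than values from the episode.

```python
def t_compute_per_token(n_params: float, peak_flops: float, len_pass: int) -> float:
    """Per-token compute time under an assumed 2 * params FLOPs-per-token model."""
    # Assumed cost of one pass: every token in the pass touches every parameter once.
    flops_per_pass = 2.0 * n_params * len_pass
    # Time for the whole pass if the hardware runs at its peak rate.
    t_pass = flops_per_pass / peak_flops
    # Dividing by the number of tokens in the pass cancels len_pass.
    return t_pass / len_pass


# Hypothetical numbers, chosen only to make the cancellation visible.
N_PARAMS = 70e9      # assumed parameter count
PEAK_FLOPS = 1e15    # assumed peak FLOP/s

per_token = [t_compute_per_token(N_PARAMS, PEAK_FLOPS, lp) for lp in (1, 8, 64, 512)]
print(per_token)
```

Under these assumptions the per-token value is the same for every `len_pass`, and nothing in the expression depends on t, which is consistent with the line being drawn flat.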

6431.045

And then this is like... This is... That's decode.

6438.738

Decode, right.

6443.124

Now, t-mem, if you have this whole thing divided by length pass, well, it doesn't really matter what's up there.

6444.946

It'll just be something that looks like this.

6450.633

Right.

6454.659

Yeah.

6455.28

Let's say this is t-mem.

6455.4

This is decode again.

6460.667