Dwarkesh Patel
It gets me confused about this.
Length pass is the... It seems like this should be higher when you're doing pre-fill.
Pre-fill has a bigger length pass, yeah.
Right.
Okay, yeah, let me think about this then.
Okay, so let's do one line for... Basically, we'll have four different lines.
Let's do the... Let's do pre-fill first, and so... Actually, let's do decode first.
That makes sense.
Okay, getting back to it.
So for t-compute, you basically just have this divided by length pass, so just this amount.
So this actually does not vary based on t, so it'll just be some flat value like this.
And this is t-compute.
And then this is like... This is... That's decode.
Decode, right.
Now, t-mem, if you have this whole thing divided by length pass, well, it doesn't really matter what's up there.
It'll just be something that looks like this.
Right.
Yeah.
Let's say this is t-mem.
This is decode again.
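The two quantities being sketched on the board can be written out as a rough per-token cost model. This is only a minimal sketch under stated assumptions: the function name and all four cost parameters (`flops_per_token`, `bytes_per_pass`, `peak_flops`, `mem_bandwidth`) are hypothetical, the compute work of a pass is assumed to scale with length pass, and the memory traffic of a pass is treated as a single lump because the excerpt says it "doesn't really matter what's up there". What t denotes on the axis is not stated in this excerpt, so the sketch leaves it out.

```python
import math


def per_token_times(flops_per_token, bytes_per_pass, length_pass,
                    peak_flops, mem_bandwidth):
    """Return (t_compute, t_mem) per token for one forward pass.

    All parameters are illustrative assumptions, not values from the talk.
    """
    # Assumption: total compute for the pass grows linearly with length pass,
    # so dividing by length pass leaves a value that is flat per token.
    pass_compute_time = flops_per_token * length_pass / peak_flops
    t_compute = pass_compute_time / length_pass

    # Assumption: the bytes moved in a pass are one lump sum, whatever they
    # consist of, and that cost is shared across the tokens in the pass.
    pass_mem_time = bytes_per_pass / mem_bandwidth
    t_mem = pass_mem_time / length_pass

    return t_compute, t_mem


# Made-up numbers, chosen only to compare a small and a large length pass.
decode = per_token_times(2e9, 1e9, length_pass=1,
                         peak_flops=1e14, mem_bandwidth=1e12)
prefill = per_token_times(2e9, 1e9, length_pass=1024,
                          peak_flops=1e14, mem_bandwidth=1e12)

print("decode   t_compute, t_mem:", decode)
print("pre-fill t_compute, t_mem:", prefill)
```

Under these assumptions, t-compute per token comes out the same for both cases, while the larger length pass in pre-fill spreads the same memory time over more tokens.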