Reiner Pope

👤 Speaker
1157 total appearances

Podcast Appearances

Dwarkesh Podcast
Reiner Pope – The math behind how LLMs are trained and served

That gives you exactly this number.

Or you could have, like, fewer kv-heads but more layers.
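
One way to read this trade-off is through the usual per-token KV-cache size for dense attention, which scales with the product of layer count and KV-head count. The sketch below is only an illustration: the function name and every configuration number are hypothetical and do not come from the episode.

```python
def kv_cache_bytes_per_token(num_layers, num_kv_heads, head_dim, bytes_per_value=2):
    """Bytes of KV cache stored per token under dense attention.

    The factor of 2 covers one key vector and one value vector
    per KV head per layer. All arguments are illustrative assumptions.
    """
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value


# Hypothetical baseline: 32 layers with 8 KV heads each.
baseline = kv_cache_bytes_per_token(num_layers=32, num_kv_heads=8, head_dim=128)

# Hypothetical alternative: half the KV heads, twice the layers.
deeper = kv_cache_bytes_per_token(num_layers=64, num_kv_heads=4, head_dim=128)

print(baseline, deeper)
```

Under these assumed numbers both configurations store the same number of bytes per token, which is why halving the KV heads can pay for doubling the layers.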

So this is one way to get there via dense attention.

There's also a way to get there via sparse attention where you increase all of these numbers, but then you have like a line of a sparsity term.

So yeah, I mean, I think this number is plausible if maybe a little bit small.

I mean, you are incentivized to price close to your costs because otherwise someone could undercut you.

Yeah.

I don't remember.

What I've seen in the past is like three or five times more expensive.

That makes more sense.

We can think of decode as being a pass with one token and pre-fill as being a pass with many.

I think maybe sort of let's draw actually how pre-fill shows up here.

If I may clarify, so we do a bit of decode like this.

We may actually come back and do more pre-fill.

Like if you think of this as a chat session: the user says something, the AI generates a response, and then the user says something else, and we pre-fill that.

So like maybe this is the more common, like this is the general case rather than this.

Read a file or just like the AI is responding to a user input or a tool call or anything that's not AI generated.

Yeah, exactly.

Yeah, there's actually no adjustment at all to the memory time.

So, yeah, there is the time for one pass, but actually the amount of tokens is that much larger.
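
One reading of these last remarks is a simple roofline estimate: a pre-fill pass reads the weights once, just as a decode pass does, so the memory time is unchanged, while the compute and the number of tokens processed both grow with the pass size. The sketch below is a rough illustration under stated assumptions. It ignores KV-cache reads and attention FLOPs, and the model size and hardware figures are hypothetical rather than taken from the episode.

```python
def pass_times(num_tokens, param_count, bytes_per_param, peak_flops, mem_bandwidth):
    """Rough roofline estimate for one forward pass (illustrative only).

    Returns (pass_time, memory_time, compute_time) in seconds.
    """
    # Memory time: the weights are read once per pass, whatever the token count.
    memory_time = param_count * bytes_per_param / mem_bandwidth
    # Compute time: about 2 FLOPs per parameter per token for the matmuls.
    compute_time = 2 * param_count * num_tokens / peak_flops
    return max(memory_time, compute_time), memory_time, compute_time


# Hypothetical model and hardware numbers.
params, bytes_per_param = 70e9, 2
peak_flops, mem_bandwidth = 1e15, 3e12

decode_time, decode_mem, _ = pass_times(1, params, bytes_per_param, peak_flops, mem_bandwidth)
prefill_time, prefill_mem, _ = pass_times(1024, params, bytes_per_param, peak_flops, mem_bandwidth)

decode_per_token = decode_time / 1
prefill_per_token = prefill_time / 1024

print(f"memory time, decode vs pre-fill: {decode_mem:.4f}s vs {prefill_mem:.4f}s")
print(f"time per token, decode vs pre-fill: {decode_per_token:.6f}s vs {prefill_per_token:.6f}s")
```

With these assumed figures the memory time is identical for both passes, and the pre-fill pass takes longer overall but spreads that time over far more tokens, so its per-token time is much lower.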