Dwarkesh Patel

Speaker
15267 total appearances

Podcast Appearances

Dwarkesh Podcast
Reiner Pope – The math behind how LLMs are trained and served

Whatever.

Okay, memory.

So you're not storing the KV cache for the tokens that are the pre-filled tokens.

In fact, this is like you read a file,

Yeah, okay.

Okay, so suppose we're here.

So, you will need to load... Basically, you will have calculated all of this previously.

So, just the KV of everything that came before.

But what is the memory cost of this?

Well...

memory bandwidth cost of this.

If you're doing flash attention, it would... Yeah, it's basically temporary.

It doesn't even go to main memory.

Just ignore it.

Okay, so then it would just be everything that came before.
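
The "everything that came before" cost can be made concrete with a rough sketch. This is not from the episode: the function name, the parameter names, and the model shape below are all hypothetical, and the formula assumes a standard attention layout with 16-bit cache entries, where decoding one new token reads one key vector and one value vector per earlier token, per layer, per KV head.

```python
def kv_cache_bytes_per_decode_step(context_len, n_layers, n_kv_heads, head_dim,
                                   bytes_per_elem=2):
    """Bytes of cached keys and values read from memory to decode one new token."""
    # Factor of 2: one key vector plus one value vector
    # per earlier token, per layer, per KV head.
    return 2 * context_len * n_layers * n_kv_heads * head_dim * bytes_per_elem


# Hypothetical model shape, chosen only for illustration.
bytes_read = kv_cache_bytes_per_decode_step(
    context_len=8192, n_layers=32, n_kv_heads=8, head_dim=128
)
print(f"{bytes_read / 2**30:.2f} GiB read per decoded token")  # 1.00 GiB
```

The cost grows linearly with the number of earlier tokens, which is why the context length dominates the memory bandwidth term at long contexts.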

So, is it not just that then?

Okay, great.

Oh, so it's a very trivial change to accommodate.

So, this term is making it 5x more expensive.

Now, why would that be?