Dwarkesh Patel

Speaker
15267 total appearances

Podcast Appearances

Dwarkesh Podcast
Reiner Pope – The math behind how LLMs are trained and served

When was GPT-4 released again?

It was 2022 or 2023?

Three.

And it was rumored to be over 1 trillion parameters.

And it seems like only now, within the last six months, have models been getting released that have significantly more parameters than a model released three years ago.

Yeah.

When supposedly there should have been this scaling in the meantime.

Is the reason that we were just waiting for racks with enough memory to hold a five-trillion-parameter model along with its KV cache for enough users, for a lot of sequences? Or RL, if you're doing RL: it's kind of a similar consideration of actually holding the KV cache for the whole batch of problems you're trying to solve. So if you look at Hopper, you had eight Hoppers, and I think the

That's 640 gigabytes as of 2022.
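
The 640 GB figure is consistent with an eight-GPU Hopper node at 80 GB of HBM per GPU. The per-GPU capacity and the 2 bytes per parameter used below are assumptions for illustration; the conversation only states the total. A rough sketch of the arithmetic, in decimal gigabytes:

```python
# Assumptions (not stated in the conversation): 80 GB of HBM per Hopper GPU,
# 8 GPUs per node, and 2 bytes per parameter (a 16-bit weight format).
GPUS_PER_NODE = 8
HBM_PER_GPU_GB = 80


def node_hbm_gb(gpus: int = GPUS_PER_NODE, hbm_gb: int = HBM_PER_GPU_GB) -> int:
    """Total HBM across one eight-GPU node, in decimal gigabytes."""
    return gpus * hbm_gb


def weights_gb(params: float, bytes_per_param: float) -> float:
    """Memory needed for the weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


print(node_hbm_gb())        # 640
print(weights_gb(1e12, 2))  # 2000.0
```

Under these assumptions, the weights of a 1-trillion-parameter model alone would be about three times the memory of a single such node, before counting any KV cache.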

Yeah.

With Blackwell finally, which was deployed, what, 2020?

Very recently.

I mean, last year.

Last year?

Yeah.

You finally have a scale-up domain with on the order of 10 to 20 terabytes, which is enough for a 5T model plus KV cache.
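
To see why 10 to 20 terabytes is roughly the threshold for a 5T-parameter model, one can subtract the weight footprint from the total memory and look at what is left for the KV cache. This is a back-of-the-envelope sketch: the bytes-per-parameter values (1 for an 8-bit format, 2 for a 16-bit format) are assumptions, and `kv_budget_tb` is a hypothetical helper name.

```python
def kv_budget_tb(capacity_tb: float, params: float, bytes_per_param: float) -> float:
    """Memory left for KV cache after the weights are loaded, in decimal terabytes.

    A negative result would mean the weights alone do not fit.
    """
    weights_tb = params * bytes_per_param / 1e12
    return capacity_tb - weights_tb


PARAMS = 5e12  # the 5T-parameter model discussed above

for capacity_tb in (10, 20):   # the "10, 20 terabytes" range
    for bytes_per_param in (1, 2):  # assumed 8-bit and 16-bit weights
        left = kv_budget_tb(capacity_tb, PARAMS, bytes_per_param)
        print(f"{capacity_tb} TB, {bytes_per_param} B/param -> {left} TB for KV cache")
```

Under these assumptions, 16-bit weights fill a 10 TB domain completely and leave nothing for KV cache, while 8-bit weights leave about half of it free; at 20 TB either format leaves 10 TB or more.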

And that also explains why Gemini seemed to be ahead.

Gemini 2.5 was a successful... or it just seems like Gemini has had that successful pre-train for longer than some of the other labs.

Yep.