Dwarkesh Patel
When was GPT-4 released again?
It was 2022 or 2023?
Three.
And it was rumored to be over 1 trillion parameters.
And it seems like only now, within the last six months, have models been released that have significantly more parameters than a model released three years ago.
Yeah.
When supposedly there should have been this scaling in the meantime.
Is the reason that we were just waiting for racks with enough memory to hold a five-trillion-parameter model along with its KV cache for enough users, for a lot of sequences? Or, if you're doing RL, it's kind of a similar consideration of actually holding the KV cache for the whole batch of problems you're trying to solve.
So if you look at Hopper, you had eight Hoppers, and I think the...
That's 640 gigabytes as of 2022.
Yeah.
With Blackwell finally, which was deployed, what, 2020?
Very recently.
I mean, last year.
Last year?
Yeah.
You finally have a scale-up with on the order of 10 to 20 terabytes, which is enough for a 5T model plus its KV cache.
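To make the memory arithmetic in this exchange concrete, here is a rough back-of-envelope sketch. It uses only the figures mentioned in the conversation (eight Hoppers totaling 640 gigabytes, a scale-up on the order of 10 to 20 terabytes, a 5T-parameter model). The bytes-per-parameter values and the helper names `weight_bytes` and `kv_headroom` are assumptions introduced for illustration, not anything the speakers specified.

```python
# Back-of-envelope memory arithmetic for the figures mentioned in the conversation.
# Bytes-per-parameter values are illustrative assumptions, not from the speakers.

GB = 10**9
TB = 10**12


def weight_bytes(num_params: int, bytes_per_param: int) -> int:
    """Memory needed just to hold the model weights."""
    return num_params * bytes_per_param


def kv_headroom(capacity_bytes: int, weights: int) -> int:
    """Memory left for KV cache after loading weights (negative if weights do not fit)."""
    return capacity_bytes - weights


hopper_node = 8 * 80 * GB              # eight Hoppers at 80 GB each = 640 GB
scale_up_low = 10 * TB                 # low end of the "10 to 20 terabytes" range
scale_up_high = 20 * TB                # high end of that range
params_5t = 5 * 10**12                 # a 5T-parameter model

systems = [
    ("8x Hopper (640 GB)", hopper_node),
    ("scale-up, 10 TB", scale_up_low),
    ("scale-up, 20 TB", scale_up_high),
]

for label, bytes_per_param in [("assumed 8-bit weights", 1), ("assumed 16-bit weights", 2)]:
    w = weight_bytes(params_5t, bytes_per_param)
    print(f"{label}: weights take {w / TB:.1f} TB")
    for name, capacity in systems:
        headroom = kv_headroom(capacity, w)
        status = "fits" if headroom >= 0 else "does not fit"
        print(f"  {name}: {status}, KV headroom {headroom / TB:.2f} TB")
```

Under these assumptions the weights alone need 5 TB at one byte per parameter, which is far more than 640 GB, while a 10 TB scale-up leaves 5 TB for KV cache. At two bytes per parameter the weights fill a 10 TB system exactly, so only the upper end of the range leaves room for the cache.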
And that also explains why Gemini seemed to be ahead.
Gemini 2.5 was a successful, or, it just seems like Gemini has had that successful pre-train for longer than some of the other labs.
Yep.