Kwasi Ankomah
👤 PersonAppearances Over Time
Podcast Appearances
How much is it costing?
And I think this is what we're talking about.
Everyone is really, and this is what we all keep talking about, inference and efficiency and power.
How much is this costing?
And we, our architectures, we come from the fundamental thing of saying, we know at scale how much it's going to cost.
So we've already kind of thought about that.
And the folks who do the hardware really kind of thought about that.
For sure.
So I think like, you know, I'll talk about probably the key two areas, right?
So one of them is around the chip itself, right?
So we've done something, what we call three-tier memory.
So that three-tier memory is basically SRAM and that's like super fast, you know, on chip integrated with the compute.
We've got HVM and then we've got DDR.
So the HPM is the intelligent caching layer, and then we've got the DDR, which is like this massive capacity, like 1.5 terabytes that we've got as well.
So why does this matter?
It's because traditional GPU move data between memory and compute.
So it's like, you're constantly doing this back and forth, back and forth.
So it's like having your tools in the garage when you're working in the kitchen.
You have to constantly go, oh, yeah, you know, I need a screwdriver.
Now, like that, that's inefficient.