Kwasi Ankomah
connect it to any cloud that's OpenAI-compatible.
I think all the frameworks work with that.
The shock you might get, and I think people are waking up to this, is the cost that hits you afterwards, when you try to scale it.
That's the real thing people are struggling with here: how do we keep this within budget?
And again, the thing that I think people don't talk about enough is power.
That's a real factor here.
For this to actually become scalable, power is the thing you need to make sure you're controlling.
Yeah, I think there's both.
The chip is massive, for sure.
But for us, one of the advantages of being full stack is that we control everything.
We have some super smart people at every level, and the efficiency of the compiler, the runtime, and the hardware itself is massive.
But of course, the architecture of the chip is definitely a big one.
If you can run these larger models at a smaller footprint, by default you will just reduce it. The key question, to give you an example, is: how many chips do you need for model X? How many chips do I need for DeepSeek? The number of chips basically determines how much power you need to draw to run inference for this group of people. So the more you can reduce that, the better.
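To make that relationship concrete, here is a minimal sketch of the arithmetic being described. The per-chip wattage and chips-per-model figures are illustrative assumptions, not numbers from the conversation:

```python
# Illustrative sketch: chip count is the lever on the inference power budget.
# All figures below are assumed placeholders, not vendor specifications.

WATTS_PER_CHIP = 700  # assumed draw per chip under inference load

CHIPS_PER_MODEL = {
    "model_x": 16,    # hypothetical model that fits in one 16-chip unit
    "deepseek": 32,   # hypothetical model that needs two units
}

def inference_power_kw(model: str, replicas: int) -> float:
    """Estimate total power draw for serving `replicas` copies of a model."""
    chips = CHIPS_PER_MODEL[model] * replicas
    return chips * WATTS_PER_CHIP / 1000.0

# Halving the chips a model needs halves the power for the same traffic.
print(inference_power_kw("deepseek", replicas=4))  # 89.6 kW
```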
One unit, for instance our SN40L, has 16 chips.
So our key metric is: what can we fit on those 16 chips?
Because that footprint is what we call a rack, or a Samba rack.
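As a rough illustration of that fit-on-16-chips metric, here is a hedged sketch. The 16-chip rack size comes from the conversation; the memory-per-chip figure and the weights-only accounting are placeholder assumptions:

```python
# Sketch of the "what fits in one rack" check. Only the 16-chip count is
# from the conversation; memory per chip is an assumed placeholder.

CHIPS_PER_RACK = 16
MEM_PER_CHIP_GB = 64  # assumption, not a published spec

def fits_in_rack(params_billions: float, bytes_per_param: int = 2) -> bool:
    """Rough check: do a model's weights alone (fp16/bf16 by default)
    fit within one rack's aggregate memory?"""
    weights_gb = params_billions * bytes_per_param
    return weights_gb <= CHIPS_PER_RACK * MEM_PER_CHIP_GB

print(fits_in_rack(70))   # True: 140 GB of weights vs. 1024 GB per rack
print(fits_in_rack(671))  # False: 1342 GB exceeds the rack under these assumptions
```

In practice, KV cache, activations, and replication overhead would shrink the usable budget, so treat this purely as an illustration of the metric.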