And there's methodology behind that, and I can spend time talking about it.
Yeah.
But fundamentally, the goal was enabling our engineering teams to write high-performance kernels and low-level operations that could scale as AI models execute across different types of silicon.
So we started building that, and we also started building it in a way that was heterogeneous.
So what I mean by that is, we believed from our infrastructure work at Google that the future was going to be lots of different types of hardware interacting with each other.
Part of this was from, like I talked about, the mobile experience.
You have four different types of compute architectures on a mobile phone: CPU, GPU, DSP, and NPU.
Ideally, you want them all working together.
You want them all humming together.
Well, again, if you don't have a programming model that can actually program all four different types of accelerators, that becomes really hard.
And so we said, well, let's create this new programming model that makes it easier to actually program, for example, a CPU and a GPU together.
If we could do that, that's like stage one to getting more utilization, more efficiency out of the hardware.
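To make that concrete, here is a minimal, purely illustrative sketch of what such a programming model buys you: one logical operation, a device-specific implementation per accelerator, and a single call site that dispatches to whichever silicon is present. Every name here (kernel, dispatch, the device tags) is hypothetical, not any real product's API; real systems would generate native device code rather than Python.

```python
# Toy sketch of a heterogeneous programming model: one logical kernel,
# multiple device-specific implementations, one call site.
from typing import Callable

# Registry mapping (op name, device type) -> implementation.
# On a phone the device tags might be cpu / gpu / dsp / npu,
# each normally requiring its own vendor toolchain.
KERNELS: dict[tuple[str, str], Callable] = {}

def kernel(name: str, device: str):
    """Register an implementation of `name` for one device type."""
    def register(fn):
        KERNELS[(name, device)] = fn
        return fn
    return register

@kernel("vector_add", device="cpu")
def vector_add_cpu(a, b):
    # Plain scalar loop; a real CPU kernel would be SIMD-vectorized.
    return [x + y for x, y in zip(a, b)]

@kernel("vector_add", device="gpu")
def vector_add_gpu(a, b):
    # Stand-in for a GPU launch; in practice this would be compiled
    # device code, not Python.
    return [x + y for x, y in zip(a, b)]

def dispatch(name: str, device: str, *args):
    """The single programming model: callers name the op, and the
    runtime picks the implementation for the silicon that is present."""
    return KERNELS[(name, device)](*args)

if __name__ == "__main__":
    a, b = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
    # Same call site, different hardware underneath.
    print(dispatch("vector_add", "cpu", a, b))  # [5.0, 7.0, 9.0]
    print(dispatch("vector_add", "gpu", a, b))  # [5.0, 7.0, 9.0]
```

The point of the sketch is the shape, not the implementations: once the call site is device-agnostic, the runtime is free to split work across a CPU and a GPU together, which is the utilization win being described.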
And the big challenge in doing that was: cool, but could we match the performance of someone like NVIDIA on their own silicon?
Is that possible?
Because if we couldn't achieve that, from a commercial standpoint, from a business standpoint, no one in the world is going to be like, wow, I love your idea, but wait, I lose on dollars per token per watt.
100%.
It's just not possible.
So we started by doing that, and that's low-level work.
It's typically for more advanced programmers.