Andrew Feldman
๐ค SpeakerAppearances Over Time
Podcast Appearances
All right, then you gotta think, I mean, it is a constant.
All right, then you gotta think, I mean, it is a constant.
So usually your most senior architects, your CTO and your technical leads are thinking about that as the basis of a design.
So usually your most senior architects, your CTO and your technical leads are thinking about that as the basis of a design.
It doesn't matter how fast a car can go if it can't turn.
It doesn't matter how fast a car can go if it can't turn.
It's not a good car, except maybe for drag racing.
It's not a good car, except maybe for drag racing.
And so the designers are constantly thinking about where should we use power in the design?
And so the designers are constantly thinking about where should we use power in the design?
Can we make the memory faster?
Can we make the memory faster?
Can we add memory?
Can we add memory?
What is the cost of adding memory versus making it faster?
What is the cost of adding memory versus making it faster?
The GPU, for example, has a lot of capacity of memory, but it's really slow.
The GPU, for example, has a lot of capacity of memory, but it's really slow.
All right, and that's a huge bottleneck in inference.
All right, and that's a huge bottleneck in inference.