Noam Shazeer
And there's a lot of inference compute you want.
So you want extremely efficient hardware for inference for models you care about.
I mean, I would add on to that.
I'm not sure I agree completely, but it's a pretty interesting thought experiment to go in that direction.
And even if you get partway there, it's definitely going to be a lot of compute.
And this is why it's super important to have as cheap a hardware platform as possible for using these models and applying them to the problems Noam described, so that you can then make them accessible to everyone in some form and keep the cost of access to these capabilities as low as you possibly can.
And I think that's achievable by focusing on hardware and model co-design kinds of things.
We should be able to make these things much, much more efficient than they are today.
I'm not going to comment on our future capital spending, because our CEO and CFO would probably prefer I don't.
But I will say, you know, you can look at our past capital expenditures over the last few years and see that we're definitely investing in this area because we think it's important.
And that we're, you know, growing, and continuing to build new, interesting, innovative hardware that we think really gives us an edge in deploying these systems to more and more people, both in training them and in making them usable by people for inference.
Yeah, I've been thinking about this more and more.
And I've been a big fan of models that are sparse because I think you want different parts of the model to be good at different things.
And we have, you know, our Gemini 1.5 Pro model, and other models are mixture-of-experts-style models, where you now have parts of the model that are activated for some token and parts that are not activated at all, because the model has decided this is a math-oriented thing, and this part's good at math and this part's good at, like, understanding cat images.
So that gives you this ability to have a much more capable model that's still quite efficient at inference time because it has very large capacity, but you activate a small part of it.
But I think the current problem, well, one limitation of what we're doing today is it's still a very regular structure where each of the experts is kind of the same size.
You know, the paths kind of merge back together very fast.
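The routing idea described above, where each token activates only a few equal-sized experts and the rest of the layer's parameters sit idle, can be sketched roughly as follows. This is a minimal illustration, not Gemini's actual implementation; all the names and sizes here are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, n_experts, top_k = 16, 4, 2

# One feed-forward "expert" per index, all the same size, mirroring the
# very regular structure mentioned above (illustrative parameters only).
W_in = rng.normal(size=(n_experts, d_model, 4 * d_model)) * 0.02
W_out = rng.normal(size=(n_experts, 4 * d_model, d_model)) * 0.02
W_router = rng.normal(size=(d_model, n_experts)) * 0.02

def moe_layer(x):
    """Route a single token vector x to its top-k experts."""
    logits = x @ W_router                 # one router score per expert
    top = np.argsort(logits)[-top_k:]     # keep the k highest-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()              # softmax over the chosen experts only
    # Only the selected experts run; the others contribute nothing for
    # this token -- that is the inference-time saving.
    out = np.zeros(d_model)
    for w, e in zip(weights, top):
        h = np.maximum(x @ W_in[e], 0.0)  # ReLU feed-forward expert
        out += w * (h @ W_out[e])
    return out

token = rng.normal(size=d_model)
y = moe_layer(token)
```

The total parameter count grows with `n_experts`, but per-token compute only grows with `top_k`, which is why capacity can be large while inference stays cheap.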