Noam Shazeer
It's very helpful to have a 1/1,000th-scale problem, vet 100,000 ideas on that, and then scale up the ones that seem promising.
I mean, I think one thing people should be aware of is that the improvements from generation to generation of these models are often partially driven by hardware and larger scale, but equally, and perhaps even more so, driven by major algorithmic improvements and major changes in the model architecture and the training data mix that really make the model better per flop applied to it.
So I think that's a good realization.
And then I think if we have automated exploration of ideas, we'll be able to vet a lot more ideas and bring them into the actual production training for the next generations of these models.
And that's going to be really helpful, because that's sort of what we're currently doing with a lot of machine learning research.
Brilliant machine learning researchers are looking at lots of ideas, winnowing the ones that seem to work well at small scale, seeing if they work well at medium scale, bringing them into larger-scale experiments, and then settling on adding a whole bunch of new and interesting things to the final model recipe.
And then I think if we can do that a hundred times faster, with those machine learning researchers gently steering a more automated search process rather than hand-babysitting lots of experiments themselves, that's going to be really, really good.
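The winnowing loop described above, vetting a huge pool of ideas cheaply at small scale and promoting only the promising ones to costlier scales, can be sketched as a simple promotion pipeline. Everything here is hypothetical (the `evaluate` stand-in, the particular scales, the keep fraction); it only illustrates the shape of the search, assuming small-scale runs are noisy proxies for full-scale quality:

```python
import random

def evaluate(idea, scale):
    # Stand-in for training a model variant at some fraction of full scale
    # and returning a quality score; smaller scales give noisier estimates.
    # Entirely hypothetical -- for illustration only.
    noise = random.gauss(0, 0.3 * (1 - scale))
    return idea["true_quality"] + noise

def winnow(ideas, scales=(0.001, 0.01, 0.1), keep_fraction=0.1):
    # Vet every surviving idea at each scale, then promote only the
    # top fraction to the next (larger, more expensive) scale.
    survivors = list(ideas)
    for scale in scales:
        scored = sorted(survivors,
                        key=lambda idea: evaluate(idea, scale),
                        reverse=True)
        survivors = scored[:max(1, int(len(scored) * keep_fraction))]
    return survivors

# 1,000 candidate ideas in, one finalist out (1000 -> 100 -> 10 -> 1).
ideas = [{"name": f"idea-{n}", "true_quality": random.random()}
         for n in range(1000)]
finalists = winnow(ideas)
```

An automated search would replace `evaluate` with real small-scale training runs, and the researcher's "gentle steering" amounts to choosing which ideas enter the pool and which metrics define quality.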
For that, more hardware, and better hardware, is a good solution.
Yeah, I mean, I've been pretty excited lately about how could we dramatically speed up the chip design process.
Because, as we were talking about earlier, the current way you design a chip takes roughly 18 months to go from "we should build a chip" to something you hand over to TSMC; TSMC then takes four months to fab it, and then you get it back and put it in your data centers.
So that's a pretty lengthy cycle.
And the fab time in there is a pretty small portion of it today.
But if you could make that the dominant portion, so that instead of taking 12 to 18 months and 150 people to design the chip, you could shrink it to a few people with a much more automated search process, exploring the whole design space of chips and getting feedback from all aspects of the chip design process on the kinds of choices the system is exploring at a high level, then I think you could get much more exploration and more rapid design of something you actually want to give to a fab.
And that would be great, because you can also shrink the deployment time by designing the hardware in the right way, so that you just get the chips back and plug them into some system.
And that will then, I think, enable a lot more specialization.
It will enable a shorter timeframe for the hardware design, so you don't have to look out quite as far into what kinds of ML algorithms will be interesting.
Instead, you're asking what they should look like six to nine months from now, rather than two to two and a half years.