Noam Shazeer
And that's not to say we should exactly mimic that, because silicon and wetware have very different characteristics and strengths.
But I do think one thing we could draw more inspiration from is this notion of having different specialized portions, areas of a model, like areas of a brain, that are good at different things.
So we have a little bit of that in mixture-of-experts models, but it's still very structured. And I feel like you want a more organic growth of expertise: when you want more expertise of some kind, you add some more capacity to the model there and let it learn a bit more of that kind of thing.
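The structured version being contrasted with here can be sketched as a small top-2 gated mixture-of-experts layer. This is a minimal illustrative sketch in NumPy, not any particular production implementation; all names, shapes, and the choice of top-2 routing are assumptions for the example.

```python
import numpy as np

def top2_gate(x, w_gate):
    """Score every expert for this token and keep the two highest-scoring ones."""
    logits = x @ w_gate                        # one score per expert
    top = np.argsort(logits)[-2:][::-1]        # indices of the top-2 experts
    probs = np.exp(logits[top] - logits[top].max())
    probs /= probs.sum()                       # renormalized weights over the pair
    return top, probs

def moe_forward(x, w_gate, experts):
    """Combine the selected experts' outputs, weighted by their gate probability."""
    top, probs = top2_gate(x, w_gate)
    return sum(p * experts[e](x) for e, p in zip(top, probs))

# Toy setup: 4 experts, each a random linear map on an 8-dim hidden vector.
rng = np.random.default_rng(0)
d, n_experts = 8, 4
w_gate = rng.normal(size=(d, n_experts))
experts = [(lambda W: (lambda v: v @ W))(rng.normal(size=(d, d)))
           for _ in range(n_experts)]
y = moe_forward(rng.normal(size=d), w_gate, experts)
print(y.shape)  # (8,)
```

The "organic growth" idea would amount to appending new entries to `experts` (and new columns to `w_gate`) where routing demand is high, rather than fixing the expert count up front.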
And this notion of adapting the connectivity of the model to the connectivity of the hardware is also a good one.
So I think you want incredibly dense connections between artificial neurons on the same chip, in the same HBM, because that doesn't cost you much.
But then you want a smaller number of connections to nearby neurons.
So a chip away, you should have some number of connections. And many, many chips away, you should have a smaller number of connections, where you send, over a very limited, bottlenecked channel, the most important things this part of the model is learning, for other parts of the model to make use of.
And even across multiple TPU pods, you'd like to send even less information, but the most salient kind of representations.
And then across metro areas, you'd like to send even less.
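The tiers above can be summarized as channel widths that shrink with topological distance. The tier names and fractions below are purely hypothetical numbers chosen to illustrate the shape of the idea, not anything from a real system.

```python
# Illustrative only: the fraction of the hidden dimension exposed to other
# parts of the model shrinks as topological distance grows.
HIDDEN_DIM = 8192

BANDWIDTH_FRACTION = {
    "same_chip":   1.0,    # dense connections within one chip / its HBM
    "nearby_chip": 0.25,   # some connections one chip away
    "same_pod":    0.05,   # fewer across many chips in a pod
    "cross_pod":   0.01,   # only the most salient representations between pods
    "cross_metro": 0.002,  # even less between metro areas
}

def channel_width(tier):
    """Number of features sent to parts of the model at `tier` distance."""
    return max(1, int(HIDDEN_DIM * BANDWIDTH_FRACTION[tier]))

for tier in BANDWIDTH_FRACTION:
    print(tier, channel_width(tier))
```

The point of the structure is that communication cost, not parameter count, sets the budget: the narrow cross-pod and cross-metro channels force those links to carry only compressed, high-salience representations.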
Yeah, I'd like that to emerge organically.
Like, you could hand-specify these characteristics, but I think you don't know exactly what the right proportions of these kinds of connections are.
And so you should just let the hardware dictate things a little bit.
Like, if you're communicating over here and this data always shows up really early, you should add some more connections. Then it'll take longer and show up at just the right time.
And I think there's a notion of how much compute you want to spend on this particular inference.
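One common way to make per-inference compute adaptive is early exit: stop applying layers once an intermediate prediction is confident enough, so easy inputs use less compute than hard ones. This is a generic illustrative sketch of that technique, not a description of any specific model; the confidence threshold and toy layers are assumptions.

```python
import numpy as np

def adaptive_forward(x, layers, classifier, threshold=0.9):
    """Apply layers until the intermediate prediction is confident, then exit."""
    for depth, layer in enumerate(layers, start=1):
        x = np.tanh(layer @ x)
        probs = np.exp(classifier @ x)
        probs /= probs.sum()                 # softmax over class scores
        if probs.max() >= threshold:         # confident enough: skip the rest
            return probs, depth
    return probs, len(layers)                # used the full depth

# Toy setup: 8 random layers over a 16-dim vector, 4 output classes.
rng = np.random.default_rng(1)
d, n_classes = 16, 4
layers = [rng.normal(size=(d, d)) * 0.5 for _ in range(8)]
classifier = rng.normal(size=(n_classes, d))
probs, depth_used = adaptive_forward(rng.normal(size=d), layers, classifier)
print(depth_used)  # layers actually spent on this input, at most 8
```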