Andy Halliday
And some of the models will tell you, actually, how many experts are built in there. It's an MoE with eight, you know, there are eight different modules, if you will, built in there that can be selectively activated, so inference can be sent to one of them while all the other ones are zeroed out.

Yeah. Okay. So NVIDIA
is the company that's providing the chips to support this kind of thing.
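To make that concrete, here's a minimal sketch of top-1 expert selection in a mixture-of-experts layer. The eight experts match the conversation; the dimensions, the linear gate, and the NumPy setup are illustrative assumptions, not any particular model's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts = 64, 8  # eight experts, as in the conversation

# Each "expert" is reduced to one weight matrix here; in a real
# transformer each would be a full MLP block.
experts = [rng.standard_normal((d_model, d_model)) * 0.02
           for _ in range(n_experts)]
gate_w = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_forward(token):
    """Route one token: score all experts, run only the winner."""
    scores = token @ gate_w          # gate logits, one per expert
    winner = int(np.argmax(scores))  # top-1 routing
    # Only the winning expert does any work; the other seven are
    # "zeroed out" -- their compute is simply never performed.
    return experts[winner] @ token

print(moe_forward(rng.standard_normal(d_model)).shape)  # (64,)
```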
And their latest 72-GPU server is delivering 10x performance when applied to mixture-of-experts models.

And that's because this latest super server that they've built is optimized for the routing that MoE models require, right?

So there's a routing stage that has to decide which experts to send each token to, so that the rest of the model can be turned off, saving the energy that would otherwise be spent sending all of the tokens through every part of the model.
So what that means is faster and cheaper inference at scale.
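To make that routing stage concrete, here's a sketch of batched top-2 dispatch: tokens are grouped by expert so each expert runs a single matmul over only its own tokens. The shapes, the top-2 choice, and the full-softmax gate weighting are simplifying assumptions for illustration, not NVIDIA's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
n_tokens, d_model, n_experts, top_k = 16, 64, 8, 2

tokens = rng.standard_normal((n_tokens, d_model))
gate_w = rng.standard_normal((d_model, n_experts)) * 0.02
experts = [rng.standard_normal((d_model, d_model)) * 0.02
           for _ in range(n_experts)]

# Routing stage: score every expert for every token, keep the top two.
logits = tokens @ gate_w                      # (16, 8) gate logits
top = np.argsort(logits, axis=1)[:, -top_k:]  # each token's 2 experts
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)     # gate weights

out = np.zeros_like(tokens)
for e in range(n_experts):
    # Gather only the tokens routed to expert e; the other experts
    # never touch them, which is where the inference savings come from.
    mask = (top == e).any(axis=1)
    if mask.any():
        out[mask] += probs[mask, e:e + 1] * (tokens[mask] @ experts[e].T)

print(out.shape)  # (16, 64)
```

In multi-GPU serving, that gather-and-dispatch step becomes all-to-all communication between devices, which is presumably the part a tightly interconnected 72-GPU machine is tuned to make cheap.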
So if, you know, MoE is the way that LLMs and deep neural networks are moving, and all of them are exploiting it already, then NVIDIA is shaping the hardware to make that more efficient.
I get your point.
And I hope that somebody's building that to make it better.
I'll get right on it.
I hope that it'll be built into Apple Intelligence, so our devices themselves do that kind of testing.
No, I haven't seen it.
That's not nothing.
I want to know who they are.
Let's have a party.
Yeah, we're watching on Spotify right now.
Gareth Hood says, join the Slack, you 300.