Yannis Antonoglou

👤 Speaker
329 total appearances

Podcast Appearances

The Neuron: AI Explained
This DeepMind Vet Raised $2B to Open-Source Frontier AI

Yeah, this is a really technical difference between what we mean by dense models and what we mean by MoEs, and how MoEs might sound like much larger models, but at the same time, they're quite efficient when it comes to inference.
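
To make the efficiency point concrete, here is a rough arithmetic sketch. It assumes a simplified MoE where some parameters are shared by every token and only a few experts run per token. All the numbers below (parameter counts, 8 experts, 2 experts per token) are made-up illustrative values, not figures from the episode or from any specific model.

```python
def moe_param_counts(shared_params, params_per_expert, num_experts, experts_per_token):
    """Return (total, active) parameter counts for a simplified MoE.

    total  = everything that must be stored
    active = what is actually used to process a single token
    """
    total = shared_params + num_experts * params_per_expert
    active = shared_params + experts_per_token * params_per_expert
    return total, active


# Hypothetical sizes, chosen only to keep the arithmetic easy to follow.
total, active = moe_param_counts(
    shared_params=2_000_000_000,      # attention, embeddings, etc. (assumed)
    params_per_expert=1_000_000_000,  # assumed size of each expert
    num_experts=8,
    experts_per_token=2,
)

print(f"total parameters:  {total:,}")
print(f"active per token:  {active:,}")
```

Under these assumptions the model "sounds like" a 10B-parameter model, while each token only touches 4B parameters' worth of compute.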

So at the heart of it, you can actually think of a mixture of experts model as like many, many, many models kind of like put right next to each other.

And then there's like what we call a router, which is really a system that selects one of the models to route each forward pass to.

So when I have a dense model, I guess attention is the idea of looking at every token and trying to predict the next token based on every token in the past.

And there's like an everything-to-everything kind of like connection.

So like these are the fully connected dense models.

In the mixture of experts, you have like many of these models, one right next to the other, and a system that selects during inference, during runtime, which one of the models to route to.
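
The router idea described here can be sketched in a few lines. This is a minimal illustration, assuming a linear router that scores each expert with a dot product and picks the single highest-scoring one (top-1 routing). The function names `softmax` and `route` and the weight values are hypothetical, chosen for the example; real systems often pick more than one expert per token.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def route(token_vec, router_weights):
    """Score every expert for one token and return (chosen_index, probabilities).

    router_weights holds one weight vector per expert (assumed linear router).
    """
    scores = [sum(w * x for w, x in zip(row, token_vec)) for row in router_weights]
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


# Three hypothetical experts, two-dimensional token representation.
router_weights = [
    [1.0, 0.0],
    [0.0, 1.0],
    [-1.0, -1.0],
]
best, probs = route([0.2, 0.9], router_weights)
print("chosen expert:", best)
```

Only the chosen expert's parameters would then be run for that input; the others sit idle, which is where the inference savings come from.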

No, no, no, that's not correct actually.

Because GPT-4, GPT-4.5 could also be mixture-of-experts models.

It's more of the architecture.

So the architecture is as if you just trained many small LLMs and then you just kind of put them together and then just trained a system that can route between them.

And then this way, you have many, many experts contributing to the final output.

And, you know, you don't actually do that; you don't have like actual full models.

You just like have them interspersed, right?

Like as in you have like layers of like experts and they kind of like contribute to the same final output.

Yeah, so it's kind of like, you can think of it as like a path, a path that like a model can take to kind of like use different experts along the way to just kind of like produce the final output.
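
The "path" picture can be illustrated with a toy sketch. It assumes each layer has its own router and its own small set of experts, and that one expert is picked per layer for each token. Everything here (the `forward` and `pick_expert` helpers, the toy experts, the router weights) is hypothetical and only meant to show how two tokens can take different paths through the same stack of layers.

```python
def pick_expert(x, router):
    """Return the index of the expert with the highest router score for x."""
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in router]
    return max(range(len(scores)), key=scores.__getitem__)


def forward(x, layers):
    """Run one token through every layer, recording which expert each layer chose."""
    path = []
    for router, experts in layers:
        idx = pick_expert(x, router)
        x = experts[idx](x)  # only the chosen expert runs
        path.append(idx)
    return x, path


def scale(k):
    """Toy expert: multiply every component by k."""
    return lambda v: [k * vi for vi in v]


def swap(v):
    """Toy expert: reverse the components."""
    return list(reversed(v))


# Two layers, each with its own router weights and two toy experts.
layers = [
    ([[1.0, 0.0], [0.0, 1.0]], [scale(2.0), swap]),
    ([[1.0, 0.0], [0.0, 1.0]], [scale(0.5), swap]),
]

tokens = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
paths = {name: forward(vec, layers)[1] for name, vec in tokens.items()}
print(paths)
```

In this toy setup the two tokens pick different experts in the first layer and the same expert in the second, so the routing decision is made per token and per layer rather than once for a whole model.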

No, it's actually more at the level of the token that it'll just do that.

So it's kind of like the mix of experts.

I guess the idea of the experts is that each expert will just learn something from the data, but it's not at the level of that abstraction.

It's more at a lower level of abstraction.