Yannis Antonoglou
It'll just learn something.
And then by mixing these experts, you just get a more powerful outcome.
Now the whole idea is that you can have more weights, more parameters during training, to fit your data.
But during inference, there's only a number of active parameters that you care about.
So you don't have to run inference across a massive model of, say, a trillion parameters, because the active number of parameters is what really matters for inference, and that tends to be much smaller: it could be 32B or 256B when the model is 1 trillion.
So super-sparse experts is when you have many, many experts, where each one of them has a significantly smaller number of parameters.
So inference is much faster, and at the same time, you have a lot of capacity in your model.
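The total-versus-active distinction above can be sketched with a little arithmetic. This is a toy calculation with made-up numbers (the expert counts, sizes, and top-k value are illustrative assumptions, not the actual model's configuration): in a sparse mixture-of-experts layer, only the top-k routed experts run per token, so the active parameter count sits far below the total.

```python
# Toy parameter accounting for a sparse mixture-of-experts model.
# All numbers below are hypothetical, chosen only to illustrate the idea
# that active parameters << total parameters.

def moe_param_counts(n_experts, params_per_expert, top_k, shared_params):
    """Return (total, active) parameter counts for a sparse MoE model.

    total  = everything stored on disk / in memory
    active = what actually runs for each token (top-k experts + shared parts)
    """
    total = shared_params + n_experts * params_per_expert
    active = shared_params + top_k * params_per_expert
    return total, active

# Hypothetical "super sparse" config: many small experts, few active per token.
total, active = moe_param_counts(
    n_experts=256,            # many, many experts...
    params_per_expert=3.7e9,  # ...each significantly smaller
    top_k=8,                  # only 8 experts run per token
    shared_params=2.4e9,      # attention, embeddings, router, etc.
)
print(f"total:  {total / 1e12:.2f}T parameters")   # → total:  0.95T parameters
print(f"active: {active / 1e9:.0f}B parameters")   # → active: 32B parameters
```

With these (invented) numbers, a model near a trillion total parameters only computes with about 32B per token, which is why inference cost tracks the active count rather than the total.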
Yeah, so...
We'll actually release the model architecture, and we'll release the weights.
What that means is that people can take the model, they can run it in their own hardware, wherever they want.
If they have data, they can use it to do fine-tuning.
If they have an environment, they can run their own reinforcement learning, RFT, on the model and get specialized models out of that.
So you have full ownership of how you run the model, where you run the model,
what you do with the model in terms of customization.
So this is important for people who care about safety, security, and customization, and who ultimately care about ownership.
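The ownership point above can be illustrated with a deliberately tiny, stdlib-only sketch: load some released weights, then fine-tune them on data that never leaves your machine. Everything here is a toy stand-in (a linear model trained with plain SGD, and made-up function names like `load_released_weights`); real fine-tuning of an open-weights LLM would use a deep-learning framework, but the workflow shape is the same.

```python
# Toy sketch of "you own the weights": download once, then fine-tune
# locally on private data. Linear model + SGD stands in for the real thing.

def load_released_weights(path=None):
    """Stand-in for loading the open weights; here just a toy linear model."""
    return {"w": 0.0, "b": 0.0}

def fine_tune(weights, data, lr=0.1, epochs=500):
    """SGD on squared error for y ≈ w*x + b, using your own data."""
    w, b = weights["w"], weights["b"]
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x   # gradient of 0.5*err^2 w.r.t. w
            b -= lr * err       # gradient of 0.5*err^2 w.r.t. b
    return {"w": w, "b": b}

# Private data stays on your hardware; here it follows y = 2x + 1.
private_data = [(x, 2.0 * x + 1.0) for x in [0.0, 0.5, 1.0, 1.5, 2.0]]
tuned = fine_tune(load_released_weights(), private_data)
print(f"w={tuned['w']:.2f}, b={tuned['b']:.2f}")  # approaches w=2, b=1
```

The design point is the one made in the transcript: because the weights are yours to run, both the training data and the resulting specialized model stay under your control.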
But what also this allows you to do, and this is especially important for researchers,