Andy Halliday
And Confessions is an interesting strategy.
We've seen it, I think, in the research that's been done by alignment teams, where they actually have the model do this sort of sidebar discussion of its internal thinking process, providing some more transparency into the deliberations the model uses to arrive at a certain response.
I hope we stay ahead of them, because it goes back to the first point that we launched with, which is, wow, these things are going to be far, far beyond our comprehension in their capabilities and speed and power and grasp.
The combination of their expansive memory, their computational speed, and the refinement of their intelligence really makes you worry about making sure that they are in alignment with us.
Yeah, and then let's fast forward to when quantum computing is involved, and we don't understand how it's getting to that level of understanding.
So I want to spin off of your reference to mixture of experts, because I have a little bit of news and a discussion about that.
So almost all of the major models now are being designed as what's called a sparse mixture of experts, or MoE.
And the term sparse, or sparsity, in AI refers to selective activation of different components of the deep neural network.
So you're not activating the entire thing.
A dense deep neural network is one that sends every token through every layer and calculates every relationship to every other token in the context.
That's the dense model.
A sparse model is one that sends the tokens only through a certain expert, a subnetwork that's been selectively identified as having the particular pre-training relevant to the query that's coming through.
So the reason this architecture is being used is that we're trying to reduce the cost of inference and the energy consumed in the process of doing inference.
So a mixture of experts model is the right way to go because, say you have a trillion-parameter deep neural network, it might only be activating 200 billion parameters at a time, for example.
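To make that sparse-versus-dense distinction concrete, here is a minimal, hypothetical sketch of top-k routing in a mixture-of-experts layer. The sizes and names (num_experts, top_k, router_w) and the tiny feed-forward experts are purely illustrative, not how any particular production model is built; the point is just that each token's output is computed from a small subset of the experts, so most of the layer's parameters stay idle on any given token.

```python
# Illustrative sketch of sparse top-k routing in a mixture-of-experts layer.
# Sizes and names are made up for the example, not taken from any real model.
import numpy as np

rng = np.random.default_rng(0)

d_model = 8          # hidden size of each token vector (toy value)
num_experts = 4      # total experts in the layer
top_k = 2            # experts actually activated per token (the "sparse" part)

# Each expert here is just a small feed-forward weight matrix.
experts = [rng.normal(size=(d_model, d_model)) for _ in range(num_experts)]
# The router scores every expert for every token.
router_w = rng.normal(size=(d_model, num_experts))

def moe_layer(tokens):
    """tokens: (n_tokens, d_model) -> (n_tokens, d_model)."""
    logits = tokens @ router_w                      # (n_tokens, num_experts)
    # Softmax over experts to get routing probabilities.
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)

    out = np.zeros_like(tokens)
    for i, tok in enumerate(tokens):
        # Pick only the top-k experts for this token; the rest stay idle,
        # so most of the layer's parameters are never touched for this token.
        chosen = np.argsort(probs[i])[-top_k:]
        weights = probs[i][chosen] / probs[i][chosen].sum()
        for w, e in zip(weights, chosen):
            out[i] += w * (tok @ experts[e])
    return out

tokens = rng.normal(size=(5, d_model))   # a toy batch of 5 token vectors
print(moe_layer(tokens).shape)           # (5, 8): only 2 of 4 experts ran per token
```

In the same spirit, a trillion-parameter model routed this way would only touch a fraction of its weights per token, which is where the inference-cost and energy savings come from.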