Andy Halliday
So all of those things are improving on the efficiency scale.
The second dimension has to do with memory.
And this is where the new DeepSeek technique comes in.
We know that a model left as a plain dense model, injected only with your prompt and whatever additional context you type in at inference time, can be subject to hallucinations.
And so we like to ground that with retrieval-augmented generation, where you have an external memory, a database that gets referenced for context.
And semantic relevance is used to selectively retrieve the relevant pieces of the grounding data held in that retrieval-augmented generation store, typically a vector database, in order to achieve that semantic retrieval.
So that's this outboard memory that's used to inform the inference process in an efficient way.
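To make that retrieval step concrete, here is a minimal sketch of semantic retrieval over an in-memory vector store. The embedding function, document set, and top-k rule are illustrative assumptions on my part, not the specific stack being described; a real system would use a learned text-embedding model and a proper vector database.

```python
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    # Placeholder embedding: a deterministic hash-seeded vector stands in for a
    # learned embedding model. Illustrative only; not semantically meaningful.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    vec = rng.standard_normal(dim)
    return vec / np.linalg.norm(vec)

# "Vector database": grounding documents stored alongside their embeddings.
documents = [
    "The company was founded in 2011.",
    "The product supports offline mode.",
    "Refunds are processed within 14 days.",
]
doc_vectors = np.stack([embed(d) for d in documents])

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most semantically similar to the query."""
    q = embed(query)
    scores = doc_vectors @ q                 # cosine similarity (unit-norm vectors)
    top = np.argsort(scores)[::-1][:k]
    return [documents[i] for i in top]

# The retrieved passages are prepended to the prompt to ground the model.
context = "\n".join(retrieve("When was the company started?"))
prompt = f"Context:\n{context}\n\nQuestion: When was the company started?"
print(prompt)
```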
So those are the two big dimensions of advancement.
One has to do with sparsity and the other has to do with memory.
Okay, so memory being like external caching: placement, and then retrieval, of static knowledge that doesn't really change and isn't subject to reasoning and manipulation by the computational process during inference.
Okay, so what did DeepSeek do?
DeepSeek introduced this thing called Engram.
It's a novel module that's added to their LLM that provides conditional memory.
And here's the jargon.
It's a complementary axis of sparsity that adds to the conditional-computation paradigm of mixture-of-experts models in large language models.
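For context on the mixture-of-experts side of that sentence, here is a minimal sketch of conditional computation: a router scores the experts for each token and only the top-k experts actually run. The dimensions, expert count, and routing rule are illustrative assumptions, not any particular model's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 8, 2

# Each "expert" is a small feed-forward block; here, just a weight matrix.
experts = [rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(n_experts)]
router_w = rng.standard_normal((d_model, n_experts)) * 0.1

def moe_layer(token: np.ndarray) -> np.ndarray:
    """Conditional computation: only the top-k scored experts run for this token."""
    logits = token @ router_w
    chosen = np.argsort(logits)[::-1][:top_k]                         # selected experts
    weights = np.exp(logits[chosen]) / np.exp(logits[chosen]).sum()   # softmax over chosen
    # Sparse activation: the remaining n_experts - top_k experts are never evaluated.
    return sum(w * (token @ experts[i]) for w, i in zip(weights, chosen))

out = moe_layer(rng.standard_normal(d_model))
print(out.shape)  # (16,)
```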
So what is this doing?
What it's doing is efficiently identifying the things in the input context that are static knowledge and putting those into, in effect, a file, a sort of scratchpad memory.
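Here is a minimal sketch of that idea under my own assumptions, not DeepSeek's actual design: static spans in the input are matched against a fixed n-gram lookup table and served as precomputed embeddings, so the transformer layers don't have to recompute that knowledge. The table contents and lookup rule below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 16

# Hypothetical conditional-memory table: fixed embeddings for known static n-grams.
# This stands in for the lookup module; the real design will differ in detail.
memory_table = {
    ("eiffel", "tower"): rng.standard_normal(d_model),
    ("speed", "of", "light"): rng.standard_normal(d_model),
}

def conditional_memory(tokens: list[str], max_n: int = 3) -> list[np.ndarray | None]:
    """For each position, return a looked-up embedding if a known static n-gram ends there."""
    hits: list[np.ndarray | None] = [None] * len(tokens)
    for i in range(len(tokens)):
        for n in range(max_n, 0, -1):
            key = tuple(tokens[i - n + 1:i + 1]) if i - n + 1 >= 0 else None
            if key in memory_table:
                hits[i] = memory_table[key]   # static knowledge served from memory
                break
    return hits

tokens = "the speed of light is constant".split()
hits = conditional_memory(tokens)
print([t for t, h in zip(tokens, hits) if h is not None])  # ['light'] -> matched span ends here
```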
And this then frees up.