Dylan Patel
When you think about how that applies to humans, we're terrible at exact recall.
Absolutely horrible.
I could tell you a sentence,
And tell you to repeat it?
Right, but like you get the gist of the sentence.
If I told you a whole paragraph, you'd get the gist of it.
You could repeat the meaning of it to someone.
You could translate that meaning.
So models, very different, right?
Fundamentally, transformer attention calculates attention from every token to every other token, and getting models to actually be able to recall, that's been a training data problem. But you can get the model to repeat exactly whatever you want from anywhere in its context length.
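As a minimal sketch of that all-to-all attention (names and shapes here are illustrative, not any particular model's): scaled dot-product self-attention scores every token against every other token, which is what makes exact recall over the whole context possible.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V: (seq_len, d) arrays. The (seq_len, seq_len) score matrix
    attends every position to every other position; that all-to-all view
    enables exact recall but costs quadratic compute in seq_len."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)           # token-to-token scores
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)      # softmax over all positions
    return w @ V                            # each output mixes the full context

# Toy usage: 8 tokens, 16-dim embeddings, self-attention
x = np.random.default_rng(0).normal(size=(8, 16))
out = scaled_dot_product_attention(x, x, x)
```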
Needle in a haystack, that's a benchmark people ran for a while because models had to get good at that. But now models are just amazing at it.
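A hedged sketch of how a needle-in-a-haystack test is typically constructed (details vary across implementations, and `query_model` is a hypothetical stand-in for a real model API): bury one fact at a random depth in filler text, then check whether the model retrieves it verbatim.

```python
import random

def needle_in_haystack_prompt(needle: str, filler: str,
                              approx_tokens: int, depth_pct: float) -> str:
    """Bury `needle` at roughly depth_pct of a filler context."""
    n_fill = max(approx_tokens // max(len(filler.split()), 1), 1)
    haystack = [filler] * n_fill
    haystack.insert(int(len(haystack) * depth_pct), needle)
    return " ".join(haystack)

needle = "The magic number for the blue door is 41217."
prompt = needle_in_haystack_prompt(
    needle,
    "The grass is green and the sky is blue.",
    approx_tokens=4_000,
    depth_pct=random.random(),  # a real eval sweeps depths and lengths
)
question = prompt + "\nWhat is the magic number for the blue door?"
# answer = query_model(question)   # hypothetical model call
# passed = "41217" in answer       # exact-recall check
```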
But what they really suck at is having infinite context.
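To make the infinite-context problem concrete, here is a back-of-the-envelope sketch of how a dense transformer's KV cache grows with context length. The layer, head, and precision numbers are assumptions for illustration, not any specific model's.

```python
def kv_cache_bytes(context_len: int, n_layers: int = 32, n_kv_heads: int = 8,
                   head_dim: int = 128, bytes_per_val: int = 2) -> int:
    """Memory to remember the whole context verbatim: 2x for keys and
    values, fp16 assumed. Grows linearly with context length, while
    attention compute grows quadratically."""
    return 2 * context_len * n_layers * n_kv_heads * head_dim * bytes_per_val

for ctx in (8_000, 128_000, 1_000_000):
    print(f"{ctx:>9,} tokens -> {kv_cache_bytes(ctx) / 1e9:.1f} GB of KV cache")
```

Nothing here gets compressed: every token is kept at full fidelity, which is the opposite of the sparse encoding described next.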
The real world is sparse.
You've taken this entire world and you've encoded it in such a small amount of data that lives in your brain.
And it's so sparse, but you understood how to grab the fundamental reason and put it down there.
Whereas models haven't been able to create something that sparse yet.
How do you reason over the context of infinity?