Logan Kilpatrick
So I think, and obviously we're still early in that domain, but it'll be cool to see how much people are accelerated when you 10x or 100x the context window in the future.
And it's very distinct from RAG in a lot of ways. If folks have gone into the weeds of RAG versus long context, it really is a fundamental trade-off that you're making. So I'll be interested to see people not have to make that trade-off in cases where their use case supports it.
There's a bunch of architectural challenges: LLMs in their current form are not designed to scale up to a 10 to 100 million token context window. It's really tough. You could do some hacks to get slightly farther.
And we did show a bunch of research on what it would look like to bring 10 million tokens to people, and even showed some of that with the original Gemini launch. But in practice, in production environments, it becomes very, very costly, and not easy to maintain and continue to scale up.
So I do think we'll need some architectural innovation at the model level in order to enable things like a hundred million tokens, which I'm excited about and I think the world needs. So I'm hopeful we'll keep pushing the rock up the hill.
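The cost argument here comes down to how self-attention scales: every token attends to every other token, so compute and memory grow quadratically with context length. A back-of-the-envelope sketch (my own illustration, not from the conversation):

```python
# Toy illustration (assumption: vanilla quadratic self-attention,
# ignoring heads, layers, and KV-cache optimizations): the number of
# pairwise query-key comparisons grows with the square of context length.

def attention_score_count(context_len: int) -> int:
    """Pairwise query-key comparisons for one attention pass."""
    return context_len * context_len

for tokens in (100_000, 1_000_000, 10_000_000, 100_000_000):
    print(f"{tokens:>11,} tokens -> {attention_score_count(tokens):.2e} comparisons")
```

Going from 1 million to 100 million tokens multiplies that quadratic term by 10,000x, which is why naive scaling of today's architectures gets so costly and why model-level innovation comes up.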
What's a hundred million token use case?
Yeah.
I mean, code bases are actually a good example. If you look at a large company using a monorepo, a hundred million tokens is maybe too much, or slightly on the extreme end, but accumulated through its lifetime you actually do have a lot of this data. I think the challenge then becomes, and the attention mechanism in language models and transformers specifically doesn't have this intrinsically, how do you up-sample the right data and down-sample the wrong data, all that stuff.