Noam Shazeer
Any information in the world should be usable by anyone, regardless of what language they speak.
Yeah.
And I think we've done some amount of that, but it's not nearly the full vision: no matter which of thousands of languages you speak, we can make any piece of content available to you and usable by you.
And, you know, any video could be watched in any language.
I think that would be pretty awesome.
And, you know, we're not quite there yet, but those are definitely things I see on the horizon that should be possible.
Yeah, maybe I'll take a first stab at it.
I mean, because I've thought about this for a bit.
I mean, I think one of the things you see with these models is they're quite good, but they do hallucinate and have factuality issues sometimes.
And part of that is you've trained on, say, tens of trillions of tokens, and you've stirred all that together in your tens or hundreds of billions of parameters, but it's all a bit squishy because you've churned all these tokens together.
And so the model has a reasonably clear view of that data, but it sometimes gets confused and will give the wrong date for something.
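To make that compression concrete, here's a rough back-of-envelope sketch; the token and parameter counts are just the illustrative orders of magnitude mentioned above, not any particular model's figures:

```python
# Rough illustration of why parametric memory is "squishy": the training
# data is far larger than the parameter budget, so facts have to be
# compressed lossily into the weights. All counts are illustrative.
training_tokens = 20e12  # "tens of trillions" of tokens
parameters = 100e9       # "hundreds of billions" of parameters

tokens_per_param = training_tokens / parameters
print(f"~{tokens_per_param:.0f} training tokens per parameter")  # ~200
# Each weight has to summarize hundreds of tokens of text, which is why
# recall of exact details (like a specific date) can come out blurry.
```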
Whereas information in the context window, in the input of the model, is like really sharp and clear.
Because we have this really nice attention mechanism in Transformers that the model can pay attention to things and it knows kind of the exact text or the exact frames of the video or audio or whatever that it's processing.
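As a minimal sketch of what that mechanism does (this is standard scaled dot-product attention in NumPy, not any particular production model's implementation):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Standard Transformer attention: each query position takes a
    weighted average of the value vectors, with weights given by a
    softmax over query-key similarity."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # (n_queries, n_keys)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over keys
    return weights @ V                                # (n_queries, d_v)

# Toy example: 4 context tokens with 8-dim keys/values, one query.
rng = np.random.default_rng(0)
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
Q = K[2:3]  # a query that matches context position 2 exactly
out = scaled_dot_product_attention(Q, K, V)
# The softmax weight on position 2 dominates, so out is close to V[2].
```

Because the softmax weights address specific positions in the input, the model can effectively copy exact details out of its context rather than reconstructing them from compressed weights, which is why in-context information stays sharp.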
And so right now we have models that can deal with millions of tokens of context, which is quite a lot.
It's like hundreds of pages of a PDF, or 50 research papers, or hours of video, or tens of hours of audio, or some combination of those things, which is pretty cool.
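As a rough sanity check on those figures, here's a back-of-envelope sketch; the per-item token rates below are assumed ballpark numbers (real tokenizers and media encodings vary a lot):

```python
# Ballpark conversions for a one-million-token context window.
# All rates below are assumptions for illustration, not exact figures.
TOKENS_PER_PDF_PAGE = 700        # dense text page
TOKENS_PER_PAPER = 15_000        # ~20-page research paper
TOKENS_PER_VIDEO_HOUR = 300_000  # heavily model-dependent; illustrative
TOKENS_PER_AUDIO_HOUR = 12_000   # speech at ~150 wpm, ~1.3 tokens/word

budget = 1_000_000
print(budget // TOKENS_PER_PDF_PAGE, "PDF pages")        # ~1400 pages
print(budget // TOKENS_PER_PAPER, "research papers")     # ~66 papers
print(budget / TOKENS_PER_VIDEO_HOUR, "hours of video")  # ~3.3 hours
print(budget / TOKENS_PER_AUDIO_HOUR, "hours of audio")  # ~83 hours
```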
But it would be really nice if the model could attend to trillions of tokens, right?
Could it attend to the entire internet and find the right stuff for you?