Jeff Dean
It could be a kilobyte or even a megabyte of memory per token.
Yes.
Yes.
So there's actually a lot of innovation going on around, OK, A, how do you minimize that?
And B, what do you actually need to keep there?
Are there better ways of accessing bits of that information?
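The per-token memory being discussed here is the KV cache: each transformer layer stores a key vector and a value vector for every generated token. As a rough sketch of where "a kilobyte to a megabyte per token" comes from, here is a back-of-the-envelope calculation with hypothetical model dimensions (the layer count, KV head count, head size, and dtype below are illustrative assumptions, not any specific model's configuration):

```python
def kv_cache_bytes_per_token(num_layers, num_kv_heads, head_dim, bytes_per_value):
    # Each layer caches one key vector and one value vector (factor of 2)
    # for every KV head, for every token in the context.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value

# Hypothetical mid-sized model: 32 layers, 8 KV heads,
# head dimension 128, fp16 values (2 bytes each).
per_token = kv_cache_bytes_per_token(32, 8, 128, 2)
print(per_token)  # 131072 bytes, i.e. 128 KiB per token
```

Shrinking any factor in that product (fewer KV heads via grouped-query attention, lower-precision values via quantization) is exactly the kind of minimization being alluded to.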
You know, Jeff seems like the right person to figure this out.
Like, OK, what does our memory hierarchy look like from the SRAM all the way up to data center worldwide level?
Well, I mean, I assume these models will get a lot better, and hopefully we'll be able to be much, much more productive.
It might be kind of similar to what we have now because we already have sort of parallelization as a major issue.
Because, you know, we have like lots and lots of really, really brilliant machine learning researchers and we want them to all work together and build AI.
You know, so...
Actually, the parallelization among people might be similar to parallelization among machines.
But I think it should definitely be good for things that require a lot of exploration, like coming up with the next breakthrough.
If you have an idea in the ML domain that you're certain will work, it still has maybe a 2% chance of working, even if you're brilliant.
And mostly these things fail.
But if you try 100 things or 1,000 things or a million things, then you might hit on something amazing.
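The arithmetic behind "try 100 things and you might hit something" is just independent trials: if each idea succeeds with probability 2%, the chance that at least one of n attempts succeeds is 1 - (0.98)^n. A quick sketch (the 2% figure is the one quoted above; the independence assumption is mine):

```python
def p_at_least_one_success(p_each, n_tries):
    # Probability that at least one of n independent attempts succeeds.
    return 1.0 - (1.0 - p_each) ** n_tries

print(round(p_at_least_one_success(0.02, 100), 3))   # ~0.867
print(round(p_at_least_one_success(0.02, 1000), 3))  # effectively 1.0
```

So at a 2% hit rate, a hundred parallel attempts already give you better-than-even odds of a success, which is why massive exploration pays off.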
And we have plenty of compute. Like, a modern top lab these days probably has a million times as much compute as it took to train the Transformer.
Maybe.