Noam Shazeer
That seems like a nice trade-off to have, because sometimes you want to think really hard because it's a super important problem.
Sometimes you probably don't want to spend enormous amounts of compute to figure out, you know, what's the answer to 1 plus 1?
Maybe the system should decide to use a calculator tool or something instead of, you know, a very large language model.
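As a rough illustration of that routing idea, here is a minimal Python sketch that sends trivially computable queries down a cheap calculator path and everything else to a model. The names `route_query` and `call_llm` are hypothetical stand-ins, not any real API, and a production router would be learned rather than regex-based.

```python
# Minimal sketch: route trivial arithmetic to a calculator tool instead of an LLM.
# route_query and call_llm are hypothetical stand-ins, not a real API.
import ast
import operator
import re

# Operators the "calculator tool" is allowed to evaluate.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def _eval_arithmetic(node):
    """Safely evaluate a parsed expression made of numbers and + - * / only."""
    if isinstance(node, ast.Expression):
        return _eval_arithmetic(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval_arithmetic(node.left),
                                   _eval_arithmetic(node.right))
    raise ValueError("not simple arithmetic")

def call_llm(prompt: str) -> str:
    """Placeholder for an expensive large-model call."""
    return f"<LLM answer for: {prompt!r}>"

def route_query(query: str) -> str:
    """Use the calculator for plain arithmetic; otherwise fall back to the LLM."""
    if re.fullmatch(r"[\d\s\.\+\-\*/\(\)]+", query):
        try:
            return str(_eval_arithmetic(ast.parse(query, mode="eval")))
        except (ValueError, SyntaxError, ZeroDivisionError):
            pass  # malformed "arithmetic" falls through to the model
    return call_llm(query)

print(route_query("1 + 1"))                 # -> 2, no model call needed
print(route_query("Why is the sky blue?"))  # -> routed to the LLM
```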
I mean, I think we do see some examples in our own experimental work where, if you apply more inference-time compute, the answers are better: if you apply 10X, you can get better answers than with X amount of compute at inference time.
And that seems useful and important.
But I think what we would like is, when you apply 10X, to get an even bigger improvement in the quality of the answers than we're getting today.
And so that's about designing new algorithms, trying new approaches, figuring out how best to spend that 10X instead of X to improve things.
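One common way to spend that 10X at inference time is best-of-N sampling: draw several candidate answers and keep the one a scorer likes best. This is a hedged sketch of that general pattern, not the speaker's specific method; `sample_candidate` and `score_answer` are hypothetical stand-ins for a model's stochastic sampler and a learned verifier.

```python
# Sketch of best-of-N sampling: n is the inference-compute knob.
# sample_candidate and score_answer are hypothetical stand-ins.
import random

def sample_candidate(prompt: str, rng: random.Random) -> str:
    """Stand-in for one stochastic decode from the model."""
    return f"candidate-{rng.randint(0, 9999)} for {prompt!r}"

def score_answer(answer: str) -> float:
    """Stand-in for a verifier/reward model; here a toy deterministic score."""
    return float(sum(answer.encode()) % 100)

def best_of_n(prompt: str, n: int, seed: int = 0) -> str:
    """Spend n model calls and keep the highest-scoring answer."""
    rng = random.Random(seed)
    candidates = [sample_candidate(prompt, rng) for _ in range(n)]
    return max(candidates, key=score_answer)

cheap = best_of_n("prove the claim", n=1)    # X compute
better = best_of_n("prove the claim", n=10)  # 10X compute, usually better
```

The open research question in the passage above is exactly how to make the quality gain from n=1 to n=10 steeper than what naive best-of-N gives you.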
I mean, on the question of search, I really like Rich Sutton's paper about the bitter lesson.
And "The Bitter Lesson" is effectively this nice one-page paper.
But the essence of it is that you can try lots of approaches, and the two techniques that are incredibly effective are learning and search.
You can scale both of them, algorithmically or computationally, and you'll often get better results than with any other kind of approach, across a pretty broad variety of problems.
And so I think search has got to be part of the solution to spending more inference time, because you want to maybe explore a few different ways of solving the problem.
And like, oh, that one didn't work, but this one worked better.
So now I'm going to explore that a bit more.
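That explore-and-prune loop is essentially best-first search. Below is a minimal sketch under the assumption that a model can propose next steps (`expand`) and score how promising a partial solution looks (`estimate_value`); both are hypothetical stand-ins, and the expansion budget is the inference-compute knob.

```python
# Sketch of best-first search over partial solutions: expand the most
# promising state, abandon dead ends. expand, estimate_value, and
# is_solution are hypothetical stand-ins for model calls and a verifier.
import heapq

def expand(state: str) -> list[str]:
    """Stand-in: propose a few next steps from a partial solution."""
    return [state + c for c in "ab"] if len(state) < 4 else []

def estimate_value(state: str) -> float:
    """Stand-in: score how promising a partial solution looks (higher is better)."""
    return state.count("a") / max(len(state), 1)

def is_solution(state: str) -> bool:
    return state == "aaaa"  # toy goal for the sketch

def best_first_search(start: str, budget: int) -> str | None:
    """Explore up to `budget` expansions; more budget = more inference compute."""
    frontier = [(-estimate_value(start), start)]  # max-heap via negated scores
    for _ in range(budget):
        if not frontier:
            return None
        _, state = heapq.heappop(frontier)  # most promising partial solution
        if is_solution(state):
            return state
        for child in expand(state):         # try a few different ways forward
            heapq.heappush(frontier, (-estimate_value(child), child))
    return None  # budget exhausted without a verified solution

print(best_first_search("", budget=50))  # -> 'aaaa' with enough budget
```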
I mean, I think one general trend that's clear is that inference-time compute (you have a model that's pretty much already trained, and you want to do inference on it) is going to be a growing and important class of computation, and maybe you want to specialize hardware more around that.
You know, actually, the first TPU was specialized for inference and wasn't really designed for training.