Jeff Dean
Podcast Appearances
So there is a huge amount of headroom there to say, OK, what if we make this thing more expensive but smarter? Because we're like 100x cheaper than reading a paperback.
We're like 10,000 times cheaper than talking to a customer support agent.
We're like a million times or more cheaper than hiring a software engineer or talking to your doctor or lawyer.
Add computation and make it smarter.
I think a lot of the takeoff that we're going to see in the very near future is of this form.
We've been exploiting and improving pre-training a lot in the past and post-training.
Those things will continue to improve, but taking advantage of thinking harder at inference time is going to just be an explosion.
Well, we're working out the algorithms as we speak.
So I believe...
You know, we'll see better and better solutions to this as the many more than 10,000 researchers at Google are hacking at it.
In general, transformers can use the sequence length as a batch during training, but they can't really at inference, because you're generating one token at a time.
So there may be different hardware and inference algorithms that we design for the purposes of being efficient at inference.
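To make that contrast concrete, here is a minimal sketch (my own illustration, not any particular production system) of why training gets to batch over the sequence while autoregressive inference does not; the dimensions are illustrative and the causal mask is omitted for brevity.

```python
# Sketch: training batches over the whole sequence; inference decodes
# one token at a time against a growing KV cache.
import jax
import jax.numpy as jnp

D = 16     # model width (illustrative)
SEQ = 8    # sequence length (illustrative)

def attention(q, k, v):
    # q: [tq, D], k/v: [tk, D] -> [tq, D]
    scores = q @ k.T / jnp.sqrt(D)
    return jax.nn.softmax(scores, axis=-1) @ v

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (SEQ, D))    # stand-in for projected q = k = v

# Training: the whole sequence acts like a batch -- one big matmul,
# good accelerator utilization (causal mask omitted in this sketch).
out_train = attention(x, x, x)

# Inference: generate one token at a time; each step attends only to
# the cached keys/values of what has been produced so far.
kv_cache = x[:1]                         # start with one "prompt" token
outputs = []
for t in range(1, SEQ):
    q_t = x[t:t+1]                       # the single new token's query
    outputs.append(attention(q_t, kv_cache, kv_cache))
    kv_cache = jnp.concatenate([kv_cache, x[t:t+1]], axis=0)
# Each step is a tiny matmul against the cache -- far lower arithmetic
# intensity than the batched training-time computation above.
```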
Yeah.
Basically, the big model is being used as a verifier as opposed to a generator, and verification you can do in parallel.
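A minimal sketch of that generator/verifier split, in the spirit of speculative decoding (my illustration, not a description of any specific system): the hypothetical `small_logits` and `big_logits` stand in for a cheap draft model and an expensive verifier, and the point is that the big model scores the whole draft in one batched call.

```python
# Sketch: small model drafts k tokens serially; big model verifies
# all k draft positions in a single batched forward pass.
import jax
import jax.numpy as jnp

VOCAB = 100
key = jax.random.PRNGKey(0)

def small_logits(prefix):
    # Hypothetical cheap draft model (toy random scorer here).
    return jax.random.normal(jax.random.PRNGKey(len(prefix)), (VOCAB,))

def big_logits(prefix_batch):
    # Hypothetical expensive verifier: scores all draft positions in one
    # batched call, which restores inference-time parallelism.
    return jax.random.normal(key, (len(prefix_batch), VOCAB))

def speculate(prefix, k=4):
    # 1) Draft k tokens greedily with the small model, one at a time.
    draft = []
    for _ in range(k):
        tok = int(jnp.argmax(small_logits(prefix + draft)))
        draft.append(tok)
    # 2) Verify the whole draft with one batched big-model call:
    #    position i checks the big model's prediction after prefix + draft[:i].
    contexts = [prefix + draft[:i] for i in range(k)]
    big_preds = jnp.argmax(big_logits(contexts), axis=-1)
    # 3) Accept the longest prefix of the draft the big model agrees with.
    accepted = []
    for i, tok in enumerate(draft):
        if int(big_preds[i]) != tok:
            break
        accepted.append(tok)
    return accepted

print(speculate([1, 2, 3]))   # toy models rarely agree; structure is the point
```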
As long as you can sync all of the parameters of the model across the different data centers and then accumulate all the gradients in the time it takes to do one step, you're pretty good.
In practice it works.
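A minimal sketch of that kind of synchronous step in JAX, assuming a toy scalar-parameter model: each replica computes a local gradient, an all-reduce (jax.lax.pmean) averages them, and every replica applies the same update once per step.

```python
# Sketch: synchronous data parallelism with gradient all-reduce.
from functools import partial
import jax
import jax.numpy as jnp

n_dev = jax.local_device_count()

def loss(w, x, y):
    # Toy model: predict y = w * x, mean squared error.
    return jnp.mean((w * x - y) ** 2)

@partial(jax.pmap, axis_name="batch")   # one replica per device
def sync_step(w, x, y):
    g = jax.grad(loss)(w, x, y)
    # Synchronous all-reduce: every replica sees the same averaged gradient,
    # so every replica applies an identical, reproducible update each step.
    g = jax.lax.pmean(g, axis_name="batch")
    return w - 0.1 * g

w = jnp.zeros((n_dev,))                                          # parameter replicated per device
x = jnp.arange(n_dev * 4, dtype=jnp.float32).reshape(n_dev, 4)   # per-replica data shard
y = 3.0 * x
w = sync_step(w, x, y)                                           # one synchronized training step
```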
It was so pleasant to go from async to sync because your experiments are now replicable.
Rather than your result depending on whether there was a web crawler running on the same machine as one of your computers. So I am so much happier running on TPU pods.
I love asynchrony. It just lets you scale to iPhones and Xboxes or whatever.
Yeah. What if we could give you asynchronous but replicable results?
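A minimal sketch of why the asynchronous version is hard to replicate: with stale gradients, the final parameters depend on whether a worker happens to read before or after a peer's write, which is exactly the kind of timing a stray process on the same machine can perturb. The two-worker setup below is my own illustration.

```python
# Sketch: the same asynchronous updates, applied with different read/write
# interleavings, produce different final parameters.
def grad(w):
    # Gradient of (w - 1)^2 / 2: pulls w toward 1.
    return w - 1.0

def async_run(b_reads_before_a_writes):
    # Two workers sharing one parameter. In a real async system the
    # interleaving is decided by machine timing; here it is an argument.
    w, lr = 0.0, 0.5
    g_a = grad(w)                  # worker A reads w = 0.0
    if b_reads_before_a_writes:
        g_b = grad(w)              # B also reads the stale w = 0.0
        w -= lr * g_a              # A's update lands afterwards
    else:
        w -= lr * g_a              # A's update lands first
        g_b = grad(w)              # B reads the fresher w = 0.5
    w -= lr * g_b
    return w

print(async_run(True), async_run(False))   # 1.0 vs 0.75: same code, different result
```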