Noam Shazeer
And then subsequent TPUs were really designed more around training and also for inference.
But it may be that, you know, when you have something where you really want to crank up the amount of compute you use at inference time, even more specialized solutions will make a lot of sense.
Training or inference?
Oh yeah.
I mean, I think I like to think of it as, is the inference that you're trying to do latency sensitive, like a user's actively waiting for it, or is it kind of a background thing?
And maybe that's: I have some inference tasks that I'm trying to run over a whole batch of data, but it's not for a particular user.
It's just that I want to run inference on it and extract some information.
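To make that latency-sensitive versus batch split concrete, here is a minimal sketch in Python; the request type, modes, and pool names are hypothetical illustrations, not any particular serving stack.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Mode(Enum):
    INTERACTIVE = auto()  # a user is actively waiting; optimize for latency
    BATCH = auto()        # background job over a dataset; optimize for throughput and cost


@dataclass
class InferenceRequest:
    prompt: str
    mode: Mode


def route(request: InferenceRequest) -> str:
    """Pick a (hypothetical) serving pool based on latency sensitivity."""
    if request.mode is Mode.INTERACTIVE:
        # Small batches, tight deadlines, fastest available hardware.
        return "low-latency-pool"
    # Large batches that can queue for minutes; cheapest hardware that fits.
    return "bulk-throughput-pool"


if __name__ == "__main__":
    print(route(InferenceRequest("summarize this page for the user", Mode.INTERACTIVE)))
    print(route(InferenceRequest("extract entities from a 10M-document corpus", Mode.BATCH)))
```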
And then there's probably a bunch of things that we don't really have very much of right now, but you're seeing inklings of it in our deep research tool that we just released.
I forget exactly when, like a week ago.
You can give it a pretty complicated, high-level task.
Like, hey, can you go off and research the history of renewable energy and all the trends and costs for wind and solar and other kinds of techniques and put it in a table and give me a full eight-page report?
And it will come back with an eight-page report with like 50 entries in the bibliography.
It's pretty remarkable.
But you're not actively waiting for that for one second.
It takes like, you know, a minute or two to go do that.
And I think there's going to be a fair bit of that kind of compute.
And that's the kind of thing where you have some UI questions around, okay, if you're going to have a user with 20 of these kinds of asynchronous tasks happening in the background, and maybe each one of them needs to get more information from the user.
Like, I found your flights to Berlin, but there are no nonstop ones.
Are you okay with one stop?
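One way to picture that UI question is a task object that can park itself and surface a clarifying question back to the user. The sketch below is purely illustrative, with a hypothetical BackgroundTask type and status values, not how any shipping product handles it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BackgroundTask:
    """A long-running asynchronous task that may need user input mid-flight."""
    description: str
    status: str = "running"              # "running" | "needs_input" | "done"
    pending_question: Optional[str] = None

    def ask_user(self, question: str) -> None:
        # Park the task and surface a question in the user's task list.
        self.status = "needs_input"
        self.pending_question = question

    def answer(self, reply: str) -> None:
        # The user responded; resume with the new constraint attached.
        self.status = "running"
        self.pending_question = None
        self.description += f" (user said: {reply})"


# A user might have twenty of these in flight at once.
tasks = [
    BackgroundTask("Find flights to Berlin"),
    BackgroundTask("Write an eight-page report on renewable energy"),
]
tasks[0].ask_user("No nonstop flights found. Is one stop okay?")
for task in tasks:
    if task.status == "needs_input":
        print(f"[{task.description}] -> {task.pending_question}")
```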