Noam Shazeer
Podcast Appearances
And that would be pretty cool.
I do think that that fabrication time is, if that's in your inner loop of improvement, you're going to like,
How long is it?
The leading edge nodes, unfortunately, are taking longer and longer because they have more metal layers than previous older nodes.
So that tends to make it take anywhere from three to five months.
That can run on, like, existing chips and explore lots of cool ideas.
I mean, I like to think of it like this, right?
Like right now we have models that can take a pretty complicated problem and can break it down, you know, internally in the model into a bunch of steps, can sort of puzzle together the solutions for those steps and can often give you a solution to the entire problem that you're asking.
But it, you know, isn't super reliable, and it's good at breaking things down into, you know, 5 to 10 steps, not 100 to 1,000 steps.
So if you could go from, yeah, 80% of the time it can give you a perfect answer to something that's 10 steps long,
to something that 90% of the time can give you a perfect answer to something that's 100 to 1,000 steps of sub-problem long, that would be an amazing improvement in capability of these models.
And we're not there yet, but I think that's what we're aspirationally trying to get to.
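To put rough numbers on that jump, here is a minimal sketch of the arithmetic. It assumes something the speaker does not say: that every step succeeds independently with the same probability, so a chain of n steps is fully correct with probability p to the power n. The function names are illustrative only. Under that assumption, 80% on 10 steps implies about 97.8% per step, which would fall to about 11% on a 100-step problem, while 90% on 100 to 1,000 steps needs roughly 99.89% to 99.989% per step.

```python
def per_step_reliability(overall: float, steps: int) -> float:
    """Per-step success rate p such that p ** steps == overall.

    Assumes independent, equally reliable steps (a simplification).
    """
    return overall ** (1.0 / steps)


def overall_reliability(per_step: float, steps: int) -> float:
    """Chance that every one of `steps` independent steps succeeds."""
    return per_step ** steps


# Per-step rate implied by "80% of the time on 10 steps".
today = per_step_reliability(0.80, 10)
print(f"implied per-step rate today: {today:.4f}")

# How that same per-step rate holds up on longer problems.
for steps in (10, 100, 1000):
    print(f"{steps:>5} steps at today's rate: {overall_reliability(today, steps):.3g}")

# Per-step rate needed for "90% of the time on 100 to 1,000 steps".
for steps in (100, 1000):
    print(f"{steps:>5} steps at 90% overall needs: {per_step_reliability(0.90, steps):.6f}")
```

Real sub-problems are neither independent nor equally hard, so this only shows how quickly small per-step errors compound.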
Never looked new hardware in the mouth.
Yeah, and an aspect of inference time is I think you want the system to be actively exploring a bunch of different potential solutions.
Maybe it does some searches on its own and gets some information back and consumes that information and figures out, oh, now I would really like to know more about this thing.
So now it kind of iteratively explores how to best solve the high-level problem you pose to this system.
And I think you want a dial where you can make the model give you better answers with more inference-time compute, and it seems like we have a bunch of techniques now that can kind of do that.
And the more you crank up the dial, the more it costs you in terms of compute, but the better the answers get.
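One way to picture such a dial is majority voting over repeated samples. The speaker does not name a specific technique, so this is only an illustrative sketch: `noisy_solver` is a hypothetical stand-in for a model call that is right 60% of the time, and `n_samples` plays the role of the dial. Each extra sample is one more unit of compute.

```python
import random
from collections import Counter

CORRECT = 42  # hypothetical ground-truth answer for the toy problem


def noisy_solver(rng: random.Random, p_correct: float = 0.6) -> int:
    """Stand-in for one model call: right with probability p_correct,
    otherwise one of a few wrong answers."""
    if rng.random() < p_correct:
        return CORRECT
    return rng.choice([41, 43, 44])


def answer_with_budget(rng: random.Random, n_samples: int) -> int:
    """Spend n_samples solver calls and return the most common answer."""
    votes = Counter(noisy_solver(rng) for _ in range(n_samples))
    return votes.most_common(1)[0][0]


def accuracy(n_samples: int, trials: int = 2000, seed: int = 0) -> float:
    """Fraction of trials in which the voted answer is correct."""
    rng = random.Random(seed)
    hits = sum(answer_with_budget(rng, n_samples) == CORRECT for _ in range(trials))
    return hits / trials


# Turning the dial: cost grows linearly with n_samples.
for n in (1, 5, 25):
    print(f"samples={n:2d}  cost={n:2d} calls  accuracy={accuracy(n):.3f}")
```

In this toy setup accuracy climbs as the sample count grows, at a cost that scales with the number of calls, which is the trade-off described above.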