You know, like...
This one's super good at dates.
Looking at the example and... I mean, one thing I would say is, like, there is a bunch of work on interpretability of models and what they're doing inside.
And sort of expert-level interpretability is a sub-problem of that broader area.
I really like some of the work that my former intern, Chris Olah, and others did at Anthropic, where they trained a very sparse autoencoder and were able to deduce, you know, what characteristics some particular neuron in a large language model represents.
So they found like a Golden Gate Bridge neuron that's activated when you're talking about the Golden Gate Bridge.
And I think, you know, you could do that at the expert level.
You could do that at a variety of different levels and get pretty interpretable results, and it's a little unclear if you necessarily need that.
If the model is just really good at stuff, you know, we don't necessarily care what every neuron in the Gemini model is doing as long as the collective output and characteristics of the overall system are good.
You know, that's one of the beauties of deep learning: you don't need to understand or hand-engineer every last feature.
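[To make the sparse-autoencoder idea above concrete, here is a minimal sketch. All shapes, names, and hyperparameters are illustrative assumptions, not Anthropic's actual setup: a wide ReLU encoder with an L1 sparsity penalty decomposes model activations into features, and a trained feature that fires only on Golden Gate Bridge text would be the kind of interpretable unit being described.]

```python
# Minimal sparse-autoencoder sketch (hypothetical shapes and names).
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Decomposes model activations into a wide, sparse feature basis."""
    def __init__(self, d_model: int, d_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, acts: torch.Tensor):
        # ReLU keeps features non-negative; the L1 penalty below makes them sparse.
        features = torch.relu(self.encoder(acts))
        recon = self.decoder(features)
        return recon, features

sae = SparseAutoencoder(d_model=512, d_features=8192)
acts = torch.randn(64, 512)  # stand-in for activations collected from a model
recon, features = sae(acts)
loss = nn.functional.mse_loss(recon, acts) + 1e-3 * features.abs().mean()
loss.backward()
# After training, inspecting which inputs activate a given feature column is
# how one would find something like a "Golden Gate Bridge" feature.
```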
Right, but you still have a smaller batch at each expert that then goes through.
And in order to get reasonable balance, one of the things the current models typically do is have all the experts be roughly the same compute cost. Then you run roughly the same size batches through them, so the very large batch you're doing at inference time propagates through with good efficiency.
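[As a rough illustration of the equal-cost, equal-batch setup being described, here is a sketch of top-1 routing over uniform experts. Sizes and names are hypothetical, and real routers add capacity limits and load-balancing losses on top of this.]

```python
# Uniform-expert routing sketch (hypothetical sizes; top-1 routing only).
import torch
import torch.nn as nn

num_experts, d_model, num_tokens = 4, 512, 1024
router = nn.Linear(d_model, num_experts)
experts = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(num_experts))

x = torch.randn(num_tokens, d_model)
expert_ids = router(x).argmax(dim=-1)  # each token picks one expert

out = torch.zeros_like(x)
for i, expert in enumerate(experts):
    idx = (expert_ids == i).nonzero(as_tuple=True)[0]
    # Because every expert costs the same, hardware efficiency depends on
    # each of these per-expert batches being close to num_tokens / num_experts.
    out[idx] = expert(x[idx])
```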
But I think, you know, in the future you might often want experts that vary in computational cost by factors of 100 or 1,000.
Yeah.
Or maybe paths that go through many layers in one case and, you know, a single layer or even a skip connection in the other case.
And there, I think you're still going to want very large batches, but you're going to want to push things through the model a little bit asynchronously at inference time, which is a little easier than at training time.
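[Here is a sketch of what such heterogeneous-cost paths might look like. The sizes and names are hypothetical assumptions; the asynchronous batching is only noted in a comment, and the loop below runs synchronously for clarity.]

```python
# Heterogeneous-path routing sketch: paths differing enormously in cost.
import torch
import torch.nn as nn

d_model = 512
cheap = nn.Identity()                       # skip-connection-like path
single = nn.Linear(d_model, d_model)        # single-layer path
deep = nn.Sequential(*[                     # many-layer path, far more compute
    nn.Sequential(nn.Linear(d_model, 4 * d_model),
                  nn.ReLU(),
                  nn.Linear(4 * d_model, d_model))
    for _ in range(8)
])
paths = [cheap, single, deep]

router = nn.Linear(d_model, len(paths))
x = torch.randn(1024, d_model)
choice = router(x).argmax(dim=-1)

# At inference time one would queue tokens per path and run each path
# whenever its queue fills a big enough batch, letting tokens finish out
# of order (the "asynchronous" part). Here we just loop synchronously.
out = torch.zeros_like(x)
for i, path in enumerate(paths):
    idx = (choice == i).nonzero(as_tuple=True)[0]
    if idx.numel():
        out[idx] = path(x[idx])
```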