
Jeff Dean

👤 Person
374 total appearances

Appearances Over Time

Podcast Appearances

Dwarkesh Podcast
Jeff Dean & Noam Shazeer – 25 years at Google: from PageRank to AGI

So there is a huge amount of headroom there to say, OK, if we can make this thing more expensive but smarter, because we're like 100x cheaper than reading a paperback.

We're like 10,000 times cheaper than talking to a customer support agent.

We're like a million times or more cheaper than hiring a software engineer or talking to your doctor or lawyer.
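
To make the orders of magnitude concrete, here is a rough back-of-the-envelope sketch of how those ratios come out. All of the dollar figures (paperback price, support-call cost, engineer cost, LLM price per million tokens) are illustrative assumptions, not numbers from the conversation; only the shape of the calculation is the point.

```python
# Back-of-the-envelope cost comparison; every figure below is an
# illustrative assumption, not a number quoted in the episode.

LLM_COST_PER_MTOK = 1.0          # assumed ~$1 per million generated tokens

alternatives = {
    # name: (assumed total cost in dollars, assumed tokens of useful output)
    "paperback":         (10.0, 100_000),    # ~$10 book, ~100k tokens of text
    "support agent":     (20.0, 2_000),      # ~$20 call, ~2k tokens of dialogue
    "software engineer": (8_000.0, 10_000),  # ~a week of work, ~10k tokens of code
}

for name, (dollars, tokens) in alternatives.items():
    cost_per_mtok = dollars / tokens * 1_000_000
    ratio = cost_per_mtok / LLM_COST_PER_MTOK
    print(f"{name:18} ~${cost_per_mtok:>9,.0f}/Mtok ~ {ratio:>9,.0f}x the LLM")
```

With these assumed numbers the ratios land near the 100x, 10,000x, and roughly million-x figures quoted above.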

Add computation and make it smarter.

I think a lot of the takeoff that we're going to see in the very near future is of this form.

We've been exploiting and improving pre-training a lot in the past, and post-training as well.

Those things will continue to improve, but taking advantage of the ability to think harder at inference time is going to just be an explosion.
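
The speakers don't spell out an algorithm here, but one simple form of spending more compute at inference time is sample-and-rank (best-of-N): generate several candidate answers and keep the one a scorer likes best. The sketch below is a hypothetical illustration of that idea; `generate` and `score` are placeholders, not anything described in the episode.

```python
import random

def generate(prompt: str, temperature: float = 0.8) -> str:
    # hypothetical placeholder: a real system would sample an answer from the model
    return f"candidate-{random.randint(0, 9999)} for {prompt!r}"

def score(prompt: str, answer: str) -> float:
    # hypothetical placeholder: a real system might use a reward model or a
    # verifier to judge how good the candidate answer is
    return random.random()

def best_of_n(prompt: str, n: int = 16) -> str:
    # n is the knob: more samples means more inference-time compute per query
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda answer: score(prompt, answer))

print(best_of_n("How many primes are there below 100?", n=16))
```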

Well, we're working out the algorithms as we speak.

So I believe...

You know, we'll see better and better solutions to this as the many more than 10,000 researchers at Google are hacking at it.

In general, transformers can use the sequence length as a batch during training, but they can't really in inference, because you're generating one token at a time.
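
A toy way to see that asymmetry: in training, one forward pass covers every position of the sequence, so the sequence length acts as extra batch for the matrix multiplies; in plain autoregressive inference, each new token needs its own pass over the prefix. The `toy_forward` function below is a stand-in for a real transformer, just to show the calling pattern.

```python
import numpy as np

VOCAB = 50

def toy_forward(tokens: np.ndarray) -> np.ndarray:
    # stand-in for a transformer: returns "logits" of shape [len(tokens), VOCAB]
    rng = np.random.default_rng(int(tokens.sum()))
    return rng.standard_normal((len(tokens), VOCAB))

# Training: one pass scores every position, so the 32 positions behave like
# extra batch for the underlying matrix multiplies.
sequence = np.arange(1, 33)
train_logits = toy_forward(sequence)          # shape (32, VOCAB) in a single call

# Inference: each generated token needs its own pass over the growing prefix,
# so every step's matrix multiplies see only one new position.
prefix = [1, 2, 3]
for _ in range(8):
    step_logits = toy_forward(np.array(prefix))
    prefix.append(int(step_logits[-1].argmax()))  # greedy next token

print(train_logits.shape, prefix)
```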

So there may be different hardware and inference algorithms that we design for the purposes of being efficient at inference.

Yeah.

Basically, the big model is being used as a verifier as opposed to a generator, and verification is something you can do in parallel.
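
The quote doesn't name the technique, but the generator/verifier split it describes matches the usual shape of speculative decoding: a small model drafts several tokens cheaply, and the big model checks the whole draft in one parallel pass, keeping the prefix it agrees with. The sketch below is a minimal greedy version under that assumption, with placeholder models.

```python
def small_model_draft(prefix: list[int], k: int) -> list[int]:
    # hypothetical placeholder: cheaply propose k continuation tokens
    return [(prefix[-1] + i + 1) % 100 for i in range(k)]

def big_model_check(prefix: list[int], draft: list[int]) -> list[int]:
    # hypothetical placeholder: one parallel pass of the big model over
    # prefix + draft, returning the token it would emit at each draft position
    out, context = [], list(prefix)
    for drafted in draft:
        out.append((context[-1] + 1) % 100)   # pretend this is the big model's choice
        context.append(drafted)
    return out

def speculative_step(prefix: list[int], k: int = 4) -> list[int]:
    draft = small_model_draft(prefix, k)
    wanted = big_model_check(prefix, draft)    # verification happens in one pass
    accepted = []
    for d, w in zip(draft, wanted):
        if d != w:                 # first disagreement: take the big model's token
            accepted.append(w)     # and stop accepting the cheap draft
            break
        accepted.append(d)         # agreement: keep the drafted token for free
    return prefix + accepted

print(speculative_step([7, 8, 9], k=4))
```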

As long as you can sync all of the parameters of the model across the different data centers and then accumulate all the gradients.

So in the time it takes to do one step, you're pretty good.
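
Read together with the previous quote, this describes plain synchronous data-parallel training across sites: each data center computes gradients on its shard, the gradients are accumulated across sites, and every replica applies the identical update, so the scheme works as long as that exchange fits within one step time. A minimal sketch, with a placeholder gradient function standing in for the real forward/backward pass:

```python
import numpy as np

N_SITES = 3      # e.g. three data centers, each holding a replica of the model
LR = 0.1

def local_gradient(params: np.ndarray, shard_seed: int) -> np.ndarray:
    # placeholder for a real forward/backward pass on that site's data shard
    rng = np.random.default_rng(shard_seed)
    return rng.standard_normal(params.shape)

params = np.zeros(4)                    # every site starts from the same weights

for step in range(5):
    # each site computes its gradient in parallel on its own shard of the batch
    grads = [local_gradient(params, shard_seed=step * N_SITES + i)
             for i in range(N_SITES)]
    # "accumulate all the gradients": average them across sites (an all-reduce)
    global_grad = np.mean(grads, axis=0)
    # every site applies the identical update, so the replicas stay in sync;
    # the exchange only has to complete within the time one step already takes
    params -= LR * global_grad

print(params)
```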

In practice it works.

It was so pleasant to go from async to sync because your experiments are now replicable.

Replicable, rather than your results depending on whether there was a web crawler running on the same machine as one of your computers. I am so much happier running on TPU pods. I love asynchrony, it just lets you scale to iPhones and Xboxes or whatever. Yeah. What if we could give you asynchronous but replicable results?
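
A toy illustration of the replicability point: with a synchronous barrier every step, the same code produces exactly the same parameters on every run, while an asynchronous scheme's result depends on scheduling noise (stale reads and arrival order) that nobody controls. This is a schematic sketch, not a description of Google's actual training systems.

```python
import random

DATA = [1.0, 2.0, 3.0, 4.0]
LR = 0.4

def grad(p: float, d: float) -> float:
    return 2.0 * (p - d)                 # gradient of the toy loss (p - d)^2

def sync_run() -> float:
    # barrier every step: all gradients use the same parameter value, and the
    # result is bit-for-bit identical every time you run it
    p = 0.0
    for _ in range(10):
        p -= LR * sum(grad(p, d) for d in DATA) / len(DATA)
    return p

def async_run(seed: int) -> float:
    # no barrier: workers apply updates in whatever order they finish, each
    # from a possibly stale read, so the result depends on scheduling noise
    rng = random.Random(seed)
    p = 0.0
    history = [p]
    for _ in range(10):
        for d in sorted(DATA, key=lambda _: rng.random()):   # arrival order varies
            stale_p = rng.choice(history[-3:])                # possibly stale read
            p -= (LR / len(DATA)) * grad(stale_p, d)
            history.append(p)
    return p

print(sync_run(), sync_run())            # identical: replicable
print(async_run(0), async_run(1))        # differ with the interleaving
```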