
Zach Furman


LessWrong (Curated & Popular)
"Deep learning as program synthesis" by Zach Furman

And if so, what does that tell us about what deep learning is actually doing?

It's worth noting what was and wasn't in the training data.

The data contained input-output pairs: 32 and 41 give 73, and so on.
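The kind of data described here can be sketched in a few lines. The modulus p = 97 is an illustrative assumption; the essay does not specify the exact task parameters.

```python
# Sketch: input-output pairs for modular addition, and nothing else --
# no carry procedure, no algorithm, just stored associations.
# p = 97 is an assumed, illustrative modulus.
p = 97

pairs = [((a, b), (a + b) % p) for a in range(p) for b in range(p)]

assert ((32, 41), 73) in pairs  # 32 and 41 give 73
```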

It contained nothing about how to compute them.

The network arrived at a method on its own.

And both solutions, the lookup table and the trigonometric algorithm, fit the training data equally well.
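This point can be made concrete with a toy sketch (the modulus p = 97 and the train/test split are assumed, illustrative choices): a memorized lookup table and the general rule agree on every training pair, and only held-out pairs tell them apart.

```python
# Illustrative only: both "solutions" fit the training data perfectly.
p = 97
all_pairs = [(a, b) for a in range(p) for b in range(p)]
train = all_pairs[: len(all_pairs) // 2]  # pairs seen during training
test = all_pairs[len(all_pairs) // 2 :]   # pairs held out

lookup = {(a, b): (a + b) % p for (a, b) in train}  # pure memorization

def rule(a, b):
    return (a + b) % p  # the generalizing algorithm

# Zero training error for both...
assert all(lookup[(a, b)] == rule(a, b) for (a, b) in train)
# ...but only the rule answers unseen pairs; the table has no entry for them.
assert all((a, b) not in lookup for (a, b) in test)
```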

The network's loss was already near minimal during the memorization phase.

Whatever caused it to keep searching, to eventually settle on the generalizing algorithm instead, it wasn't that the generalizing algorithm fit the data better.

It was something else, some property of the learning process that favored one kind of solution over another.

The generalizing algorithm is, in a sense, simpler.

It compresses what would otherwise be thousands of stored associations into a compact procedure.
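Rough bookkeeping makes the compression claim vivid. All numbers below are hypothetical (an assumed modulus p = 97, a made-up per-frequency parameter cost); the point is only the scale difference between storing every association and reusing one procedure.

```python
# Illustrative parameter counts -- not measurements from any real model.
p = 97
table_entries = p * p               # one stored association per input pair
num_frequencies = 5                 # hypothetical: a handful of frequencies
params_per_frequency = 4            # hypothetical per-frequency cost
procedure_params = num_frequencies * params_per_frequency

assert table_entries == 9409
assert procedure_params < table_entries  # the procedure is far more compact
```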

Whether that's the right way to think about what happened here, whether simplicity is really what the training process favors, is not obvious.

But something made the network prefer a mechanistic solution that generalized over one that didn't, and it wasn't the training data alone.


Vision circuits

Grokking is a controlled setting: a small network, a simple task, designed to be fully interpretable.

Does the same kind of structure appear in realistic models solving realistic problems?

Olah et al., 2020, studied InceptionV1, an image-classification network trained on ImageNet, a dataset of over a million photographs labeled with object categories.

The network takes in an image and outputs a probability distribution over a thousand possible labels: car, dog, coffee mug, and so on.
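How raw per-label scores become such a distribution can be sketched with a softmax over a few labels. The labels and scores below are made up for illustration; they are not actual InceptionV1 outputs.

```python
import math

# Hypothetical per-label scores ("logits") from a classifier's final layer.
logits = {"car": 2.0, "dog": 0.5, "coffee mug": -1.0}

# Softmax: exponentiate and normalize so the scores form a distribution.
total = sum(math.exp(v) for v in logits.values())
probs = {label: math.exp(v) / total for label, v in logits.items()}

assert abs(sum(probs.values()) - 1.0) < 1e-9  # probabilities sum to one
assert max(probs, key=probs.get) == "car"     # highest score wins
```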

Can we understand this more realistic setting?