Zach Furman
For wrong answers, the contributions from different frequencies point in different directions and cancel.
This isn't a loose interpretive gloss.
Each piece (the circular embedding, the trig identities, the interference pattern) is concretely present in the weights and can be verified by ablations.
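The interference picture can be sketched directly. The following is a minimal illustration of the trigonometric algorithm for addition mod p, not the trained network's actual weights: the modulus and the particular frequencies are hypothetical choices, and the nested loop stands in for what the network computes via its embeddings and angle-addition identities.

```python
import numpy as np

p = 113                 # modulus (hypothetical choice for illustration)
freqs = [1, 5, 17, 42]  # a few illustrative frequencies

def logits(a, b):
    """Score every candidate answer c via summed cosine terms."""
    out = np.zeros(p)
    for c in range(p):
        for k in freqs:
            w = 2 * np.pi * k / p
            # cos(w(a+b-c)) expands by the angle-addition identities into
            # products of cos/sin of wa, wb, wc -- the quantities a circular
            # embedding makes available. When c = (a+b) mod p, every term
            # equals 1 and the contributions add constructively; otherwise
            # they point in different directions around the circle and cancel.
            out[c] += np.cos(w * (a + b - c))
    return out

a, b = 37, 90
pred = int(np.argmax(logits(a, b)))
print(pred, (a + b) % p)  # both 14
```

The constructive case is exact: for c = (a + b) mod p, the argument w(a + b − c) is a multiple of 2π at every frequency, so each cosine is 1.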
So here's the picture that emerges.
During the memorization phase, the network solves the task some other way, presumably something like a lookup table distributed across its parameters.
It fits the training data, but the solution doesn't extend.
Then, over continued training, a different solution forms: this trigonometric algorithm.
As the algorithm assembles, generalization happens.
The two are not merely correlated.
If you trace the structure in the weights alongside performance on held-out data, they move together.
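One hedged sketch of what "tracing the structure in the weights" could mean: measure how much of an embedding matrix's Fourier power concentrates in a few frequencies. A lookup-table-like (here, random) embedding spreads its power across the spectrum; a circular embedding concentrates it. The metric name and the specific matrices below are illustrative assumptions, not the paper's actual measurement.

```python
import numpy as np

p = 113
rng = np.random.default_rng(0)

def freq_concentration(E, top=4):
    """Fraction of Fourier power held by the top frequencies.

    E: (p, d) embedding matrix; the DFT is taken over the input index.
    Near 1.0 means a few frequencies dominate (trig-like structure).
    """
    F = np.abs(np.fft.rfft(E - E.mean(axis=0), axis=0)) ** 2
    power = F.sum(axis=1)  # total power at each frequency
    return np.sort(power)[::-1][:top].sum() / power.sum()

# A random embedding: power spread across many frequencies.
random_E = rng.normal(size=(p, 8))

# A circular embedding at a single frequency: power concentrated there.
k = 17
w = 2 * np.pi * k / p
n = np.arange(p)
circular_E = np.stack([np.cos(w * n), np.sin(w * n)], axis=1)

print(freq_concentration(random_E))    # well below 1
print(freq_concentration(circular_E))  # ~1.0
```

Logged over training steps, a quantity like this rising in lockstep with held-out accuracy is the kind of co-movement described here.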
What should we make of this?
Here's one reading.
The difference between a network that memorizes and a network that generalizes is not just quantitative, but qualitative.
The two networks have learned different kinds of things.
One has stored associations.
The other has found a method: a mechanistic procedure that happens to work on inputs beyond those it was trained on because it captures something about the structure of the problem.
This is a single example and a toy one.
But it raises a question worth taking seriously.
When networks generalize, is it because they've found something like an algorithm?