LessWrong (Curated & Popular)
"Deep learning as program synthesis" by Zach Furman

For wrong answers, they point in different directions and cancel.
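
To make the cancellation concrete: assuming the task here is addition mod p, as in the standard version of this example, the algorithm's output for a candidate answer c can be written as an interference sum over a set K of key frequencies. The notation below is mine, not quoted from the post.

```latex
% Interference pattern for the trigonometric algorithm (sketch; notation assumed,
% not taken from the post). Task: compute (a + b) mod p; K = key frequencies.
\[
  \mathrm{logit}(c \mid a, b) \;\propto\; \sum_{k \in K}
      \cos\!\Big(\tfrac{2\pi k}{p}\,(a + b - c)\Big)
    \;=\; \sum_{k \in K} \Big[
      \cos\!\big(\tfrac{2\pi k}{p}(a+b)\big)\cos\!\big(\tfrac{2\pi k}{p}c\big)
    + \sin\!\big(\tfrac{2\pi k}{p}(a+b)\big)\sin\!\big(\tfrac{2\pi k}{p}c\big)\Big].
\]
% At c = (a + b) mod p every term equals 1, so the frequencies add constructively;
% at any other c the phases disagree across k and the terms largely cancel.
```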

This isn't a loose interpretive gloss. Each piece (the circular embedding, the trig identities, the interference pattern) is concretely present in the weights and can be verified by ablations.
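
As a sketch of what such an ablation check can look like, here is a toy stand-in, not the post's actual experiment: the logits are constructed directly from the trigonometric algorithm plus noise, and we confirm that removing the key frequencies (and only those) destroys the behavior. The modulus, frequency set, and noise scale are arbitrary choices of mine.

```python
import numpy as np

# Toy stand-in for a trained modular-addition network (synthetic, NOT trained weights):
# the logits implement the trig algorithm at a few key frequencies, plus noise.
p = 53                    # modulus (arbitrary choice)
key_freqs = [3, 11, 17]   # hypothetical key frequencies
rng = np.random.default_rng(0)

a = np.arange(p)[:, None, None]
b = np.arange(p)[None, :, None]
c = np.arange(p)[None, None, :]

logits = sum(np.cos(2 * np.pi * k * (a + b - c) / p) for k in key_freqs)
logits = logits + 0.2 * rng.standard_normal(logits.shape)  # "rest of network" noise

targets = (np.arange(p)[:, None] + np.arange(p)[None, :]) % p

def accuracy(l):
    return (l.argmax(-1) == targets).mean()

def ablate(l, freqs):
    """Zero the given Fourier frequencies of the logits along the answer axis."""
    spec = np.fft.rfft(l, axis=-1)
    spec[..., freqs] = 0
    return np.fft.irfft(spec, n=p, axis=-1)

other_freqs = [f for f in range(1, p // 2 + 1) if f not in key_freqs]

print("full logits:             ", accuracy(logits))                      # ~1.0
print("key frequencies ablated: ", accuracy(ablate(logits, key_freqs)))   # ~1/p (chance)
print("all other freqs ablated: ", accuracy(ablate(logits, other_freqs))) # ~1.0
```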

So here's the picture that emerges. During the memorization phase, the network solves the task some other way, presumably something like a lookup table distributed across its parameters. It fits the training data, but the solution doesn't extend. Then, over continued training, a different solution forms: this trigonometric algorithm. As the algorithm assembles, generalization happens. The two are not merely correlated: trace the structure in the weights alongside the performance on held-out data, and they move together.
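
One illustrative way to make "structure in the weights" quantitative, under the circular-embedding picture above, is the fraction of the embedding matrix's power concentrated in a handful of Fourier frequencies along the token axis. The sketch below compares a random (memorization-like) embedding with a synthetic circular one; both are stand-ins I constructed, not weights from the actual experiment.

```python
import numpy as np

def fourier_concentration(embed, top_k=3):
    """Fraction of the mean-centered embedding's power that sits in its top_k
    Fourier frequencies along the token axis. embed has shape (p, d)."""
    centered = embed - embed.mean(axis=0, keepdims=True)
    power = (np.abs(np.fft.rfft(centered, axis=0)) ** 2)[1:].sum(axis=1)  # per frequency
    return np.sort(power)[::-1][:top_k].sum() / power.sum()

p = 53
rng = np.random.default_rng(0)

# Memorization-like stand-in: unstructured random embedding.
random_embed = rng.standard_normal((p, 32))

# Generalization-like stand-in: tokens placed on circles at a few frequencies.
angles = 2 * np.pi * np.arange(p)[:, None] * np.array([3, 11, 17]) / p
circular_embed = np.concatenate([np.cos(angles), np.sin(angles)], axis=1)

print("random embedding:  ", fourier_concentration(random_embed))   # low, spread out
print("circular embedding:", fourier_concentration(circular_embed)) # ~1.0, concentrated
```

Logged at checkpoints during training next to held-out accuracy, a measure of this kind is the sort of structural curve that could be watched rising together with generalization.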

What should we make of this? Here's one reading. The difference between a network that memorizes and a network that generalizes is not just quantitative, but qualitative. The two networks have learned different kinds of things. One has stored associations. The other has found a method, a mechanistic procedure that happens to work on inputs beyond those it was trained on because it captures something about the structure of the problem.

This is a single example and a toy one. But it raises a question worth taking seriously. When networks generalize, is it because they've found something like an algorithm?