
LessWrong (Curated & Popular)
"Deep learning as program synthesis" by Zach Furman

From a theoretical computer science perspective, this is what algorithms look like in general. Not just the specific trigonometric trick from grokking, but computation as such. You take a hard problem, break it into pieces, solve the pieces, and combine the results. What makes this tractable, what makes it an algorithm rather than a lookup table, is precisely the compositional structure. The reuse is what makes it compact. The compactness is what makes it feasible.

[There is an image here in the original post.]
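The trigonometric trick mentioned above can be made concrete. The sketch below is an illustrative reconstruction, not weights from any actual trained model: assuming modular addition mod a prime p and a hand-picked set of frequencies, it computes (a + b) mod p compositionally, combining cos/sin features of a and b through the angle-addition identities and reading the answer off with an argmax.

```python
import numpy as np

def mod_add_trig(a: int, b: int, p: int = 113) -> int:
    """Compute (a + b) % p via trig identities rather than a lookup table.

    Each input is embedded as cos/sin of w*a for a few frequencies w,
    the pair is combined with the angle-addition identities, and the
    answer is the residue c maximizing sum_w cos(w * (a + b - c)).
    The frequencies here are illustrative, not from a trained network.
    """
    freqs = 2 * np.pi * np.array([1, 2, 5]) / p
    c = np.arange(p)
    score = np.zeros(p)
    for w in freqs:
        # Build cos(w(a+b)) and sin(w(a+b)) compositionally from parts.
        cos_ab = np.cos(w * a) * np.cos(w * b) - np.sin(w * a) * np.sin(w * b)
        sin_ab = np.sin(w * a) * np.cos(w * b) + np.cos(w * a) * np.sin(w * b)
        # cos(w(a+b-c)) = cos(w(a+b))cos(wc) + sin(w(a+b))sin(wc),
        # maximized exactly when c == (a + b) mod p.
        score += cos_ab * np.cos(w * c) + sin_ab * np.sin(w * c)
    return int(np.argmax(score))
```

A plain lookup table for this task would store p² entries; the trig decomposition reuses a handful of frequency features across all inputs, which is the compactness-through-reuse the text describes.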


Grokking and Inception V1 are two examples, but they are far from the only ones. Mechanistic interpretability has grown into a substantial field, and its researchers have documented many such structures: in toy models, in language models, across different architectures and tasks. Induction heads, language circuits, and bracket matching in transformer language models; learned world models and multi-step reasoning in toy tasks; grid-cell-like mechanisms in RL agents; hierarchical representations in GANs; and much more. Where we manage to look carefully, we tend to find something mechanistic.
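One item on that list, the induction head, has a behavior simple enough to caricature in a few lines. Assuming the standard description of the pattern ([A][B] ... [A] predicts [B]), a plain-Python stand-in, not a transformer, looks like this:

```python
def induction_predict(tokens):
    """Caricature of an induction head: scan backward for the most
    recent earlier occurrence of the current token and predict
    whatever followed it ([A][B] ... [A] -> [B]).
    Returns None if no earlier occurrence exists."""
    current = tokens[-1]
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]
    return None
```

The point is not the code but the shape: a tiny, reusable subroutine (match, then copy) that composes with other components, rather than a memorized input-output mapping.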


This raises a question. If what we find inside trained networks, at least when we can find anything, looks like algorithms built from parts, what does that suggest about what deep learning is doing?


The hypothesis


What should we make of this? We have seen neural networks learn solutions that look like algorithms: compositional structures built from simple, reusable parts. In the grokking case, this coincided precisely with generalization. In Inception V1, this structure is what lets the network recognize objects despite the vast dimensionality of the input space. And across many other cases documented in the mechanistic interpretability literature, the same shape appears.