Zach Furman
From a theoretical computer science perspective, this is what algorithms look like, in general.
Not just the specific trigonometric trick from grokking, but computation as such.
You take a hard problem, break it into pieces, solve the pieces, and combine the results.
What makes this tractable, what makes it an algorithm rather than a lookup table, is precisely the compositional structure.
The reuse is what makes it compact.
The compactness is what makes it feasible.
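To make this concrete, here is a minimal sketch of the trigonometric trick reported in the grokking literature, written as ordinary code rather than network weights. The function name and structure are illustrative, not taken from any paper's implementation: it computes modular addition by embedding each input as an angle on the unit circle, combining the pieces with the angle-addition identities, and reading the result back out. Each step is simple and reusable, and composing them is what solves the whole problem.

```python
import math

def mod_add_trig(a: int, b: int, p: int) -> int:
    """Compute (a + b) mod p via rotation, the way grokked networks
    have been reported to: embed, rotate, read out."""
    # Step 1: embed each integer as an angle on the unit circle.
    theta_a = 2 * math.pi * a / p
    theta_b = 2 * math.pi * b / p
    # Step 2: combine the pieces with the angle-addition identities:
    #   cos(x + y) = cos(x)cos(y) - sin(x)sin(y)
    #   sin(x + y) = sin(x)cos(y) + cos(x)sin(y)
    c = math.cos(theta_a) * math.cos(theta_b) - math.sin(theta_a) * math.sin(theta_b)
    s = math.sin(theta_a) * math.cos(theta_b) + math.cos(theta_a) * math.sin(theta_b)
    # Step 3: read out the combined angle and map it back to an integer mod p.
    angle = math.atan2(s, c) % (2 * math.pi)
    return round(angle * p / (2 * math.pi)) % p
```

Note what this is not: a table of all p² input pairs. The same three reusable steps handle every input, which is exactly the compactness the paragraph above describes.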
Grokking and Inception V1 are two examples, but they are far from the only ones.
Mechanistic interpretability has grown into a substantial field, and its researchers have documented many such structures: in toy models, in language models, across different architectures and tasks.
Induction heads, language circuits, and bracket matching in transformer language models, learned world models and multi-step reasoning in toy tasks, grid cell-like mechanisms in RL agents, hierarchical representations in GANs, and much more.
Where we manage to look carefully, we tend to find something mechanistic.
This raises a question.
If what we find inside trained networks, at least when we can find anything, looks like algorithms built from parts, what does that suggest about what deep learning is doing?
The hypothesis
What should we make of this?
We have seen neural networks learn solutions that look like algorithms, compositional structures built from simple, reusable parts.
In the grokking case, this coincided precisely with generalization.
In Inception V1, this structure is what lets the network recognize objects despite the vast dimensionality of the input space.
And across many other cases documented in the mechanistic interpretability literature, the same shape appears.