Zach Furman
The learning process is not looking for just any program that fits the data.
It is looking for the simplest such program.
Giving the search more resources (parameters, compute, data) provides a better opportunity to find the simple, generalizable program that corresponds to the true underlying structure, rather than settling for a more complex, memorizing one.
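A toy sketch of such a simplicity-biased search (entirely illustrative, not a real learning algorithm): enumerate candidate programs from simplest to most complex and return the first one consistent with the training data.

```python
# Toy sketch of a simplicity-biased program search (illustrative only).
# Candidates are ordered from simplest to most complex; the search returns
# the first, i.e. simplest, program consistent with the training data.

def simplest_fit(candidates, data):
    for name, program in candidates:
        if all(program(x) == y for x, y in data):
            return name
    return None  # no candidate fits

# Structured data: a short program (y = 2x) explains every example.
structured = [(x, 2 * x) for x in range(5)]

# "Random labels": no short rule fits; only a lookup table does.
noisy = [(0, 7), (1, 3), (2, 3), (3, 9), (4, 1)]
table = dict(noisy)

candidates = [
    ("identity", lambda x: x),    # simplest
    ("double", lambda x: 2 * x),  # still short
    ("lookup-table", table.get),  # longest: memorizes every pair
]

print(simplest_fit(candidates, structured))  # -> double
print(simplest_fit(candidates, noisy))       # -> lookup-table
```

On the structured data the search stops at the short "double" program; on the random labels it is forced all the way down to the lookup table.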
Second, why does generalization depend on the data's structure?
This is a natural consequence of a simplicity-biased program search.
When trained on real data, there exists a short, simple program that explains the statistical regularities (for example, cats have pointy ears and whiskers).
The simplicity bias of the learning process finds this program, and because it captures the true structure, it generalizes well.
When trained on random labels, no such simple program exists.
The only way to map the given images to the random labels is via a long, high-complexity program, effectively a lookup table.
Forced against its inductive bias, the learning algorithm eventually finds such a program to minimize the training loss.
This solution is pure memorization and, naturally, fails to generalize.
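One way to make the "no simple program exists" claim concrete (a toy measure of my own, not from the text) is to compare description lengths: a rule stays constant-size no matter how much data it explains, while a lookup table for random labels grows linearly with the dataset, because the shortest description of random labels is essentially the data itself.

```python
# Toy illustration: description length of a rule vs. a lookup table.
import random

def rule_description():
    return "lambda x: 2 * x"  # a fixed-size program, independent of data size

def table_description(labels):
    return repr(dict(enumerate(labels)))  # memorization: grows with the data

random.seed(0)
for n in (10, 100, 1000):
    random_labels = [random.randrange(10) for _ in range(n)]
    print(n, len(rule_description()), len(table_description(random_labels)))
```

The rule's length is constant across dataset sizes, while the table's length scales with the number of examples it must memorize.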
If one assumes something like the program synthesis hypothesis is true, the phenomenon of data-dependent generalization is not so surprising.
A model's ability to generalize is not a fixed property of its architecture, but a property of the program it learns.
The model finds a simple program on the real dataset and a complex one on the random dataset, and the two programs have very different generalization properties. There is also some evidence that the mechanism behind generalization is related to the other empirical phenomena we have discussed.
We can see this in the grokking setting discussed earlier.
Recall the transformer trained on modular addition.
Initially, the model learns a memorization-based program.
It achieves 100% accuracy on the training data, but its test accuracy is near zero.
This is analogous to learning the random label dataset, a complex, non-generalizing solution.
After extensive further training, driven by a regularizer that penalizes complexity (weight decay), the model's internal solution undergoes a phase transition.
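The memorization phase of this setup can be sketched with a minimal baseline (dataset construction only; the transformer, optimizer, and weight-decay schedule are omitted, and the modulus and 50% train fraction are illustrative choices, not necessarily those of the experiment discussed). A pure lookup table over the training pairs reproduces exactly the signature described above: perfect training accuracy, near-zero test accuracy.

```python
# Minimal sketch of the modular-addition dataset and a memorization baseline.
import random

p = 97  # prime modulus; a common illustrative choice in grokking experiments
pairs = [(a, b) for a in range(p) for b in range(p)]
random.seed(0)
random.shuffle(pairs)

split = len(pairs) // 2  # illustrative 50/50 train/test split
train = [((a, b), (a + b) % p) for a, b in pairs[:split]]
test = [((a, b), (a + b) % p) for a, b in pairs[split:]]

# A pure lookup table memorizes train perfectly but knows nothing about test.
table = dict(train)
train_acc = sum(table.get(x) == y for x, y in train) / len(train)
test_acc = sum(table.get(x) == y for x, y in test) / len(test)
print(train_acc, test_acc)  # 1.0 on train, 0.0 on test
```

The grokking claim is that a simplicity-penalized model eventually abandons this kind of table in favor of a program that actually computes addition mod p, at which point test accuracy jumps.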