Zach Furman

LessWrong (Curated & Popular)
"Deep learning as program synthesis" by Zach Furman

This goes beyond a minor deviation from theoretical predictions; it is a direct contradiction of the theory's core prescriptive advice.

This brings us to a second, deeper puzzle, first highlighted by Zhang et al. (2017). The authors conduct a simple experiment: they train a standard vision model on a real dataset, for example CIFAR-10, and confirm that it generalizes well.

They then train the exact same model, with the exact same architecture, optimizer, and regularization, on a corrupted version of the dataset where the labels have been completely randomized. The network is expressive enough to achieve near-zero training error on the randomized labels, perfectly memorizing the nonsensical data. As expected, its performance on a test set is terrible: it has learned nothing generalizable.
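The experiment can be illustrated in miniature. The sketch below is a hypothetical toy, not the original CIFAR-10 setup: it uses synthetic Gaussian data, and a 1-nearest-neighbor classifier as a stand-in for an expressive network, since 1-NN also achieves zero training error by pure memorization. The same memorizer generalizes on structured labels and fails on randomized ones.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the randomized-label experiment: structured data where
# the label is a simple function of the input, versus the same inputs with
# shuffled labels.
n, d = 200, 5
X = rng.normal(size=(n, d))
y_real = (X[:, 0] > 0).astype(int)   # real structure: label depends on x0
y_random = rng.permutation(y_real)   # randomized labels: structure destroyed

def knn_predict(X_train, y_train, X_query):
    """1-nearest-neighbor: a pure memorizer, zero training error by construction."""
    dists = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=-1)
    return y_train[np.argmin(dists, axis=1)]

X_test = rng.normal(size=(1000, d))
y_test = (X_test[:, 0] > 0).astype(int)

results = {}
for name, y_train in [("real", y_real), ("random", y_random)]:
    results[name] = {
        "train": (knn_predict(X, y_train, X) == y_train).mean(),
        "test": (knn_predict(X, y_train, X_test) == y_test).mean(),
    }
print(results)
```

On both label sets the memorizer fits the training data perfectly; only on the structured labels does that memorization transfer to held-out points.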

The paradox is this: why did the exact same model generalize well on the real data? Classical theories often tie a model's generalization ability to its capacity for complexity, a fixed property of its architecture related to its expressivity.

But this experiment shows that generalization is not a static property of the model. It is a dynamic outcome of the interaction between the model, the learning algorithm, and the structure of the data itself. The very same network that is completely capable of memorizing random noise somehow chooses to find a generalizable solution when trained on data with real structure. Why?

The program synthesis hypothesis offers a coherent explanation for both of these paradoxes. First, why does scaling work? The hypothesis posits that learning is a search through some space of programs, guided by a strong simplicity bias. In this view, adding more parameters is analogous to expanding the search space, for example allowing for longer or more complex programs.

While this does increase the model's capacity to represent overfitting solutions, the simplicity bias acts as a powerful regularizer.
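Search guided by a simplicity bias can be sketched with a toy, hypothetical example (not anything from the original post): candidate "programs" are polynomials of increasing degree, with degree standing in for program length, scored by data fit plus a per-parameter complexity penalty in the spirit of minimum description length. High-degree candidates can fit the noise, yet the biased search still selects the simple program that generated the data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Data generated by a simple "program" (a degree-2 polynomial) plus noise.
x = np.linspace(-1, 1, 20)
y = 2 * x**2 - x + rng.normal(scale=0.05, size=x.size)

def score(degree):
    """Fit a candidate program of the given complexity and score it:
    mean squared error plus a complexity penalty per parameter (MDL-flavored)."""
    coeffs = np.polyfit(x, y, degree)
    mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
    return mse + 0.01 * (degree + 1)

# Search the space of candidate programs; the simplicity bias is the penalty.
best = min(range(10), key=score)
print("selected degree:", best)
```

Every degree above 2 fits the training points at least as well, but the extra fit buys less than the penalty costs, so the search settles on the simplest adequate program; in the hypothesis, this is the role the implicit simplicity bias plays for neural networks.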