Zach Furman
This goes beyond a minor deviation from theoretical predictions.
It is a direct contradiction of the theory's core prescriptive advice.
This brings us to a second, deeper puzzle, first highlighted by Zhang et al. (2017). The authors conduct a simple experiment.
They train a standard vision model on a real dataset, for example, CIFAR-10, and confirm that it generalizes well.
They then train the exact same model, with the exact same architecture, optimizer, and regularization, on a corrupted version of the dataset where the labels have been completely randomized.
The network is expressive enough to achieve near-zero training error on the randomized labels, perfectly memorizing the nonsensical data.
As expected, its performance on a test set is terrible.
It has learned nothing generalizable.
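The core of the experiment is the label-randomization step. Here is a minimal sketch of what that corruption looks like; the function name and the toy dataset are illustrative stand-ins, not code from the paper:

```python
import random

def randomize_labels(dataset, num_classes, seed=0):
    """Return a copy of the (input, label) pairs where every label is
    replaced by one drawn uniformly at random, destroying any
    relationship between inputs and labels."""
    rng = random.Random(seed)
    return [(x, rng.randrange(num_classes)) for x, _ in dataset]

# Toy stand-in for a labeled dataset like CIFAR-10 (10 classes).
clean = [(f"img_{i}", i % 10) for i in range(8)]
corrupted = randomize_labels(clean, num_classes=10)

# The inputs are untouched; only the labels are scrambled.
assert [x for x, _ in corrupted] == [x for x, _ in clean]
```

Training the same architecture on `clean` versus `corrupted` is what isolates the puzzle: the model, optimizer, and regularization are identical, and only the data's structure differs.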
The paradox is this: why did the exact same model generalize well on the real data?
Classical theories often tie a model's generalization ability to its capacity for complexity, which is a fixed property of its architecture related to its expressivity.
But this experiment shows that generalization is not a static property of the model.
It is a dynamic outcome of the interaction between the model, the learning algorithm, and the structure of the data itself.
The very same network that is completely capable of memorizing random noise somehow chooses to find a generalizable solution when trained on data with real structure.
Why?
The program synthesis hypothesis offers a coherent explanation for both of these paradoxes.
First, why does scaling work?
The hypothesis posits that learning is a search through some space of programs, guided by a strong simplicity bias.
In this view, adding more parameters is analogous to expanding the search space, for example, allowing for longer or more complex programs.
While this does increase the model's capacity to represent overfitting solutions, the simplicity bias acts as a powerful regularizer.