LessWrong (Curated & Popular)
"Deep learning as program synthesis" by Zach Furman

The paradox of generalization.

See also this post for related discussion.

Perhaps the most jarring departure from classical theory comes from how deep learning models generalize.

A learning algorithm is only useful if it can perform well on new, unseen data.

The central question of statistical learning theory is: under what conditions can a model generalize?

The classical answer is the bias-variance trade-off.

The theory posits that a model's error can be decomposed into two main sources.

Bias: error from the model being too simple to capture the underlying structure of the data (underfitting).

Variance: error from the model being too sensitive to the specific training data it saw, causing it to fit noise (overfitting).
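This decomposition can be checked empirically: refit a model on many freshly drawn training sets and measure, at a single input point, how far the average prediction sits from the truth (squared bias) and how much predictions scatter across resamples (variance). Below is a minimal NumPy sketch using a toy sin-plus-noise task and polynomial fits; the task, degrees, and constants are illustrative choices, not from the post.

```python
import numpy as np

rng = np.random.default_rng(0)
true_f = np.sin          # hypothetical ground-truth function
x0 = 2.0                 # fixed input where we measure bias and variance
n_train, noise = 20, 0.3

def fit_predict(degree):
    """Draw a fresh noisy training set, fit a polynomial, predict at x0."""
    x = rng.uniform(0, 3, n_train)
    y = true_f(x) + rng.normal(0, noise, n_train)
    return np.polyval(np.polyfit(x, y, degree), x0)

def bias_variance(degree, trials=500):
    """Monte Carlo estimate of squared bias and variance at x0."""
    preds = np.array([fit_predict(degree) for _ in range(trials)])
    bias_sq = (preds.mean() - true_f(x0)) ** 2
    return bias_sq, preds.var()

bias_lo, var_lo = bias_variance(1)    # too simple: high bias, low variance
bias_hi, var_hi = bias_variance(10)   # flexible: low bias, higher variance
```

The linear model's average prediction misses the curved target (bias), while the degree-10 model tracks it on average but swings more from one training set to the next (variance) — exactly the trade-off the classical theory describes.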

According to this framework, learning is a delicate balancing act.

The practitioner's job is to carefully choose a model of the right complexity (not too simple, not too complex) to land in a Goldilocks zone where both bias and variance are low.

This view is reinforced by principles like the no-free-lunch theorems, which suggest there is no universally good learning algorithm, only algorithms whose inductive biases are carefully chosen by a human to match a specific problem domain.

The clear prediction from this classical perspective is that naively increasing a model's capacity (for example, by adding more parameters) far beyond what is needed to fit the training data is a recipe for disaster.

Such a model should have catastrophically high variance, leading to rampant overfitting and poor generalization.

And yet, perhaps the single most important empirical finding in modern deep learning is that this prediction is completely wrong.

The bitter lesson, as Rich Sutton calls it, is that the most reliable path to better performance is to scale up compute and model size, sometimes far into the regime where the model can easily memorize the entire training set.
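One way to see how memorization and generalization can coexist is a linear model with far more random features than data points: the minimum-norm least-squares fit drives training error to essentially zero, yet can still predict well out of sample. The random-Fourier-feature setup and all constants below are an illustrative sketch of this overparameterized regime, not a construction from the episode.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy task: learn sin(x) from n = 30 samples using
# p = 500 random Fourier features -- far more parameters than data.
n, p = 30, 500
w = rng.normal(0, 1, p)
b = rng.uniform(0, 2 * np.pi, p)

def features(x):
    # Each column is cos(w_j * x + b_j), a fixed random basis function.
    return np.cos(np.outer(x, w) + b)

x_train = rng.uniform(0, 3, n)
y_train = np.sin(x_train)
x_test = rng.uniform(0, 3, 200)
y_test = np.sin(x_test)

# With p > n, lstsq returns the minimum-norm interpolating solution:
# training error is (numerically) zero -- the model "memorizes" the data.
theta = np.linalg.lstsq(features(x_train), y_train, rcond=None)[0]

train_mse = np.mean((features(x_train) @ theta - y_train) ** 2)
test_mse = np.mean((features(x_test) @ theta - y_test) ** 2)
```

Despite fitting the training set exactly, the interpolating model's test error stays well below the trivial predict-the-mean baseline, a small-scale echo of the benign overfitting seen in large networks.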