Zach Furman
The paradox of generalization.
See also this post for related discussion.
Perhaps the most jarring departure from classical theory comes from how deep learning models generalize.
A learning algorithm is only useful if it can perform well on new, unseen data.
The central question of statistical learning theory is: under what conditions can a model generalize?
The classical answer is the bias-variance trade-off.
The theory posits that a model's error can be decomposed into two main sources.
Bias: error from the model being too simple to capture the underlying structure of the data (underfitting).
Variance: error from the model being too sensitive to the specific training data it saw, causing it to fit noise (overfitting).
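The decomposition above can be checked empirically: holding the data distribution fixed and refitting a model on many freshly sampled training sets, bias is the squared gap between the average prediction and the truth, and variance is the spread of predictions around that average. Here is a minimal numpy sketch using polynomial regression on a sine curve; the specific degrees, noise level, and sample sizes are illustrative choices of mine, not from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
f = np.sin          # the true function we are trying to learn
sigma = 0.3         # noise standard deviation
n, trials = 30, 300
x_test = np.linspace(0.5, 5.5, 50)

def avg_bias2_var(degree):
    """Monte Carlo estimate of average bias^2 and variance over x_test."""
    preds = np.empty((trials, len(x_test)))
    for t in range(trials):
        # Fresh training set each trial: same distribution, new noise.
        x = rng.uniform(0.0, 6.0, n)
        y = f(x) + rng.normal(0.0, sigma, n)
        coeffs = np.polyfit(x, y, degree)
        preds[t] = np.polyval(coeffs, x_test)
    bias2 = np.mean((preds.mean(axis=0) - f(x_test)) ** 2)
    var = np.mean(preds.var(axis=0))
    return bias2, var

b_lo, v_lo = avg_bias2_var(1)    # too simple: high bias, low variance
b_hi, v_hi = avg_bias2_var(10)   # flexible: low bias, high variance
print(f"degree 1:  bias^2={b_lo:.3f}, variance={v_lo:.3f}")
print(f"degree 10: bias^2={b_hi:.3f}, variance={v_hi:.3f}")
```

On this toy problem the degree-1 model shows large bias and small variance, while the degree-10 model shows the reverse, which is exactly the trade-off the classical picture describes.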
According to this framework, learning is a delicate balancing act.
The practitioner's job is to carefully choose a model of the right complexity, neither too simple nor too complex, so that it lands in a Goldilocks zone where both bias and variance are low.
This view is reinforced by principles like the no-free-lunch theorems, which suggest there is no universally good learning algorithm, only algorithms whose inductive biases are carefully chosen by a human to match a specific problem domain.
The clear prediction from this classical perspective is that naively increasing a model's capacity (for example, by adding more parameters) far beyond what is needed to fit the training data is a recipe for disaster.
Such a model should have catastrophically high variance, leading to rampant overfitting and poor generalization.
And yet, perhaps the single most important empirical finding in modern deep learning is that this prediction is completely wrong.
The bitter lesson, as Rich Sutton calls it, is that the most reliable path to better performance is to scale up compute and model size, sometimes far into the regime where the model can easily memorize the entire training set.
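The claim that memorizing the training set need not destroy generalization is easy to see in miniature with a 1-nearest-neighbor classifier, which by construction interpolates its training data (zero training error) yet can still predict well on new points. This toy setup is my own illustration of that phenomenon, not the deep learning setting the post describes.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_blobs(n):
    """Two Gaussian classes with well-separated means in 2D."""
    X0 = rng.normal(-1.5, 1.0, size=(n, 2))
    X1 = rng.normal(+1.5, 1.0, size=(n, 2))
    X = np.vstack([X0, X1])
    y = np.array([0] * n + [1] * n)
    return X, y

def one_nn_predict(X_train, y_train, X):
    """Label each query point with its nearest training point's label."""
    d = ((X[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    return y_train[d.argmin(axis=1)]

X_tr, y_tr = make_blobs(100)
X_te, y_te = make_blobs(500)

# Every training point's nearest neighbor is itself, so the model
# "memorizes" the training set perfectly...
train_acc = (one_nn_predict(X_tr, y_tr, X_tr) == y_tr).mean()
# ...yet it still classifies held-out points accurately.
test_acc = (one_nn_predict(X_tr, y_tr, X_te) == y_te).mean()
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")
```

Interpolation alone does not doom a model; what matters is how the model behaves between the memorized points, which is where the classical variance argument breaks down for overparameterized networks.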