Zach Furman


LessWrong (Curated & Popular)
"Deep learning as program synthesis" by Zach Furman

Meanwhile, even the most efficient versions of Solomonoff induction, such as the speed prior, run in exponential time or worse.

If deep learning is efficiently performing some version of program synthesis analogous to Solomonoff induction, that means it has implicitly managed to do what we could not figure out how to do explicitly; its efficiency must be due to some insight we do not yet understand.

Of course, we know part of the answer.

SGD only needs local information in order to optimize, instead of the brute-force global search required by Bayesian learning.
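To illustrate the distinction, here is a toy sketch (my own hypothetical example, not from the post) contrasting a local gradient step, which only needs the slope at the current point, with brute-force global search, which must evaluate the loss everywhere:

```python
import numpy as np

# Toy 1-D loss L(w) = (w - 3)^2, whose minimum is at w = 3.
def loss(w):
    return (w - 3.0) ** 2

def grad(w):
    return 2.0 * (w - 3.0)  # analytic derivative of the loss

# Local approach: gradient descent uses only the slope where it stands.
w = 0.0
for _ in range(100):
    w -= 0.1 * grad(w)      # one local step; no global view of the landscape

# Global approach: brute-force search evaluates the loss on a whole grid.
grid = np.linspace(-10.0, 10.0, 10001)
w_global = grid[np.argmin(loss(grid))]

print(round(w, 3), round(w_global, 3))  # both find w ≈ 3
```

The point is the cost profile: the local method touched 100 points, while the global method had to evaluate the entire grid; in high dimensions that grid grows exponentially, which is why local information is so valuable.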

But then the mystery becomes a well-known one.

Why does myopic search like SGD converge to globally good solutions?

Both of these are questions about the optimization process.

It is not at all obvious how local optimizers like SGD could perform something like Solomonoff induction, let alone do so far more efficiently than anything we have historically managed for versions of Solomonoff induction itself.

These are difficult questions, but I will attempt to point towards research that I believe can answer them.

The optimization process can depend on many things, a priori.

Choice of optimizer, regularization, dropout, step size, etc.

But we can note that deep learning works somewhat successfully, albeit sometimes with degraded performance, across wide ranges of choices of these variables.

It does not seem like the choice of AdamW versus SGD matters nearly as much as the choice to do gradient-based learning in the first place.

In other words, I believe these variables may affect efficiency, but I doubt they are fundamental to explaining why the optimization process can succeed at all.

Instead, there is one common variable here which appears to determine the vast majority of the behavior of stochastic optimizers.

The loss function.

Optimizers like SGD take every gradient step according to a minibatch loss function like mean squared error.
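To make this concrete, here is a minimal sketch of minibatch SGD with a mean-squared-error loss, using a hypothetical one-parameter linear model (my own illustration, not code from the post):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data: y = 2x + 1 plus a little noise (hypothetical setup).
X = rng.normal(size=(1000, 1))
y = 2.0 * X[:, 0] + 1.0 + 0.01 * rng.normal(size=1000)

w, b = 0.0, 0.0          # model parameters
lr, batch_size = 0.1, 32

for step in range(500):
    idx = rng.integers(0, len(X), size=batch_size)  # sample a minibatch
    xb, yb = X[idx, 0], y[idx]
    err = (w * xb + b) - yb
    # Gradient of the minibatch MSE, (1/B) * sum((w*x + b - y)^2),
    # with respect to w and b:
    grad_w = 2.0 * np.mean(err * xb)
    grad_b = 2.0 * np.mean(err)
    w -= lr * grad_w      # each step uses only this minibatch's gradient
    b -= lr * grad_b

print(round(w, 2), round(b, 2))  # recovers roughly w ≈ 2, b ≈ 1
```

Each update sees only a small random batch, yet the parameters still converge to the values that minimize the full-dataset loss.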

Complex formula omitted from the narration.
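The omitted formula is presumably the minibatch mean squared error; a standard form would be (my reconstruction, not necessarily the post's exact notation):

```latex
L_B(\theta) = \frac{1}{|B|} \sum_{i \in B} \big( f_\theta(x_i) - y_i \big)^2
```

Here $B$ is the minibatch, $f_\theta$ the model with parameters $\theta$, and $(x_i, y_i)$ the training pairs.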

Complex formula omitted from the narration.
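The narration also omits a formula here; it is plausibly the SGD update rule itself, which in standard notation reads (again my reconstruction, not necessarily the post's):

```latex
\theta_{t+1} = \theta_t - \eta \, \nabla_\theta L_{B_t}(\theta_t)
```

with learning rate $\eta$ and a fresh minibatch $B_t$ drawn at each step $t$.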

Where?