Zach Furman
The first is the phenomenon of degeneracies.
Consider, for instance, dead neurons, whose incoming weights and activations are such that the neuron never fires for any input.
A neural network with dead neurons acts like a smaller network with those dead neurons removed.
This gives neural networks a mechanism for changing their effective size in a parameter-dependent way, which is required in order to, for example, dynamically add or remove a subroutine depending on where you are in parameter space, as in our example above.
In fact, dead neurons are just one example in a whole zoo of degeneracies with similar effects, which seem incredibly pervasive in neural networks.
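A minimal sketch of the dead-neuron case makes the mechanism concrete. The network below is a toy two-layer ReLU network of my own construction (not from the original text); giving one hidden neuron a large negative bias kills it for any input of modest size, and the full network then computes exactly the same function as a smaller network with that neuron deleted.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# Toy 2-layer ReLU network: 3 inputs -> 4 hidden units -> 1 output.
W1 = rng.normal(size=(4, 3))
b1 = rng.normal(size=4)
W2 = rng.normal(size=(1, 4))

# "Kill" hidden neuron 2: with a large negative bias its pre-activation
# is negative for any input of modest norm, so it never fires.
b1[2] = -100.0

def full_net(x):
    return W2 @ relu(W1 @ x + b1)

# The same function with the dead neuron removed outright.
keep = [0, 1, 3]
def smaller_net(x):
    return W2[:, keep] @ relu(W1[keep] @ x + b1[keep])

# The two networks agree on a batch of random inputs.
xs = rng.normal(size=(100, 3))
max_gap = max(abs(full_net(x) - smaller_net(x)).max() for x in xs)
```

The point is that an entire region of parameter space (anything keeping that pre-activation negative) implements the smaller network, which is what makes this a degeneracy rather than an isolated coincidence.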
It is worth mentioning that this picture is now highly suggestive of a specific branch of math known as algebraic geometry. Algebraic geometry, and in particular singularity theory, systematically studies these degeneracies, and further provides a bridge between discrete structure (algebra) and continuous structure (geometry): exactly the type of connection we identified as necessary for the program synthesis hypothesis.
Furthermore, singular learning theory tells us how these degeneracies control the loss landscape and the learning process, though classically only in the Bayesian setting, a limitation we discuss in the next section.
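To gesture at where this control shows up, the central asymptotic of singular learning theory (Watanabe's free energy formula) can be stated as follows; the notation here (\(F_n\), \(L_n\), \(\lambda\)) is a standard presentation from the SLT literature rather than anything specific to this text.

```latex
% Bayes free energy of a model class, in the Bayesian setting:
%   F_n       -- free energy after n samples
%   L_n(w_0)  -- empirical loss at an optimal parameter w_0
%   \lambda   -- real log canonical threshold (RLCT), an
%                algebro-geometric measure of degeneracy at w_0
F_n = n\,L_n(w_0) + \lambda \log n + O_p(\log \log n)
```

A smaller \(\lambda\) (a more degenerate optimum) means a smaller penalty term, so degeneracies directly shape which regions of parameter space the Bayesian posterior prefers.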
There is much more that can be said here, but I leave it for the future to treat this material properly.
The search problem
There's another problem with this story.
Our hypothesis is that deep learning is performing some version of program synthesis.
That means that we not only have to explain how programs get represented in neural networks, we also need to explain how they get learned.
There are two subproblems here.
First, how can deep learning even implement the needed inductive biases?
For deep learning algorithms to be implementing something analogous to Solomonoff induction, they must be able to implicitly follow inductive biases which depend on the program structure, like simplicity bias.
That is, the optimization process must somehow be aware of the program structure in order to favor some types of programs, for example shorter programs, over others: the optimizer must see the program structure through the parameters.
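As a toy illustration of the kind of simplicity bias meant here (a Solomonoff-style prior over programs, not a claim about what gradient descent actually does), one can weight each program by \(2^{-\text{length}}\) and condition on the data. All names and programs below are invented for the example.

```python
# Three toy "programs" (Python expressions), two short and one long,
# all computing the same function x -> 2x.
programs = {
    "x*2":       lambda x: x * 2,
    "x+x":       lambda x: x + x,
    "x*3-x+0*x": lambda x: x * 3 - x + 0 * x,
}

data = [(1, 2), (2, 4), (5, 10)]

def prior(src):
    # Solomonoff-style prior: weight 2^(-description length).
    return 2.0 ** (-len(src))

# Keep only programs consistent with the data, then normalize.
posterior = {src: prior(src)
             for src, f in programs.items()
             if all(f(x) == y for x, y in data)}
total = sum(posterior.values())
posterior = {src: w / total for src, w in posterior.items()}
```

All three programs fit the data perfectly, yet the two length-3 programs end up with far more posterior mass than the length-9 one. An optimizer with an analogous bias would need some way of reading "description length" off the parameters themselves, which is exactly the difficulty raised above.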
Second, deep learning works in practice, using only a reasonable amount of computational resources; any account of it as program synthesis must explain how the search can be that efficient.