LessWrong (Curated & Popular)
"Deep learning as program synthesis" by Zach Furman

The first is the phenomenon of degeneracies.

Consider, for instance, dead neurons, whose incoming weights and activations are such that the neuron never fires for any input.

A neural network with dead neurons acts like a smaller network with those dead neurons removed.
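This can be made concrete with a small sketch. The network below is a toy two-layer ReLU network (the shapes and names are illustrative, not from the post): zeroing a neuron's incoming weights and giving it a negative bias makes it a dead neuron, and the full network then computes exactly the same function as the smaller network with that neuron deleted.

```python
import numpy as np

# Toy 2-layer ReLU network (shapes and names are illustrative assumptions).
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # hidden layer: 4 neurons, 3 inputs
b1 = rng.normal(size=4)
W2 = rng.normal(size=(1, 4))   # output layer
b2 = rng.normal(size=1)

# Kill neuron 0: zero incoming weights plus a negative bias mean its
# pre-activation is always -1, so the ReLU outputs 0 for every input.
W1[0] = 0.0
b1[0] = -1.0

def forward(x, W1, b1, W2, b2):
    h = np.maximum(0.0, W1 @ x + b1)  # ReLU hidden layer
    return W2 @ h + b2

# The smaller network with neuron 0 removed entirely.
W1s, b1s = W1[1:], b1[1:]
W2s = W2[:, 1:]

x = rng.normal(size=3)
full  = forward(x, W1, b1, W2, b2)
small = W2s @ np.maximum(0.0, W1s @ x + b1s) + b2
assert np.allclose(full, small)  # identical outputs: the dead neuron is inert
```

The dead neuron contributes zero to every output, so deleting it (and its outgoing weights) changes nothing about the computed function.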

This gives a mechanism for neural networks to change their effective size in a parameter-dependent way, which is required in order to, for example, dynamically add or remove a subroutine depending on where you are in parameter space, as in our example above.

In fact, dead neurons are just one example in a whole zoo of degeneracies with similar effects, which seem incredibly pervasive in neural networks.

It is worth mentioning that the present picture is now highly suggestive of a specific branch of math known as algebraic geometry.

Algebraic geometry, in particular singularity theory, systematically studies these degeneracies, and further provides a bridge between discrete structure (algebra) and continuous structure (geometry), exactly the type of connection we identified as necessary for the program synthesis hypothesis.

Furthermore, singular learning theory tells us how these degeneracies control the loss landscape and the learning process, though classically only in the Bayesian setting, a limitation we discuss in the next section.
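One way a degeneracy shapes the loss landscape can be seen directly in the dead-neuron case: the outgoing weight of a dead neuron can take any value without changing the network's function, so the loss is exactly flat along that direction in parameter space. The sketch below (a toy setup with made-up data, assumed for illustration) checks this.

```python
import numpy as np

# Toy setup: a 2-layer ReLU network with a dead neuron, and fake data.
rng = np.random.default_rng(1)
W1 = rng.normal(size=(4, 3)); b1 = rng.normal(size=4)
W2 = rng.normal(size=(1, 4)); b2 = rng.normal(size=1)
W1[0] = 0.0; b1[0] = -1.0          # neuron 0 never fires

X = rng.normal(size=(32, 3))       # illustrative inputs
y = rng.normal(size=(32, 1))       # illustrative targets

def loss(W2):
    H = np.maximum(0.0, X @ W1.T + b1)  # neuron 0's column is all zeros
    pred = H @ W2.T + b2
    return float(np.mean((pred - y) ** 2))

base = loss(W2)
for t in np.linspace(-10.0, 10.0, 5):  # walk along the degenerate direction
    W2_t = W2.copy(); W2_t[0, 0] = t   # vary the dead neuron's outgoing weight
    assert np.isclose(loss(W2_t), base)  # the loss does not move
```

Directions like this, where whole subspaces of parameters give the identical function, are precisely the singularities that singular learning theory analyzes.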

There is much more that can be said here, but I leave it for the future to treat this material properly.

The search problem.

There's another problem with this story.

Our hypothesis is that deep learning is performing some version of program synthesis.

That means we not only have to explain how programs get represented in neural networks, but also how they get learned.

There are two subproblems here.

First, how can deep learning even implement the needed inductive biases?

For deep learning algorithms to be implementing something analogous to Solomonoff induction, they must be able to implicitly follow inductive biases which depend on the program structure, like simplicity bias.
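Simplicity bias in Solomonoff induction has a very simple form: each program is weighted in proportion to 2^(-length), so among programs consistent with the data, shorter ones dominate. The toy calculation below (abstract "programs" represented only by hypothetical bit-lengths, an assumption for illustration) shows how strongly the shortest consistent program wins.

```python
# Solomonoff-style simplicity prior: weight each program by 2^(-length),
# then normalize over the programs consistent with the data.
# The lengths here are made up for illustration.
consistent_program_lengths = [5, 8, 8, 12]  # bit-lengths of fitting programs
weights = [2.0 ** -l for l in consistent_program_lengths]
total = sum(weights)
posterior = [w / total for w in weights]

# The length-5 program takes roughly 80% of the posterior mass,
# even though three longer programs also fit the data.
assert posterior[0] == max(posterior)
assert posterior[0] > 0.7
```

The question the section raises is whether gradient-based optimization on network parameters can implicitly implement a bias of this kind, given that program length is not an explicit quantity anywhere in the parameters.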

That is, the optimization process must somehow be aware of the program structure in order to favor some types of programs, for example shorter programs, over others.

The optimizer must see the program structure of the parameters.

Second, deep learning works in practice, using a reasonable amount of computational resources.