Zach Furman

Speaker
696 total appearances


Podcast Appearances

LessWrong (Curated & Popular)
"Deep learning as program synthesis" by Zach Furman

[Complex formula omitted from the narration.] Here w is the parameter vector and f_w is the input-output map of the model at parameters w. [Complex formula omitted from the narration.] [Another complex formula omitted from the narration] are the training examples and labels, and tau is the learning rate.
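The omitted update is presumably the standard SGD step; a hedged reconstruction from the narrated definitions (the exact indices and the symbols L and ℓ are assumptions, not from the episode):

```latex
w_{t+1} = w_t - \tau \, \nabla_w L(w_t),
\qquad
L(w) = \frac{1}{n} \sum_{i=1}^{n} \ell\bigl(f_w(x_i),\, y_i\bigr)
```

Here $(x_i, y_i)$ are the training examples and labels and $\tau$ is the learning rate, matching the narration.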

In the most common versions of supervised learning, we can focus even further. The loss function itself can be decomposed into two objects: the parameter-function map [complex formula omitted from the narration], and the target distribution.

The overall loss function can be written as a composition of the parameter-function map and some statistical distance to the target distribution. For example, for mean squared error: [complex formula omitted from the narration], where [complex formula omitted from the narration].
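A minimal sketch of this composition, with a toy linear model standing in for the network (all names and the model itself are illustrative, not from the post): the loss on parameters is the MSE distance on function space applied to the output of the parameter-function map.

```python
import numpy as np

# Toy data: n training examples and labels (illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(10,))
Y = 3.0 * X + 1.0

def param_function_map(w):
    """Parameter-function map: parameters w -> the function f_w.
    Toy model: f_w(x) = w[0] * x + w[1]."""
    return lambda x: w[0] * x + w[1]

def mse_distance(f):
    """Statistical distance D on function space: MSE to the target labels."""
    return np.mean((f(X) - Y) ** 2)

def loss(w):
    """The overall loss is the composition L = D o (w -> f_w)."""
    return mse_distance(param_function_map(w))

# At the generating parameters the distance to the target vanishes.
print(loss(np.array([3.0, 1.0])))  # prints 0.0
```

The decomposition is visible in the code: `loss` itself contains no model-specific detail; everything about the architecture lives in `param_function_map`.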

Note that the statistical distance [complex formula omitted from the narration] here is a fairly simple object: almost always it is a distance on function space, convex, and of relatively simple functional form. Further, it is the same distance one would use across many different architectures, including ones which do not achieve the remarkable performance of neural networks (for example, polynomial approximation). Therefore one expects the question of learnability and inductive biases to largely come down to the parameter-function map f_w rather than the function-space loss function.
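To illustrate this architecture-independence with a toy sketch (the models and target below are assumptions for illustration, not from the post): the very same function-space distance D applies unchanged to a polynomial model and to a tiny neural network; only the parameter-function map differs between them.

```python
import numpy as np

# Toy target: labels generated by sin (illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(20,))
Y = np.sin(X)

def mse_distance(f):
    """One and the same function-space distance, whatever the architecture."""
    return np.mean((f(X) - Y) ** 2)

def poly_map(w):
    """Parameter-function map of a polynomial model (coefficients w)."""
    return lambda x: np.polyval(w, x)

def mlp_map(w1, b1, w2):
    """Parameter-function map of a tiny one-hidden-layer tanh network."""
    return lambda x: np.tanh(np.outer(x, w1) + b1) @ w2

# Different parameter-function maps, identical distance D:
loss_poly = mse_distance(poly_map(np.array([-1 / 6, 0.0, 1.0, 0.0])))  # cubic Taylor approx of sin
loss_mlp = mse_distance(mlp_map(rng.normal(size=4), rng.normal(size=4), rng.normal(size=4)))
print(loss_poly, loss_mlp)
```

Since `mse_distance` never inspects the architecture, any difference in what the two models can learn must come from their parameter-function maps, which is the point of the passage.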

If the above reasoning is correct, then in order to understand how SGD is able to potentially perform some kind of program synthesis, we merely need to understand properties of the parameter-function map. This would be a substantial simplification. Further, it relates learning dynamics to our earlier representation problem.