Zach Furman
Here $w \in \mathbb{R}^d$ is the parameter vector, and $f_w$ is the input-output map of the model at parameter $w$.
SGD minimizes a loss $L(w) = \frac{1}{n}\sum_{i=1}^{n} \ell(f_w(x_i), y_i)$ via the update $w_{t+1} = w_t - \tau \nabla_w \ell(f_{w_t}(x_i), y_i)$, where $(x_i, y_i)$ are training examples and labels, and $\tau$ is the learning rate.
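As a concrete sketch of the gradient-descent update just described (a minimal illustration assuming squared loss on a single example; the function names and the linear model are assumptions, not from the talk):

```python
import numpy as np

def sgd_step(w, x_i, y_i, f, grad_f, tau):
    """One SGD step on squared loss for a single example (x_i, y_i).

    w: parameter vector; f(w, x) is the model's output f_w(x);
    grad_f(w, x) is the gradient of f(w, x) with respect to w;
    tau: learning rate.
    """
    residual = f(w, x_i) - y_i
    # Gradient of 0.5 * (f_w(x_i) - y_i)^2 with respect to w
    grad_loss = residual * grad_f(w, x_i)
    return w - tau * grad_loss

# Illustrative model: linear map f_w(x) = w . x
f = lambda w, x: w @ x
grad_f = lambda w, x: x
w = np.zeros(2)
w = sgd_step(w, np.array([1.0, 2.0]), 3.0, f, grad_f, tau=0.1)
# After one step, w has moved toward fitting the example: [0.3, 0.6]
```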
In the most common versions of supervised learning, we can focus even further.
The loss function itself can be decomposed into two pieces: the parameter-function map $w \mapsto f_w$, and a statistical distance to the target distribution.
The overall loss function can be written as a composition of the parameter-function map and some statistical distance to the target distribution; for example, for mean squared error, $L(w) = D(f_w)$, where $D(g) = \mathbb{E}_x\big[(g(x) - f^*(x))^2\big]$ and $f^*$ is the target function.
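This composition reads off directly in code (a minimal numpy sketch; the linear model and the sample-average in place of the expectation are assumptions for illustration):

```python
import numpy as np

def parameter_function_map(w):
    """w -> f_w: the function realized by parameters w.
    Illustrative model: f_w(x) = w[0] * x + w[1]."""
    return lambda x: w[0] * x + w[1]

def D(g, f_star, xs):
    """Statistical distance on function space: mean squared error
    between g and the target f_star, averaged over sample points xs."""
    return np.mean((g(xs) - f_star(xs)) ** 2)

def L(w, f_star, xs):
    """Overall loss = statistical distance composed with the
    parameter-function map."""
    return D(parameter_function_map(w), f_star, xs)

xs = np.linspace(-1, 1, 100)
f_star = lambda x: 2.0 * x + 1.0
loss_at_target = L(np.array([2.0, 1.0]), f_star, xs)  # w realizes f_star exactly
```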
Note that the statistical distance $D$ here is a fairly simple object: almost always it is a distance on function space, convex, and of relatively simple functional form.
Further, it is the same distance one would use across many different architectures, including ones which do not achieve the remarkable performance of neural networks, for example polynomial approximation.
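To illustrate this architecture-independence (an illustrative sketch, not from the talk): the very same function-space distance can score a polynomial model and a small neural network; only the parameter-function map differs between the two.

```python
import numpy as np

def D(g, f_star, xs):
    """The same function-space distance (MSE) for any architecture."""
    return np.mean((g(xs) - f_star(xs)) ** 2)

# Two different parameter-function maps, one shared distance D.
def poly_map(w):
    """Polynomial approximation: f_w(x) = sum_k w[k] * x**k."""
    return lambda x: sum(c * x ** k for k, c in enumerate(w))

def mlp_map(w):
    """Tiny one-hidden-unit network: f_w(x) = w[2] * tanh(w[0] * x + w[1])."""
    return lambda x: w[2] * np.tanh(w[0] * x + w[1])

xs = np.linspace(-1, 1, 50)
f_star = np.tanh

poly_loss = D(poly_map([0.0, 1.0]), f_star, xs)      # linear polynomial: nonzero
mlp_loss = D(mlp_map([1.0, 0.0, 1.0]), f_star, xs)   # realizes tanh exactly: 0.0
```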
Therefore one expects the question of learnability and inductive biases to largely come down to the parameter-function map $f_w$ rather than the function-space loss.
If the above reasoning is correct, then in order to understand how SGD is able to potentially perform some kind of program synthesis, we merely need to understand the properties of the parameter-function map.
This would be a substantial simplification. Further, it relates learning dynamics to our earlier representation problem.