
LessWrong (Curated & Popular)
"Deep learning as program synthesis" by Zach Furman

If this search is guided by a strong simplicity bias, the unreasonable effectiveness of scaling becomes an expected outcome, rather than a paradox.

We will now turn to the well-known paradoxes of approximation, generalization, and convergence, and see how the program synthesis hypothesis accounts for each.

The Paradox of Approximation

See also this post for related discussion.

Before we even consider how a network learns or generalizes, there is a more basic question.

How can a neural network, with a practical number of parameters, even in principle represent the complex function it is trained on?

Consider the task of image classification.

A function that takes a 1024 × 1024 pixel image, roughly 1 million input dimensions, and maps it to a single label like "cat" or "dog" is, a priori, an object of staggeringly high-dimensional complexity.
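
To get a feel for the scale involved, here is an illustrative back-of-the-envelope calculation (the resolution and bit depth are our own assumptions, not from the post): counting how many distinct 8-bit grayscale images of this size exist.

```python
import math

# Illustrative calculation (assumed setup: 1024 x 1024, 8-bit grayscale):
# how many distinct images exist in this input space?
pixels = 1024 * 1024          # roughly 1 million input dimensions
levels = 256                  # 8-bit grayscale values per pixel

# The number of possible images is levels ** pixels; we report only its
# order of magnitude, since the number itself is astronomically large.
log10_images = pixels * math.log10(levels)
print(f"~10^{log10_images:.0f} possible images")  # → ~10^2525223
```

Any function on this domain is, in the fully general case, an object far too large to write down directly.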

Who is to say that a good approximation of this function even exists within the space of functions that a neural network of a given size can express?

The textbook answer to this question is the Universal Approximation Theorem (UAT).

This theorem states that a neural network with a single hidden layer can, given enough neurons, approximate any continuous function to arbitrary accuracy.
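
The single-hidden-layer setup can be sketched numerically. Below is a minimal illustration (our own construction, not from the post): random ReLU hidden units with a least-squares fit of the output weights, approximating the continuous target sin(2πx) on [0, 1]. The error shrinks as neurons are added, as the UAT promises.

```python
import numpy as np

# Minimal sketch of the UAT in action (illustrative assumptions throughout):
# one hidden layer of ReLU neurons with random input weights and biases;
# only the linear output layer is fitted, via least squares.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 500)[:, None]   # 1-D inputs in [0, 1]
y = np.sin(2 * np.pi * x[:, 0])           # target continuous function

def hidden_features(x, n_neurons, rng):
    # Random hidden-layer weights and biases (scales chosen arbitrarily).
    w = rng.normal(size=(1, n_neurons)) * 10
    b = rng.uniform(-10, 10, size=n_neurons)
    return np.maximum(0.0, x @ w + b)     # ReLU activations

errors = {}
for n in (10, 100, 1000):
    h = hidden_features(x, n, rng)
    coef, *_ = np.linalg.lstsq(h, y, rcond=None)
    errors[n] = np.max(np.abs(h @ coef - y))
    print(f"{n:5d} neurons -> max error {errors[n]:.4f}")
```

Note that this only demonstrates the positive direction of the theorem; it says nothing about how many neurons "enough" is, which is exactly where the trouble starts.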

On its face, this seems to resolve the issue entirely.

There's a details box here titled "A precise statement of the universal approximation theorem"; its contents are omitted from this narration.

However, this answer is deeply misleading.

The crucial caveat is the phrase "given enough neurons."

A closer look at the proofs of the UAT reveals that for an arbitrary function, the number of neurons required scales exponentially with the dimension of the input.

This is the infamous curse of dimensionality.

Representing an arbitrary function on a 1-megapixel image would require a catastrophically large number of neurons, more than there are atoms in the universe.
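
The exponential blow-up is easy to make concrete. Here is a rough sketch (the sampling density and the ~10^80 atom estimate are our own assumptions, not figures from the post): sampling each input dimension at just 2 points already requires 2^d grid points in d dimensions.

```python
import math

# Illustrative curse-of-dimensionality arithmetic (assumed numbers):
# covering each of d input dimensions with just 2 sample points
# requires 2 ** d grid points in total.
ATOMS_IN_UNIVERSE_LOG10 = 80  # commonly cited estimate: ~10^80 atoms

def log10_grid_points(d, points_per_dim=2):
    # log10 of points_per_dim ** d, computed in log space to avoid overflow
    return d * math.log10(points_per_dim)

for d in (10, 266, 1_000_000):
    print(f"d = {d:>9}: ~10^{log10_grid_points(d):,.0f} grid points")

# Already at d = 266 the grid exceeds the estimated atom count of the
# universe; at d = 1,000,000 (a 1-megapixel image) it is ~10^301,030.
print(log10_grid_points(266) > ATOMS_IN_UNIVERSE_LOG10)  # True
```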

The UAT, then, is not a satisfying explanation.