
LessWrong (Curated & Popular)
"Deep learning as program synthesis" by Zach Furman

If this search is guided by a strong simplicity bias, the unreasonable effectiveness of scaling becomes an expected outcome, rather than a paradox.

We will now turn to the well-known paradoxes of approximation, generalization, and convergence, and see how the program synthesis hypothesis accounts for each.

The Paradox of Approximation

See also this post for related discussion.

Before we even consider how a network learns or generalizes, there is a more basic question.

How can a neural network, with a practical number of parameters, even in principle represent the complex function it is trained on?

Consider the task of image classification.

A function that takes a 1024 × 1024 pixel image, roughly 1 million input dimensions, and maps it to a single label like "cat" or "dog" is, a priori, an object of staggeringly high-dimensional complexity.
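
To get a feel for the scale involved, here is an illustrative back-of-the-envelope calculation (the resolution and bit depth are our own assumptions, not from the post): counting how many distinct 8-bit grayscale images of this size exist.

```python
import math

# Illustrative calculation (assumed setup: 1024 x 1024, 8-bit grayscale):
# how many distinct images exist in this input space?
pixels = 1024 * 1024          # roughly 1 million input dimensions
levels = 256                  # 8-bit grayscale values per pixel

# The number of possible images is levels ** pixels; we report only its
# order of magnitude, since the number itself is astronomically large.
log10_images = pixels * math.log10(levels)
print(f"~10^{log10_images:.0f} possible images")  # → ~10^2525223
```

Any function on this domain is, in the fully general case, an object far too large to write down directly.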

Who is to say that a good approximation of this function even exists within the space of functions that a neural network of a given size can express?

The textbook answer to this question is the Universal Approximation Theorem (UAT).

This theorem states that a neural network with a single hidden layer can, given enough neurons, approximate any continuous function to arbitrary accuracy.
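
The single-hidden-layer setup can be sketched numerically. Below is a minimal illustration (our own construction, not from the post): random ReLU hidden units with a least-squares fit of the output weights, approximating the continuous target sin(2πx) on [0, 1]. The error shrinks as neurons are added, as the UAT promises.

```python
import numpy as np

# Minimal sketch of the UAT in action (illustrative assumptions throughout):
# one hidden layer of ReLU neurons with random input weights and biases;
# only the linear output layer is fitted, via least squares.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 500)[:, None]   # 1-D inputs in [0, 1]
y = np.sin(2 * np.pi * x[:, 0])           # target continuous function

def hidden_features(x, n_neurons, rng):
    # Random hidden-layer weights and biases (scales chosen arbitrarily).
    w = rng.normal(size=(1, n_neurons)) * 10
    b = rng.uniform(-10, 10, size=n_neurons)
    return np.maximum(0.0, x @ w + b)     # ReLU activations

errors = {}
for n in (10, 100, 1000):
    h = hidden_features(x, n, rng)
    coef, *_ = np.linalg.lstsq(h, y, rcond=None)
    errors[n] = np.max(np.abs(h @ coef - y))
    print(f"{n:5d} neurons -> max error {errors[n]:.4f}")
```

Note that this only demonstrates the positive direction of the theorem; it says nothing about how many neurons "enough" is, which is exactly where the trouble starts.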

On its face, this seems to resolve the issue entirely.

There's a details box here titled "A precise statement of the universal approximation theorem"; its contents are omitted from this narration.

However, this answer is deeply misleading.

The crucial caveat is the phrase "given enough neurons."

A closer look at the proofs of the UAT reveals that for an arbitrary function, the number of neurons required scales exponentially with the dimension of the input.

This is the infamous curse of dimensionality.

Representing an arbitrary function on a 1-megapixel image would require a catastrophically large number of neurons, more than there are atoms in the universe.
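
The exponential blow-up is easy to make concrete. Here is a rough sketch (the sampling density and the ~10^80 atom estimate are our own assumptions, not figures from the post): sampling each input dimension at just 2 points already requires 2^d grid points in d dimensions.

```python
import math

# Illustrative curse-of-dimensionality arithmetic (assumed numbers):
# covering each of d input dimensions with just 2 sample points
# requires 2 ** d grid points in total.
ATOMS_IN_UNIVERSE_LOG10 = 80  # commonly cited estimate: ~10^80 atoms

def log10_grid_points(d, points_per_dim=2):
    # log10 of points_per_dim ** d, computed in log space to avoid overflow
    return d * math.log10(points_per_dim)

for d in (10, 266, 1_000_000):
    print(f"d = {d:>9}: ~10^{log10_grid_points(d):,.0f} grid points")

# Already at d = 266 the grid exceeds the estimated atom count of the
# universe; at d = 1,000,000 (a 1-megapixel image) it is ~10^301,030.
print(log10_grid_points(266) > ATOMS_IN_UNIVERSE_LOG10)  # True
```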

The UAT, then, is not a satisfying explanation.