Zach Furman
If this search is guided by a strong simplicity bias, the unreasonable effectiveness of scaling becomes an expected outcome, rather than a paradox.
We will now turn to the well-known paradoxes of approximation, generalization, and convergence, and see how the program synthesis hypothesis accounts for each.
The Paradox of Approximation

See also this post for related discussion.
Before we even consider how a network learns or generalizes, there is a more basic question.
How can a neural network, with a practical number of parameters, even in principle represent the complex function it is trained on?
Consider the task of image classification.
A function that takes a 1024 × 1024 pixel image (roughly 1 million input dimensions) and maps it to a single label like "cat" or "dog" is, a priori, an object of staggering high-dimensional complexity.
Who is to say that a good approximation of this function even exists within the space of functions that a neural network of a given size can express?
The textbook answer to this question is the Universal Approximation Theorem (UAT).
This theorem states that a neural network with a single hidden layer can, given enough neurons, approximate any continuous function to arbitrary accuracy.
On its face, this seems to resolve the issue entirely.
[Details box: "A precise statement of the universal approximation theorem" — contents omitted.]
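In low dimensions, the theorem is easy to check empirically. The following sketch (my own illustration, not from the post) approximates sin(x) with a single hidden layer of random ReLU features, fitting only the output weights by least squares; the approximation error shrinks as neurons are added.

```python
# Illustrative sketch: a one-hidden-layer ReLU network approximating
# sin(x) on [0, 2*pi]. Hidden weights are random; only the output
# layer is fit (a random-features variant of the UAT setting).
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 2 * np.pi, 500)
target = np.sin(x)

def fit_error(n_neurons):
    # Random hidden-layer weights and biases.
    w = rng.normal(size=n_neurons)
    b = rng.uniform(-2 * np.pi, 2 * np.pi, size=n_neurons)
    # ReLU feature matrix, shape (500, n_neurons).
    hidden = np.maximum(0.0, np.outer(x, w) + b)
    # Least-squares fit of the output weights.
    coef, *_ = np.linalg.lstsq(hidden, target, rcond=None)
    return np.max(np.abs(hidden @ coef - target))

for n in (5, 50, 500):
    print(n, fit_error(n))
```

With one input dimension, a few hundred neurons suffice; the paradox only bites as the input dimension grows.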
However, this answer is deeply misleading.
The crucial caveat is the phrase "given enough neurons."
A closer look at the proofs of the UAT reveals that for an arbitrary function, the number of neurons required scales exponentially with the dimension of the input.
This is the infamous curse of dimensionality.
Representing a function on a 1-megapixel image would require a catastrophically large number of neurons, more than there are atoms in the universe.
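A back-of-envelope calculation (my own illustration, with assumed constants) makes the scaling concrete: approximating a generic 1-Lipschitz function on the unit cube [0,1]^d to accuracy ε requires on the order of (1/ε)^d "grid cells," and hence roughly that many neurons.

```python
# Back-of-envelope curse-of-dimensionality arithmetic (illustrative
# numbers, not from the post). Worked in log space to avoid overflow.
import math

def log10_cells(d, eps):
    # log10 of (1/eps)^d, the rough cell/neuron count for accuracy eps.
    return d * math.log10(1.0 / eps)

ATOMS_LOG10 = 80  # commonly cited estimate: ~10^80 atoms in the universe

d = 1024 * 1024  # a 1-megapixel input
n = log10_cells(d, 0.5)  # even a crude accuracy of eps = 0.5
print(n)                        # ~3.16e5, i.e. about 10^315,653 cells
print(n > ATOMS_LOG10)          # vastly exceeds the atom count
```

Even at the uselessly coarse accuracy ε = 0.5, the required count dwarfs any physical resource, which is why the generic-function guarantee of the UAT cannot be the real explanation.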
The UAT, then, is not a satisfying explanation.