In fact, it's a mathematical restatement of a near-trivial fact.
With resources exponential in the input dimension, one can simply memorize a function's behavior.
The constructions used to prove the theorem are effectively building a continuous version of a look-up table.
This is not an explanation for the success of deep learning.
It is a proof that if deep learning had to deal with arbitrary functions, it would be hopelessly impractical.
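To make the lookup-table picture concrete, here is a minimal sketch in Python. It is purely illustrative and not one of the UAT constructions themselves: it tabulates a target function on a regular grid over the unit cube and answers queries by snapping to the nearest grid point, so the number of stored values is the grid resolution raised to the power of the dimension.

```python
import numpy as np

def grid_lookup_approximator(f, dim, resolution=10):
    """Tabulate f at resolution**dim points of a regular grid over [0, 1]^dim,
    then answer queries by snapping to the nearest grid point.
    A caricature of the 'continuous lookup table' behind UAT-style constructions."""
    axes = [np.linspace(0.0, 1.0, resolution)] * dim
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)
    table = np.apply_along_axis(f, -1, grid)  # one stored value per grid point

    def approx(x):
        idx = np.clip(np.round(np.asarray(x) * (resolution - 1)), 0, resolution - 1)
        return table[tuple(idx.astype(int))]

    return approx, table.size

f = lambda x: np.sin(x.sum())

# Fine in 2 dimensions: 100 stored values, decent accuracy.
approx, n_params = grid_lookup_approximator(f, dim=2)
print(n_params, approx([0.3, 0.7]), f(np.array([0.3, 0.7])))

# But the table size is resolution**dim -- exponential in the input dimension.
for dim in (1, 2, 10, 100):
    print(dim, 10 ** dim)
```

At ten grid points per axis, one hundred input dimensions would already demand ten to the hundredth stored values. That is the curse of dimensionality in its rawest form.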
This is not merely a weakness of the UAT's particular proof.
It is a fundamental property of high-dimensional spaces.
Classical results in approximation theory show that this exponential scaling is not just an upper bound on what's needed, but a strict lower bound.
These theorems prove that any method that aims to approximate arbitrary smooth functions is doomed to suffer the curse of dimensionality.
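Lower bounds of this kind typically take the following shape. This is my own illustrative rendering under stated assumptions, namely targets with s bounded derivatives on the d-dimensional unit cube and parameters chosen as a continuous function of the target; it is not quoted from the post.

```latex
% Illustrative general form only (assumptions as stated in the lead-in):
% targets have s bounded derivatives on [0,1]^d, and the n parameters
% depend continuously on the target function.
\[
  \sup_{\|f\|_{C^{s}([0,1]^d)} \le 1}\;
  \inf_{\theta \in \mathbb{R}^{n}}
  \bigl\| f - g_{\theta} \bigr\|_{\infty}
  \;\ge\; c\, n^{-s/d}
  \qquad\Longrightarrow\qquad
  n \;\ge\; c'\, \varepsilon^{-d/s}
  \ \text{to reach accuracy } \varepsilon .
\]
```

In words: for fixed smoothness s, the parameter count any such scheme needs grows like one over the accuracy raised to the power d over s, which is exponential in the dimension, no matter how clever the architecture.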
There's a details box here titled "The parameter count lower bound."
The box contents are omitted from this narration.
The real lesson of the universal approximation theorem, then, is not that neural networks are powerful.
The real lesson is that if the functions we learn in the real world were arbitrary, deep learning would be impossible.
The empirical success of deep learning with a reasonable number of parameters is therefore a profound clue about the nature of the problems themselves.
They must have structure.
The program synthesis hypothesis gives a name to this structure.
Compositionality.
This is not a new idea.
It is the foundational principle of computer science.
To solve a complex problem, we do not write down a giant lookup table that specifies the output for every possible input.
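To make that contrast concrete, here is a tiny sketch. The parity function is my own illustrative stand-in, not an example from the post: as a literal table it needs two to the n entries, while built compositionally from a single XOR primitive it needs only n steps.

```python
from functools import reduce
from operator import xor

# Lookup-table route: store the answer for every one of the 2**n possible inputs.
def parity_table(n):
    return {bits: bin(bits).count("1") % 2 for bits in range(2 ** n)}

# Compositional route: build the same function from one tiny piece (XOR),
# applied once per bit -- cost grows linearly in n, not exponentially.
def parity_composed(bits):
    return reduce(xor, bits, 0)

print(len(parity_table(16)))          # 65536 stored entries for 16-bit inputs
print(parity_composed([1, 0, 1, 1]))  # 1, computed with just four XORs
```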