Zach Furman
๐ค SpeakerAppearances Over Time
Podcast Appearances
Stochastic gradient descent is not a neutral explorer.
The implicit biases of stochastic optimization when navigating a highly overparameterized loss landscape may create powerful channels that funnel the learning process toward a specific kind of simple, compositional solution.
Perhaps all roads do not lead to Rome, but the roads to Rome are the fastest.
The convergence could therefore be a clue about the nature of our learning dynamics themselves that they possess a strong, intrinsic preference for a particular class of solutions.
Viewed together, these observations suggest that the space of effective solutions for real-world tasks is far smaller and more structured than the space of possible models.
The phenomenon of convergence indicates that our models are finding this structure.
The bitter lesson suggests that our learning methods are general enough to do so.
The remaining questions point us toward the precise nature of that structure and the mechanisms by which our learning algorithms are so remarkably good at finding it.
Heading The Path Forward If you followed the argument this far, you might already sense where it becomes difficult to make precise.
The mechanistic interpretability evidence shows that networks can implement compositional algorithms.
The indirect evidence suggests this connects to why they generalize, scale, and converge.
But Connexter is doing a lot of work.
What would it actually mean to say that deep learning is some form of program synthesis?
Trying to answer this carefully leads to two problems.
The claim neural networks learn programs seems to require saying what a program even is in a space of continuous parameters.
It also requires explaining how gradient descent could find such programs efficiently, given what we know about the intractability of program search.
These are the kinds of problems where the difficulty itself is informative.
Each has a specific shape, what you need to think about, what a resolution would need to provide.
I focus on them deliberately.
That shape is what eventually pointed me towards specific mathematical tools I wouldn't have considered otherwise.