Zach Furman

👤 Speaker
696 total appearances

Podcast Appearances

LessWrong (Curated & Popular)
"Deep learning as program synthesis" by Zach Furman

The network discovers the Fourier-based algorithm for modular addition.
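The Fourier-based algorithm can be sketched numerically: represent each residue as a rotation on the unit circle, compose two rotations by adding their angles, and read the answer off with a cosine "logit" over all candidate outputs. This is an illustrative reconstruction, not code from the post; the modulus `p` and frequency `k` below are arbitrary choices (a trained network typically uses several frequencies at once).

```python
import numpy as np

p = 97  # modulus (a prime, as in typical grokking setups)
k = 5   # one illustrative frequency; any k coprime to p works

def fourier_add(a: int, b: int) -> int:
    theta = 2 * np.pi * k / p
    # Each input is an angle on the unit circle; composing the
    # two rotations yields the angle theta * (a + b).
    angle = theta * a + theta * b
    # The candidate c maximizing cos(theta*(a+b) - theta*c) is
    # exactly (a + b) mod p, since k is coprime to p.
    logits = np.cos(angle - theta * np.arange(p))
    return int(np.argmax(logits))

assert fourier_add(50, 60) == (50 + 60) % p  # 13
```

The readout is exact here because cos hits its maximum of 1 only when k·(a+b−c) is a multiple of p, which pins down c uniquely.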

Coincident not with the discovery of this algorithmic program but with the removal of the memorization program, which occurs slightly later, test accuracy abruptly jumps to 100%.

The sudden increase in generalization appears to be the direct consequence of the model replacing a complex, overfitting solution with a simpler, algorithmic one.

In this instance, generalization is achieved through the synthesis of a different, more efficient program.

The paradox of convergence.

When we ask a neural network to solve a task, we specify what task we'd like it to solve, but not how it should solve it; the purpose of learning is for the network to find strategies on its own.

We define a loss function and an architecture, creating a space of possible functions, and ask the learning algorithm to find a good one by minimizing the loss.
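A minimal sketch of that division of labor, using a one-parameter linear "architecture" and a mean-squared-error loss. Everything here (the task of recovering a slope of 3.0, the learning rate, the step count) is invented for illustration; the point is only that we specify the loss, and gradient descent finds the function.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Architecture": a one-parameter model f(x) = w * x.
# "Task": fit noisy data whose underlying slope is 3.0.
x = rng.normal(size=100)
y = 3.0 * x + 0.1 * rng.normal(size=100)

w = 0.0   # initialization
lr = 0.1  # learning rate
for _ in range(200):
    # Loss: mean squared error. We said *what* to fit,
    # not *how*; minimization does the rest.
    grad = np.mean(2 * (w * x - y) * x)
    w -= lr * grad

print(w)  # close to 3.0
```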

Given this freedom and the high dimensionality of the search space, one might expect the solutions found by different models, especially those with different architectures or random initializations, to be highly diverse.

Instead, what we observe empirically is a strong tendency towards convergence.

This is most directly visible in the phenomenon of representational alignment.

This alignment is remarkably robust.

It holds across different training runs of the same architecture, showing that the final solution is not a sensitive accident of the random seed.

More surprisingly, it holds across different architectures.

The internal activations of a transformer and a CNN trained on the same vision task, for example, can often be mapped to one another with a simple linear transformation, suggesting they are learning not just similar input-output behavior, but similar intermediate computational steps.
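The claim that activations "can often be mapped to one another with a simple linear transformation" amounts to fitting a least-squares linear map between two activation matrices and checking the residual. The data below is synthetic (model 2's activations are simulated as a rotated, noisy copy of model 1's) purely to demonstrate the measurement, not a result from the post.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical activations of two models on the same 500 inputs.
A = rng.normal(size=(500, 64))                  # model 1, 64-dim features
Q, _ = np.linalg.qr(rng.normal(size=(64, 64)))  # random rotation
B = A @ Q + 0.05 * rng.normal(size=(500, 64))   # model 2: rotated + noise

# Fit the best linear map A -> B by least squares.
W, *_ = np.linalg.lstsq(A, B, rcond=None)
residual = np.linalg.norm(A @ W - B) / np.linalg.norm(B)
print(residual)  # small residual => linearly aligned representations
```

With real networks one would use activation matrices recorded on a shared probe set; a small relative residual is what "similar intermediate computational steps" cashes out to here.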

It even holds in some cases across modalities.

Models like CLIP, trained to associate images with text, learn a shared representation space where the vector for a photograph of a dog is close to the vector for the phrase "a photo of a dog", indicating convergence on a common, abstract conceptual structure.
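Geometrically, "close in a shared space" means high cosine similarity between the image embedding and the matching caption embedding. The vectors below are made-up 4-dimensional stand-ins for CLIP outputs (real CLIP embeddings are high-dimensional and produced by trained encoders); they only illustrate the comparison being described.

```python
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical embeddings, chosen by hand for illustration.
img_dog = np.array([0.9, 0.1, 0.0, 0.4])  # image of a dog
txt_dog = np.array([0.8, 0.2, 0.1, 0.5])  # "a photo of a dog"
txt_cat = np.array([0.1, 0.9, 0.5, 0.0])  # "a photo of a cat"

# In a converged shared space, the matching caption scores highest.
print(cosine(img_dog, txt_dog) > cosine(img_dog, txt_cat))  # True
```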

The mystery deepens when we observe parallels to biological systems.

The Gabor-like filters that emerge in the early layers of vision networks, for instance, are strikingly similar to the receptive fields of neurons in the V1 area of the primate visual cortex.

It appears that evolution and stochastic gradient descent, two very different optimization processes operating on very different substrates, have converged on similar solutions when exposed to the same statistical structure of the natural world.