The network discovers a Fourier-based algorithm for modular addition.
Coincident with the discovery of this algorithmic program (or rather, with the removal of the memorization program, which occurs slightly later), test accuracy abruptly jumps to 100%.
The sudden increase in generalization appears to be the direct consequence of the model replacing a complex, overfitting solution with a simpler, algorithmic one.
In this instance, generalization is achieved through the synthesis of a different, more efficient program.
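To make the discovered algorithm concrete, here is a minimal numpy sketch of the "clock" computation a grokked network implements: embed each input as points on circles of several frequencies, combine them with angle-addition identities, and read off the answer. The modulus and the particular frequencies are illustrative; a trained network selects its own handful.

```python
import numpy as np

def fourier_mod_add(a, b, p=113, freqs=(1, 2, 5)):
    """Hand-coded sketch of the Fourier ('clock') algorithm for (a + b) mod p.

    Each input is embedded as cos/sin waves at several frequencies;
    angle-addition identities combine them, and the logit for each
    candidate answer c measures how well c matches the combined rotation.
    """
    c = np.arange(p)
    logits = np.zeros(p)
    for k in freqs:
        theta_a = 2 * np.pi * k * a / p
        theta_b = 2 * np.pi * k * b / p
        # angle addition: cos/sin of (theta_a + theta_b)
        cos_ab = np.cos(theta_a) * np.cos(theta_b) - np.sin(theta_a) * np.sin(theta_b)
        sin_ab = np.sin(theta_a) * np.cos(theta_b) + np.cos(theta_a) * np.sin(theta_b)
        # logit for c is cos(theta_a + theta_b - theta_c),
        # maximized exactly when c == (a + b) mod p
        theta_c = 2 * np.pi * k * c / p
        logits += cos_ab * np.cos(theta_c) + sin_ab * np.sin(theta_c)
    return int(np.argmax(logits))

assert fourier_mod_add(57, 86) == (57 + 86) % 113
```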
The Paradox of Convergence
When we ask a neural network to solve a task, we specify what task we'd like it to solve, but not how it should solve it; the purpose of learning is for the model to find strategies on its own.
We define a loss function and an architecture, creating a space of possible functions, and ask the learning algorithm to find a good one by minimizing the loss.
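As a minimal sketch of this division of labor, the PyTorch snippet below fixes an architecture (a space of functions) and a loss (a preference over that space) and leaves the strategy entirely to gradient descent; the toy task, model sizes, and hyperparameters are all illustrative stand-ins.

```python
import torch
import torch.nn as nn

# The architecture defines a family of functions (all weight settings
# of this MLP); the loss says which members count as "good"; the
# optimizer searches. Nothing here says *how* to map x to y.
model = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))
loss_fn = nn.MSELoss()
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

x = torch.randn(256, 2)      # toy inputs
y = x[:, :1] * x[:, 1:]      # toy target: a product the net must discover

for step in range(1000):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
```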
Given this freedom and the high dimensionality of the search space, one might expect the solutions found by different models, especially those with different architectures or random initializations, to be highly diverse.
Instead, what we observe empirically is a strong tendency towards convergence.
This is most directly visible in the phenomenon of representational alignment, in which independently trained models arrive at internal representations that closely mirror one another.
This alignment is remarkably robust.
It holds across different training runs of the same architecture, showing that the final solution is not a sensitive accident of the random seed.
More surprisingly, it holds across different architectures.
The internal activations of a transformer and a CNN trained on the same vision task, for example, can often be mapped to one another with a simple linear transformation, suggesting they are learning not just similar input-output behavior, but similar intermediate computational steps.
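One way to probe this claim is sketched below, assuming we have already collected activations from both models on the same stimuli: fit a regularized linear map from one representation to the other and score it on held-out inputs. The function and its parameters are illustrative; published analyses typically use more careful metrics, such as CKA.

```python
import numpy as np

def linear_alignment_score(acts_a, acts_b, reg=1e-3):
    """Fit a ridge-regularized linear map from model A's activations to
    model B's on half the stimuli, then report held-out R^2.

    acts_a, acts_b: (n_stimuli, d_a) and (n_stimuli, d_b) arrays of
    activations for the *same* inputs. A high score means the two
    representations differ by little more than a change of basis.
    """
    n = acts_a.shape[0]
    split = n // 2
    A_tr, A_te = acts_a[:split], acts_a[split:]
    B_tr, B_te = acts_b[:split], acts_b[split:]
    # W = argmin ||A W - B||^2 + reg * ||W||^2
    d = A_tr.shape[1]
    W = np.linalg.solve(A_tr.T @ A_tr + reg * np.eye(d), A_tr.T @ B_tr)
    resid = B_te - A_te @ W
    return 1.0 - (resid ** 2).sum() / ((B_te - B_te.mean(0)) ** 2).sum()
```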
It even holds in some cases across modalities.
Models like CLIP, trained to associate images with text, learn a shared representation space where the vector for a photograph of a dog is close to the vector for the phrase "a photo of a dog", indicating convergence on a common, abstract conceptual structure.
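A sketch of what checking this looks like in practice, using the Hugging Face transformers API for CLIP; the checkpoint name and the image path are placeholders.

```python
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("dog.jpg")  # placeholder path
inputs = processor(text=["a photo of a dog", "a photo of a cat"],
                   images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    img = model.get_image_features(pixel_values=inputs["pixel_values"])
    txt = model.get_text_features(input_ids=inputs["input_ids"],
                                  attention_mask=inputs["attention_mask"])

# cosine similarity in the shared space: the dog photo should land
# closer to "a photo of a dog" than to "a photo of a cat"
img = img / img.norm(dim=-1, keepdim=True)
txt = txt / txt.norm(dim=-1, keepdim=True)
print(img @ txt.T)
```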
The mystery deepens when we observe parallels to biological systems.
The Gabor-like filters that emerge in the early layers of vision networks, for instance, are strikingly similar to the receptive fields of neurons in the V1 area of the primate visual cortex.
It appears that evolution and stochastic gradient descent, two very different optimization processes operating on very different substrates, have converged on similar solutions when exposed to the same statistical structure of the natural world.
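For concreteness, the snippet below constructs the classical Gabor function, a sinusoidal grating under a Gaussian envelope, which is the shape both V1 simple cells and first-layer convolutional filters tend to resemble; all parameter values are illustrative.

```python
import numpy as np

def gabor_kernel(size=11, sigma=2.5, theta=0.0, lam=6.0, psi=0.0, gamma=0.5):
    """Classical Gabor filter: a cosine grating (wavelength lam, phase psi,
    orientation theta) modulated by a Gaussian envelope (width sigma,
    aspect ratio gamma)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # rotate coordinates by the orientation theta
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
    carrier = np.cos(2 * np.pi * xr / lam + psi)
    return envelope * carrier
```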