Zach Furman
Is it best explained by the implicit bias of SGD towards flat minima, the behavior of neural tangent kernels, or some other property?
The field actively debates these views.
And where no mechanistic theory has gained traction, we often retreat to descriptive labels.
We say complex abilities are an emergent property of scale, a term that names the mystery without explaining its cause.
This theoretical disarray is sharpest when we examine our most foundational frameworks.
Here, the issue is not just a lack of consensus but a direct conflict with empirical reality.
This disconnect manifests in several ways.
Sometimes, our theories make predictions that are actively falsified by practice.
Classical statistical learning theory, with its focus on the bias-variance trade-off, advises against the very scaling strategies that have produced almost all state-of-the-art performance.
In other cases, a theory might be technically true but practically misleading, failing to explain the key properties that make our models effective.
The universal approximation theorem, for example, guarantees representational power, but its standard constructions require a number of units that grows exponentially with the input dimension, a cost our models somehow avoid.
And in yet other areas, our classical theories are almost entirely silent.
They offer no framework to even begin explaining deep puzzles like the uncanny convergence of representations across vastly different models trained on the same data.
We are therefore faced with a collection of major empirical findings where our foundational theories are either contradicted, misleading, or simply absent.
This theoretical vacuum creates an opportunity for a new perspective.
The program synthesis hypothesis offers such a perspective.
It suggests we shift our view of what deep learning is fundamentally doing from statistical function fitting to program search.
The specific claim is that deep learning performs a search for simple programs that explain the data.
This shift in viewpoint may offer a way to make sense of the theoretical tensions we have outlined.
If the learning process is a search for an efficient program rather than an arbitrary function, then the circumvention of the curse of dimensionality is no longer so mysterious.
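The "search for simple programs" framing can be illustrated with a deliberately toy sketch, not from the text: enumerate arithmetic expressions in order of size and return the first one consistent with the data, a brute-force stand-in for the minimum-description-length idea. (Deep learning's actual search is gradient-based, not enumerative; the grammar and data below are invented for illustration.)

```python
import itertools

# Data generated by the hidden "program" x*x + 1.
data = [(0, 1), (1, 2), (2, 5), (3, 10)]

LEAVES = ["x", "1"]
OPS = ["+", "*"]

def programs(size):
    # All fully parenthesized expressions with `size` leaves.
    if size == 1:
        yield from LEAVES
        return
    for left in range(1, size):
        for a, b, op in itertools.product(
            programs(left), programs(size - left), OPS
        ):
            yield f"({a}{op}{b})"

def explains(prog):
    # A program "explains the data" if it reproduces every observation.
    return all(eval(prog, {"x": x}) == y for x, y in data)

def simplest_program(max_size=4):
    # Enumerate by size, so the first consistent program is minimal.
    for size in range(1, max_size + 1):
        for prog in programs(size):
            if explains(prog):
                return prog
    return None
```

A three-leaf program suffices here, and the search finds one; the hypothesis is that something functionally similar, a bias toward short, efficient programs, is what lets learned models sidestep the exponential cost that arbitrary function fitting would incur.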