
Zach Furman

๐Ÿ‘ค Speaker
696 total appearances

Podcast Appearances

LessWrong (Curated & Popular)
"Deep learning as program synthesis" by Zach Furman

The field's response was pragmatic. Scale the methods that work. Stop trying to understand why they work.

This attitude was partly earned. For decades, hand-engineered systems encoding human knowledge about vision or language had lost to generic architectures trained on data. Human intuitions about what mattered kept being wrong.

But the pragmatic stance hardened into something stronger: a tacit assumption that trained networks were intrinsically opaque, that asking what the weights meant was a category error.

At first glance, this assumption seemed to have some theoretical basis. If neural networks were best understood as just curve-fitting function approximators, then there was no obvious reason to expect the learned parameters to mean anything in particular. They were solutions to an optimization problem, not representations. And when researchers did look inside, they found dense matrices of floating-point numbers with no obvious organization.

But a lens that predicts opacity makes the same prediction whether structure is absent or merely invisible. Some researchers kept looking.

Power et al. (2022): train a small transformer on modular addition. Given two numbers, output their sum mod 113. Only a fraction of the possible input pairs are used for training, say 30%, with the rest held out for testing. The network memorizes the training pairs quickly, getting them all correct. But on pairs it hasn't seen, it does no better than chance.

This is unsurprising. With enough parameters, a network can simply store input-output associations without extracting any rule.
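The experimental setup described above (modulus 113, roughly a 30% training split) can be sketched in a few lines. This is a hypothetical reconstruction of the data split only, not the original code; the transformer itself, and any particular seed or split fraction beyond what the text states, are omitted or assumed:

```python
# Hypothetical sketch of the modular-addition dataset from the text:
# all pairs (a, b) with label (a + b) mod 113, 30% used for training.
import random

P = 113  # modulus stated in the text

# Enumerate every possible input pair with its label.
pairs = [(a, b) for a in range(P) for b in range(P)]

random.seed(0)  # arbitrary seed, for reproducibility of the sketch
random.shuffle(pairs)

# 30% of the 113 * 113 = 12769 pairs go to training; the rest are held out.
split = int(0.3 * len(pairs))
train_set = [((a, b), (a + b) % P) for a, b in pairs[:split]]
test_set = [((a, b), (a + b) % P) for a, b in pairs[split:]]

print(len(train_set), len(test_set))  # 3830 train pairs, 8939 test pairs
```

A network large enough to memorize all 3830 training labels can reach perfect training accuracy without learning modular arithmetic at all, which is why chance-level test accuracy is the unsurprising default.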