Zach Furman
You don't do something stupid like condition them badly numerically. And they wanna learn. They'll do it. Dario Amodei. End quote.
I remember when I trained my first neural network, there was something almost miraculous about it.
It could solve problems that I had absolutely no idea how to code myself, such as distinguishing a cat from a dog, and it did so in a completely opaque way: even after it had solved the problem, I had no better picture of how to solve it myself than I did beforehand.
Moreover, it was remarkably resilient: despite obvious problems with the optimizer, bugs in the code, or bad training data, it still worked, unlike any other engineered system I had ever built, almost reminiscent of something biological in its robustness.
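The experience described above is easy to recreate in miniature. The sketch below, in plain NumPy, trains a tiny two-layer network on a toy two-class problem; the data, architecture, and hyperparameters are my own illustrative choices, not anything from the text. The point is the same one made above: nowhere do we write a classification rule, yet the network finds one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-class data: two Gaussian blobs (stand-ins for "cat" and "dog").
n = 200
X = np.vstack([rng.normal(-2.0, 1.0, (n // 2, 2)),
               rng.normal(2.0, 1.0, (n // 2, 2))])
y = np.concatenate([np.zeros(n // 2), np.ones(n // 2)])

# Tiny MLP: 2 inputs -> 8 tanh units -> 1 sigmoid output.
W1 = rng.normal(0, 0.5, (2, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 0.5, (8, 1)); b2 = np.zeros(1)

lr = 0.1
for step in range(1000):
    H = np.tanh(X @ W1 + b1)                         # hidden activations
    p = 1.0 / (1.0 + np.exp(-(H @ W2 + b2))).ravel() # predicted P(class 1)

    # Gradients of mean cross-entropy loss, by backpropagation.
    d = (p - y)[:, None] / n
    gW2 = H.T @ d;  gb2 = d.sum(0)
    dH = (d @ W2.T) * (1 - H**2)
    gW1 = X.T @ dH; gb1 = dH.sum(0)

    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

# Final predictions and training accuracy.
H = np.tanh(X @ W1 + b1)
p = 1.0 / (1.0 + np.exp(-(H @ W2 + b2))).ravel()
acc = float(((p > 0.5) == y).mean())
```

Note that the learned rule lives entirely in the weight matrices: inspecting `W1` and `W2` after training gives no more insight into "how to tell the classes apart" than one had before, which is exactly the opacity described above.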
My impression is that this sense of magic is a common, if often unspoken, experience among practitioners.
Many simply learn to accept the mystery and get on with the work.
But there is nothing virtuous about confusion; it just suggests that your understanding is incomplete, that you are ignorant of the real mechanisms underlying the phenomenon.
Our practical success with deep learning has outpaced our theoretical understanding.
This has led to a proliferation of explanations that often feel ad hoc and local, tailor-made to account for a specific empirical finding without connecting to other observations or any larger framework.
For instance, the theory of double descent provides a narrative for why test loss can fall a second time as model capacity grows past the interpolation threshold, defying the classical U-shaped curve, but it is a self-contained story.
It does not, for example, share a conceptual foundation with the theories we have for how induction heads form in transformers.
Each new discovery seems to require a new, bespoke theory.
One naturally worries that we are juggling epicycles.
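Double descent itself is easy to reproduce in miniature. The sketch below is my own illustrative setup, not anything from the text: minimum-norm least squares on random ReLU features, with the number of features swept past the number of training points. Train error typically hits zero once the width reaches the sample count, while test error tends to spike near that interpolation threshold before descending again.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu_features(X, W):
    """Random ReLU feature map: max(XW, 0)."""
    return np.maximum(X @ W, 0.0)

# Hypothetical smooth target function for this illustration.
def target(X):
    return np.sin(X.sum(axis=1))

n_train, n_test, d = 40, 200, 5
Xtr = rng.normal(size=(n_train, d)); ytr = target(Xtr)
Xte = rng.normal(size=(n_test, d));  yte = target(Xte)

# Sweep feature counts from under- to over-parameterized.
widths = [5, 10, 20, 40, 80, 160, 320]
train_err, test_err = [], []
for width in widths:
    W = rng.normal(size=(d, width)) / np.sqrt(d)
    Ftr = relu_features(Xtr, W)
    Fte = relu_features(Xte, W)
    # lstsq returns the minimum-norm solution when width >= n_train.
    beta, *_ = np.linalg.lstsq(Ftr, ytr, rcond=None)
    train_err.append(float(np.mean((Ftr @ beta - ytr) ** 2)))
    test_err.append(float(np.mean((Fte @ beta - yte) ** 2)))
```

Plotting `test_err` against `widths` usually shows the characteristic peak at `width == n_train`: exactly the kind of isolated empirical curve, with its own bespoke explanation, that the passage above is worried about.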
This sense of theoretical fragility is compounded by a second problem.
For any single one of these phenomena, we often lack consensus, instead entertaining multiple competing hypotheses.
Consider the core question of why neural networks generalize.