The parameter-function map is precisely the same object responsible for the mystery discussed in the representation section.
This is not an airtight argument.
It depends on the empirical question of whether optimization details other than the loss function can be ignored or treated as second-order effects, and on whether the hand-wavy argument for the importance of the parameter-function map over the function-space loss holds up.
Even if one assumes this argument is valid, we have merely located the mystery, not resolved it.
The question remains: what properties of the parameter-function map make targets learnable?
At this point the reasoning becomes more speculative, but I will sketch some ideas.
The representation section concerned what structure the map encodes at each point in parameter space.
Learnability appears to depend on something further: the structure of paths between points.
Convexity of the function-space loss implies that paths which are sufficiently straight in function space are barrier-free: roughly, if the endpoint has lower loss, the entire path is downhill.
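To spell out the one-line argument behind this claim (standard convexity, with $f_0$ the starting function, $f_1$ the endpoint, and $L$ the function-space loss):

$$
f_t = (1-t)\,f_0 + t\,f_1, \qquad
L(f_t) \,\le\, (1-t)\,L(f_0) + t\,L(f_1) \,\le\, L(f_0) \quad \text{whenever } L(f_1) \le L(f_0),
$$

so the loss along the straight function-space path never rises above its starting value, and its convex upper bound decreases monotonically in $t$.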
So the question becomes: which function-space paths does the map provide? The same architectures successfully learn many diverse real-world targets.
Whatever property of the map enables this, it must be relatively universal, not tailored to specific targets.
This naturally leads us to ask in what cases the parameter-function map provides direct enough paths to targets with certain structure, and to characterize what "direct enough" means.
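One toy way to probe path directness empirically is the standard loss-barrier check: train two copies of a small network on the same target and evaluate the loss along the straight line between their parameter vectors. A large barrier means the straight parameter path is not among the direct paths the map provides, even though the corresponding straight path in function space would be downhill by the convexity argument above. This is an illustrative sketch, not a construction from the argument itself; the tiny tanh network and the finite-difference training loop are arbitrary choices made to keep it self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]  # an arbitrary smooth target

def forward(theta, X):
    # A small 2-8-1 tanh network; 33 parameters total.
    W1 = theta[:16].reshape(2, 8)
    b1 = theta[16:24]
    w2 = theta[24:32]
    b2 = theta[32]
    return np.tanh(X @ W1 + b1) @ w2 + b2

def loss(theta):
    return np.mean((forward(theta, X) - y) ** 2)

def train(seed, steps=2000, lr=0.05, eps=1e-4):
    # Crude finite-difference gradient descent; enough for a toy example.
    init = np.random.default_rng(seed)
    theta = 0.5 * init.normal(size=33)
    for _ in range(steps):
        grad = np.empty_like(theta)
        for i in range(len(theta)):
            e = np.zeros_like(theta)
            e[i] = eps
            grad[i] = (loss(theta + e) - loss(theta - e)) / (2 * eps)
        theta -= lr * grad
    return theta

theta_a, theta_b = train(1), train(2)  # two independently trained solutions

# Loss along the straight line between the two parameter vectors.
ts = np.linspace(0.0, 1.0, 21)
path = [loss((1 - t) * theta_a + t * theta_b) for t in ts]
barrier = max(path) - max(loss(theta_a), loss(theta_b))
print(f"endpoint losses: {loss(theta_a):.4f}, {loss(theta_b):.4f}")
print(f"barrier along straight parameter path: {barrier:.4f}")
```

Interpolating the networks' outputs instead of their parameters gives the corresponding function-space path for comparison; under squared error that path is barrier-free by convexity.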
This connects back to the representation problem.
If the map encodes some notion of program structure, then path structure in parameter space induces relationships between programs: which programs are adjacent, and which are reachable from which.
The representation section asks how programs are encoded as points.
Learnability asks how they are connected as paths.
These are different aspects of the same object.
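To make "which programs are adjacent" concrete, here is one possible operationalization (again a toy sketch with an arbitrary setup, not a construction from the text): treat a tiny network on boolean inputs as a parameter-function map onto truth tables, and call two programs adjacent when a small parameter step moves between them.

```python
import itertools
import numpy as np

# A tiny parameter-function map: a 2-2-1 tanh network, thresholded,
# maps each 9-dimensional parameter vector to a boolean function on {0,1}^2.
INPUTS = np.array(list(itertools.product([0.0, 1.0], repeat=2)))

def forward(theta, x):
    W1 = theta[:4].reshape(2, 2)
    b1 = theta[4:6]
    w2 = theta[6:8]
    b2 = theta[8]
    return np.tanh(x @ W1 + b1) @ w2 + b2

def program(theta):
    # The "program" a parameter point encodes: its truth table.
    return tuple(int(forward(theta, x) > 0) for x in INPUTS)

rng = np.random.default_rng(0)

# Call two programs adjacent if a small parameter step connects them:
# sample parameter points, nudge them, and record program transitions.
adjacent = set()
for _ in range(5000):
    theta = rng.normal(size=9)
    p = program(theta)
    q = program(theta + 0.1 * rng.normal(size=9))
    if p != q:
        adjacent.add(frozenset((p, q)))

print(f"{len(adjacent)} adjacent program pairs found")
```

The resulting graph over truth tables is one instance of the induced relationships between programs described above; for real architectures one would replace brute-force sampling with trained solutions and low-loss connecting paths.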
One hypothesis: