The parameter-function map is precisely the same object responsible for the mystery discussed in the representation section.
This is not an airtight argument.
It depends on the empirical question of whether optimization details other than the loss function can be ignored or treated as second-order effects, and on whether the hand-wavy argument for the importance of the parameter-function map over the function-space loss holds up.
Even if one assumes this argument is valid, we have merely located the mystery, not resolved it.
The question remains: what properties of the parameter-function map make targets learnable?
At this point the reasoning becomes more speculative, but I will sketch some ideas.
The representation section concerned what structure the map encodes at each point in parameter space.
Learnability appears to depend on something further: the structure of paths between points.
Convexity of the function-space loss implies that paths which are sufficiently straight in function space are barrier-free: roughly, if the endpoint has lower loss, the entire path is downhill.
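To spell out the one-line argument behind this claim (standard convexity, with $f_0$ the starting function, $f_1$ the endpoint, and $L$ the function-space loss):

$$
f_t = (1-t)\,f_0 + t\,f_1, \qquad
L(f_t) \,\le\, (1-t)\,L(f_0) + t\,L(f_1) \,\le\, L(f_0) \quad \text{whenever } L(f_1) \le L(f_0),
$$

so the loss along the straight function-space path never rises above its starting value, and its convex upper bound decreases monotonically in $t$.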
So the question becomes: which function-space paths does the map provide? The same architectures successfully learn many diverse real-world targets.
Whatever property of the map enables this, it must be relatively universal, not tailored to specific targets.
This naturally leads us to ask in what cases the parameter-function map provides direct enough paths to targets with certain structure, and to characterize what "direct enough" means.
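One toy way to probe path directness empirically is the standard loss-barrier check: train two copies of a small network on the same target and evaluate the loss along the straight line between their parameter vectors. A large barrier means the straight parameter path is not among the direct paths the map provides, even though the corresponding straight path in function space would be downhill by the convexity argument above. This is an illustrative sketch, not a construction from the argument itself; the tiny tanh network and the finite-difference training loop are arbitrary choices made to keep it self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]  # an arbitrary smooth target

def forward(theta, X):
    # A small 2-8-1 tanh network; 33 parameters total.
    W1 = theta[:16].reshape(2, 8)
    b1 = theta[16:24]
    w2 = theta[24:32]
    b2 = theta[32]
    return np.tanh(X @ W1 + b1) @ w2 + b2

def loss(theta):
    return np.mean((forward(theta, X) - y) ** 2)

def train(seed, steps=2000, lr=0.05, eps=1e-4):
    # Crude finite-difference gradient descent; enough for a toy example.
    init = np.random.default_rng(seed)
    theta = 0.5 * init.normal(size=33)
    for _ in range(steps):
        grad = np.empty_like(theta)
        for i in range(len(theta)):
            e = np.zeros_like(theta)
            e[i] = eps
            grad[i] = (loss(theta + e) - loss(theta - e)) / (2 * eps)
        theta -= lr * grad
    return theta

theta_a, theta_b = train(1), train(2)  # two independently trained solutions

# Loss along the straight line between the two parameter vectors.
ts = np.linspace(0.0, 1.0, 21)
path = [loss((1 - t) * theta_a + t * theta_b) for t in ts]
barrier = max(path) - max(loss(theta_a), loss(theta_b))
print(f"endpoint losses: {loss(theta_a):.4f}, {loss(theta_b):.4f}")
print(f"barrier along straight parameter path: {barrier:.4f}")
```

Interpolating the networks' outputs instead of their parameters gives the corresponding function-space path for comparison; under squared error that path is barrier-free by convexity.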
This connects back to the representation problem.
If the map encodes some notion of program structure, then path structure in parameter space induces relationships between programs: which programs are adjacent, and which are reachable from which.
The representation section asks how programs are encoded as points.
Learnability asks how they are connected as paths.
These are different aspects of the same object.
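To make "which programs are adjacent" concrete, here is one possible operationalization (again a toy sketch with an arbitrary setup, not a construction from the text): treat a tiny network on boolean inputs as a parameter-function map onto truth tables, and call two programs adjacent when a small parameter step moves between them.

```python
import itertools
import numpy as np

# A tiny parameter-function map: a 2-2-1 tanh network, thresholded,
# maps each 9-dimensional parameter vector to a boolean function on {0,1}^2.
INPUTS = np.array(list(itertools.product([0.0, 1.0], repeat=2)))

def forward(theta, x):
    W1 = theta[:4].reshape(2, 2)
    b1 = theta[4:6]
    w2 = theta[6:8]
    b2 = theta[8]
    return np.tanh(x @ W1 + b1) @ w2 + b2

def program(theta):
    # The "program" a parameter point encodes: its truth table.
    return tuple(int(forward(theta, x) > 0) for x in INPUTS)

rng = np.random.default_rng(0)

# Call two programs adjacent if a small parameter step connects them:
# sample parameter points, nudge them, and record program transitions.
adjacent = set()
for _ in range(5000):
    theta = rng.normal(size=9)
    p = program(theta)
    q = program(theta + 0.1 * rng.normal(size=9))
    if p != q:
        adjacent.add(frozenset((p, q)))

print(f"{len(adjacent)} adjacent program pairs found")
```

The resulting graph over truth tables is one instance of the induced relationships between programs described above; for real architectures one would replace brute-force sampling with trained solutions and low-loss connecting paths.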
One hypothesis: