Francois Chollet
And if you want to understand that, you can compare it, contrast it with deep learning.
So in deep learning, your model is a differentiable parametric curve.
In program synthesis, your model is a discrete graph of operators.
So you've got like a set of logical operators, like a domain-specific language.
You're picking instances of it.
You're structuring that into a graph that's a program.
And that's actually very similar to a program you might write in Python or C++ and so on.
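(A minimal sketch of what that could look like, in Python. The DSL here is a toy set of grid operators; the names and the chain structure are illustrative, not taken from any specific system.)

```python
# A toy DSL: a discrete set of operators over 2D grids.
def flip_h(grid):
    """Mirror each row of a 2D grid."""
    return [row[::-1] for row in grid]

def rotate90(grid):
    """Rotate a 2D grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]

def invert(grid):
    """Swap 0s and 1s in a binary grid."""
    return [[1 - cell for cell in row] for row in grid]

DSL = [flip_h, rotate90, invert]

def run_program(program, grid):
    """A program is a composition of DSL operator instances
    (here the graph is a simple chain for brevity)."""
    for op in program:
        grid = op(grid)
    return grid

program = [rotate90, invert]  # one point in the discrete space of programs
print(run_program(program, [[0, 1], [1, 1]]))  # -> [[0, 1], [0, 0]]
```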
And in deep learning, because we are doing machine learning here, we're trying to automatically learn these models, your learning engine is gradient descent, right?
And gradient descent is very compute efficient, because you have this very strong, informative feedback signal about where the solution is, so you can get to the solution very quickly.
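(To make that feedback signal concrete, here is a minimal sketch: fitting one parameter of a parametric curve, y = w * x, with plain gradient descent. All numbers are toy values for illustration.)

```python
# Toy data sampled from y = 2x; the model is the parametric curve y = w * x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

w = 0.0    # the single parameter we are learning
lr = 0.05  # learning rate

for step in range(100):
    # Gradient of mean squared error with respect to w: a dense,
    # informative signal that says exactly which way to move w.
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= lr * grad

print(w)  # converges close to 2.0 within a few dozen steps
```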
But it is very data inefficient, meaning that in order to make it work, you need a dense sampling of the operating space.
You need a dense sampling of the data distribution.
And then you're limited to only generalizing within that data distribution.
And the reason why you have this limitation is because your model is a curve.
And meanwhile, if you look at discrete program search, the learning engine is combinatorial search.
You're just trying a bunch of programs until you find one that actually meets your spec.
This process is extremely data efficient.
You can learn a generalizable program from just one or two examples, which is why it works so well on ARC, by the way.
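(A minimal sketch of that combinatorial search engine: enumerate programs over a toy DSL by increasing length, and return the first one that meets the spec, i.e. reproduces the given input/output examples. The integer operators here are illustrative, not from any particular ARC solver.)

```python
from itertools import product

# A toy DSL of integer operators.
DSL = {
    "inc":    lambda x: x + 1,
    "double": lambda x: x * 2,
    "square": lambda x: x * x,
}

def run(program, x):
    """Execute a program (a sequence of operator names) on an input."""
    for name in program:
        x = DSL[name](x)
    return x

def synthesize(examples, max_len=4):
    """Try programs of increasing length until one fits every example."""
    for length in range(1, max_len + 1):
        for program in product(DSL, repeat=length):
            if all(run(program, x) == y for x, y in examples):
                return program
    return None

# Data efficient: two examples are enough to pin down a program that
# generalizes to unseen inputs.
print(synthesize([(2, 9), (3, 16)]))  # -> ('inc', 'square')
```

The search is expensive in compute (the space of programs grows exponentially with length) but extremely cheap in data, which is the mirror image of the gradient descent trade-off above.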