Zach Furman
From a theoretical computer science perspective, this is what algorithms look like, in general.
Not just the specific trigonometric trick from grokking, but computation as such.
You take a hard problem, break it into pieces, solve the pieces, and combine the results.
What makes this tractable, what makes it an algorithm rather than a lookup table, is precisely the compositional structure.
The reuse is what makes it compact.
The compactness is what makes it feasible.
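To make this concrete, here is a minimal sketch of the trigonometric trick reported in the grokking literature, written as ordinary code rather than network weights. The function name and structure are illustrative, not taken from any paper's implementation: it computes modular addition by embedding each input as an angle on the unit circle, combining the pieces with the angle-addition identities, and reading the result back out. Each step is simple and reusable, and composing them is what solves the whole problem.

```python
import math

def mod_add_trig(a: int, b: int, p: int) -> int:
    """Compute (a + b) mod p via rotation, the way grokked networks
    have been reported to: embed, rotate, read out."""
    # Step 1: embed each integer as an angle on the unit circle.
    theta_a = 2 * math.pi * a / p
    theta_b = 2 * math.pi * b / p
    # Step 2: combine the pieces with the angle-addition identities:
    #   cos(x + y) = cos(x)cos(y) - sin(x)sin(y)
    #   sin(x + y) = sin(x)cos(y) + cos(x)sin(y)
    c = math.cos(theta_a) * math.cos(theta_b) - math.sin(theta_a) * math.sin(theta_b)
    s = math.sin(theta_a) * math.cos(theta_b) + math.cos(theta_a) * math.sin(theta_b)
    # Step 3: read out the combined angle and map it back to an integer mod p.
    angle = math.atan2(s, c) % (2 * math.pi)
    return round(angle * p / (2 * math.pi)) % p
```

Note what this is not: a table of all p² input pairs. The same three reusable steps handle every input, which is exactly the compactness the paragraph above describes.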
Grokking and Inception V1 are two examples, but they are far from the only ones.
Mechanistic interpretability has grown into a substantial field, and its researchers have documented many such structures: in toy models, in language models, across different architectures and tasks.
Induction heads, language circuits, and bracket matching in transformer language models, learned world models and multi-step reasoning in toy tasks, grid cell-like mechanisms in RL agents, hierarchical representations in GANs, and much more.
Where we manage to look carefully, we tend to find something mechanistic.
This raises a question.
If what we find inside trained networks, at least when we can find anything, looks like algorithms built from parts, what does that suggest about what deep learning is doing?
The hypothesis
What should we make of this?
We have seen neural networks learn solutions that look like algorithms, compositional structures built from simple, reusable parts.
In the grokking case, this coincided precisely with generalization.
In Inception V1, this structure is what lets the network recognize objects despite the vast dimensionality of the input space.
And across many other cases documented in the mechanistic interpretability literature, the same shape appears.