AlgZoo: Uninterpreted models with fewer than 1,500 parameters.
By Jacob Hilton.
Published on January 26, 2026.
Audio note: This article contains 78 uses of LaTeX notation, so the narration may be difficult to follow.
There's a link to the original text in the episode description.
This post covers work done by several researchers at, and collaborators of, ARC, including Zihau Chen, George Robinson, Dávid Matolcsi, Jacob Stavrianos, Jiwei Li and Michael Sklo.
Thanks to Aryan Bhatt, Gabriel Wu, Jiwei Li, Lee Sharkey, Victor Lecomte and Zihau Chen for comments.
In the wake of recent debate about pragmatic versus ambitious visions for mechanistic interpretability, ARC is sharing some models we've been studying that, in spite of their tiny size, serve as challenging test cases for any ambitious interpretability vision.
The models are RNNs and transformers trained to perform algorithmic tasks, and range in size from 8 to 1,408 parameters.
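To give a concrete sense of that scale, here is an illustrative sketch only; the layer sizes below are made up and are not taken from the AlgZoo repo. A single-layer RNN with a two-token vocabulary and a hidden size of 4 already has a few dozen parameters:

```python
# Illustrative sketch of a model at roughly AlgZoo scale; sizes are hypothetical, not from the repo.
import torch
import torch.nn as nn

class TinyRNN(nn.Module):
    def __init__(self, vocab_size: int = 2, hidden_size: int = 4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)          # 2 x 4 = 8 params
        self.rnn = nn.RNN(hidden_size, hidden_size,
                          batch_first=True, bias=False)             # 4x4 + 4x4 = 32 params
        self.readout = nn.Linear(hidden_size, 1, bias=False)        # 4 params

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        x = self.embed(tokens)           # (batch, seq, hidden)
        h, _ = self.rnn(x)               # (batch, seq, hidden)
        return self.readout(h[:, -1])    # one logit per sequence, read off the final state

model = TinyRNN()
print(sum(p.numel() for p in model.parameters()))  # 44 parameters at these sizes
```

Even at these near-minimal widths the parameter count falls well within the 8-to-1,408 range above.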
The largest model that we believe we more or less fully understand has 32 parameters.
The next largest model that we have put substantial effort into, but have failed to fully understand, has 432 parameters.
The models are available at the AlgZoo GitHub repo.
We think that the ambitious side of the mechanistic interpretability community has historically underinvested in fully understanding slightly complex models compared to partially understanding incredibly complex models.
There has been some prior work aimed at full understanding, for instance on models trained to perform paren balancing, modular addition and more general group operations, but we still don't think the field is close to being able to fully understand our models, at least not in the sense we discuss in this post.
If we are going to one day fully understand multibillion parameter LLMs, we probably first need to reach the point where fully understanding models with a few hundred parameters is pretty easy.
We hope that AlgZoo will spur research to either help us reach that point or help us reckon with the magnitude of the challenge we face.
One likely reason for this underinvestment is lingering philosophical confusion over the meaning of "explanation" and "full understanding".
Our current perspective at ARC is that, given a model that has been optimized for a particular loss, an explanation of the model amounts to a mechanistic estimate of the model's loss.
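To make that concrete, here is a hedged sketch of the idea in assumed notation, not a definition taken from this post: suppose the model f_theta was trained to minimize an expected loss over an input distribution D. Then an explanation is an estimator of that loss computed mechanistically from the weights themselves, rather than empirically by sampling inputs.

```latex
% Sketch only; the notation here is assumed, not taken from the post.
% The trained model $f_\theta$ has expected loss
\[
  L(\theta) \;=\; \mathbb{E}_{x \sim D}\!\left[ \ell\!\left( f_\theta(x),\, y(x) \right) \right],
\]
% and an explanation is a mechanistic estimator $\hat{L}(\theta)$, computed from the
% weights and structure of the model rather than by averaging over samples $x \sim D$,
% whose quality is judged by how closely $\hat{L}(\theta)$ tracks $L(\theta)$.
```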