LessWrong (Curated & Popular)

"AlgZoo: uninterpreted models with fewer than 1,500 parameters" by Jacob_Hilton

27 Jan 2026

Transcription

Chapter 1: What are mechanistic estimates and how do they serve as explanations?

0.031 - 20.925 Jacob Hilton

AlgZoo: Uninterpreted models with fewer than 1,500 parameters. By Jacob Hilton. Published on January 26, 2026. Audio note: this article contains 78 uses of LaTeX notation, so the narration may be difficult to follow.

Chapter 2: What insights can be gained from studying second argmax RNNs?

21.412 - 45.141 Jacob Hilton

There's a link to the original text in the episode description. This post covers work done by several visitors to and collaborators of ARC, including Zihao Chen, George Robinson, David Matolcsi, Jacob Stavrianos, Jiwei Li and Michael Sklar. Thanks to Aryan Bhatt, Gabriel Wu, Jiwei Li, Lee Sharkey, Victor Lecomte and Zihao Chen for comments.

45.779 - 68.189 Jacob Hilton

In the wake of recent debate about pragmatic versus ambitious visions for mechanistic interpretability, ARC is sharing some models we've been studying that, in spite of their tiny size, serve as challenging test cases for any ambitious interpretability vision. The models are RNNs and transformers trained to perform algorithmic tasks, and range in size from 8 to 1,408 parameters.

69.27 - 74.217 Jacob Hilton

The largest model that we believe we more or less fully understand has 32 parameters.

Chapter 3: How does hidden size affect the performance of RNNs in algorithmic tasks?

74.433 - 97.74 Jacob Hilton

The next largest model that we have put substantial effort into, but have failed to fully understand, has 432 parameters. The models are available at the AlgZoo GitHub repo. We think that the ambitious side of the mechanistic interpretability community has historically underinvested in fully understanding slightly complex models compared to partially understanding incredibly complex models.

97.72 - 113.776 Jacob Hilton

There has been some prior work aimed at full understanding, for instance on models trained to perform paren balancing, modular addition and more general group operations, but we still don't think the field is close to being able to fully understand our models, at least not in the sense we discuss in this post.

114.898 - 131.817 Jacob Hilton

If we are going to one day fully understand multibillion-parameter LLMs, we probably first need to reach the point where fully understanding models with a few hundred parameters is pretty easy. We hope that AlgZoo will spur research to either help us reach that point or help us reckon with the magnitude of the challenge we face.

133.058 - 139.447 Jacob Hilton

One likely reason for this underinvestment is lingering philosophical confusion over the meaning of explanation and full understanding.

Chapter 4: What challenges arise in fully understanding models with more parameters?

140.528 - 157.809 Jacob Hilton

Our current perspective at ARC is that, given a model that has been optimized for a particular loss, an explanation of the model amounts to a mechanistic estimate of the model's loss. We evaluate mechanistic estimates in one of two ways. We use surprise accounting to determine whether we have achieved a full understanding.

158.57 - 176.549 Jacob Hilton

But for practical purposes, we simply look at mean squared error as a function of compute, which allows us to compare the estimate with sampling. In the rest of this post, we will: review our perspective on mechanistic estimates as explanations, including our two ways of evaluating mechanistic estimates;

Chapter 5: How do mechanistic estimates compare to sampling methods for model accuracy?

177.39 - 194.637 Jacob Hilton

walk through three AlgZoo RNNs that we've studied, the smallest of which we fully understand and the largest of which we don't; and conclude with some thoughts on how ARC's approach relates to ambitious mechanistic interpretability. Heading: Mechanistic Estimates as Explanations.

194.785 - 208.164 Jacob Hilton

Models from AlgZoo are trained to perform a simple algorithmic task, such as calculating the position of the second largest number in a sequence. To explain why such a model has good performance, we can produce a mechanistic estimate of its accuracy.
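
As a concrete reference point, the second-largest-position task mentioned above can be written directly in a few lines of Python. This is the task the models are trained on, not the learned model itself, and the function name is ours:

```python
def second_argmax(xs):
    # Position (index) of the second-largest number in the sequence.
    order = sorted(range(len(xs)), key=lambda i: xs[i], reverse=True)
    return order[1]

second_argmax([3.0, 1.0, 7.0, 5.0])  # position 3, since 5.0 is second largest
```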

209.145 - 224.607 Jacob Hilton

By mechanistic, we mean that the estimate reasons deductively based on the structure of the model, in contrast to a sampling-based estimate, which makes inductive inferences about the overall performance from individual examples. Further explanation of this concept can be found here.

Chapter 6: What role does surprise accounting play in evaluating model explanations?

225.262 - 250.222 Jacob Hilton

Not all mechanistic estimates are high quality. For example, if the model has to choose between 10 different numbers, then before doing any analysis at all we might estimate the accuracy of the model to be 10%. This would be a mechanistic estimate, but a very crude one. So we need some way to evaluate the quality of a mechanistic estimate. We generally do this using one of two methods.

250.242 - 252.405 Jacob Hilton

1. Mean squared error versus compute.

Chapter 7: How do different hidden sizes and sequence lengths impact model performance?

253.467 - 270.289 Jacob Hilton

The more conceptually straightforward way to evaluate a mechanistic estimate is to simply ask how close it gets to the model's actual accuracy. The more compute intensive the mechanistic estimate, the closer it should get to the actual accuracy. Our matching sampling principle is roughly the following conjecture.

271.05 - 292.191 Jacob Hilton

There is a mechanistic estimation procedure that, given suitable advice, performs at least as well as random sampling in mean squared error for any given computational budget. 2. Surprise accounting. This is an information-theoretic metric that asks: how surprising is the model's actual accuracy, now that we have access to the mechanistic estimate?
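
The random-sampling baseline invoked by the matching sampling principle has a well-understood cost curve: estimating a true accuracy p by averaging k correct/incorrect samples gives mean squared error p(1 - p)/k, so doubling the sampling compute halves the error. A small sketch (our own illustration, not ARC's code) confirms this numerically:

```python
import random

def sampling_mse(p, k, trials=20000, seed=0):
    # Empirical mean squared error of estimating a model's true
    # accuracy p by averaging k independent correct/incorrect samples.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        estimate = sum(rng.random() < p for _ in range(k)) / k
        total += (estimate - p) ** 2
    return total / trials

# The empirical MSE closely tracks the p * (1 - p) / k baseline that a
# mechanistic estimate would need to match at the same compute budget.
```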

Chapter 8: What conclusions can be drawn about ambitious mechanistic interpretability?

292.778 - 315.985 Jacob Hilton

We accrue surprise in one of two ways. Either the estimate itself performs some kind of calculation or check with a surprising result, or the model's actual accuracy is still surprising even after accounting for the mechanistic estimate and its uncertainty. Further explanation of this idea can be found here. Surprise accounting is useful because it gives us a notion of full understanding.

316.117 - 331.415 Jacob Hilton

A full understanding means a mechanistic estimate with as few bits of total surprise as the number of bits of optimization used to select the model. On the other hand, mean squared error versus compute is more relevant to applications such as low probability estimation, as well as being easier to work with.
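
The bookkeeping behind surprise accounting bottoms out in ordinary surprisal: an outcome assigned probability p costs -log2(p) bits. The numbers below are a toy illustration of ours, not figures from the post:

```python
import math

def surprisal_bits(p):
    # Information-theoretic surprise, in bits, of an outcome
    # to which an explanation assigns probability p.
    return -math.log2(p)

# Toy example: if a mechanistic estimate assigns probability 1/8 to the
# model's observed accuracy, that outcome contributes 3 bits of surprise;
# a full understanding keeps the total surprise at or below the bits of
# optimization used to select the model.
surprisal_bits(1 / 8)  # 3.0
```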

332.456 - 350.258 Jacob Hilton

We have been increasingly focused on matching the mean squared error of random sampling, which remains a challenging baseline, although we generally consider this to be easier than achieving a full understanding. The two metrics are often closely related, and we will walk through examples of both metrics in the case study below.

350.457 - 368.399 Jacob Hilton

For most of the larger models from AlgZoo, including the 432-parameter model, complex formula omitted from the narration, discussed below, we would consider it a major research breakthrough if we were able to produce a mechanistic estimate that matched the performance of random sampling under the mean squared error versus compute metric.

369.461 - 388.425 Jacob Hilton

It would be an even harder accomplishment to achieve a full understanding under the surprise accounting metric, but we are less focused on this. Heading: Case study: Second argmax RNNs. The models in AlgZoo are divided into four families based on the tasks they have been trained to perform.

389.486 - 409.029 Jacob Hilton

The family we have spent by far the longest studying is the family of models trained to find the position of the second-largest number in a sequence, which we call the second argmax of the sequence. The models in this family are parameterized by a hidden size, complex formula omitted from the narration, and a sequence length, complex formula omitted from the narration.

409.951 - 431.207 Jacob Hilton

The model, complex formula omitted from the narration, is a one-layer ReLU RNN with, complex formula omitted from the narration, hidden neurons that takes in a sequence of, complex formula omitted from the narration, real numbers and produces a vector of logit probabilities of length, complex formula omitted from the narration. It has three parameter matrices:

432.588 - 453.593 Jacob Hilton

the input-to-hidden matrix, complex formula omitted from the narration; the hidden-to-hidden matrix, complex formula omitted from the narration; and the hidden-to-output matrix, complex formula omitted from the narration. The logits of, complex formula omitted from the narration, on input sequence, complex formula omitted from the narration,

454.555 - 472.353 Unknown

are computed as follows. Complex formula omitted from the narration. Complex formula omitted from the narration. Complex formula omitted from the narration. Complex formula omitted from the narration. Diagrammatically, there's an image here.
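
Since the narration omits the formulas, here is a hedged sketch of what a one-layer ReLU RNN with the three weight matrices described above plausibly looks like; the shapes and update rule are our assumptions, not ARC's exact definition:

```python
import numpy as np

def relu_rnn_logits(xs, w_in, w_hid, w_out):
    # Assumed shapes: xs is a length-n input sequence, w_in is (d,)
    # input-to-hidden, w_hid is (d, d) hidden-to-hidden, and w_out is
    # (n, d) hidden-to-output, for hidden size d and sequence length n.
    h = np.zeros(w_in.shape[0])
    for x in xs:
        h = np.maximum(0.0, x * w_in + w_hid @ h)  # ReLU recurrence
    return w_out @ h  # one logit per sequence position
```

Under these assumed shapes the parameter count is d + d^2 + nd, which is at least consistent with the sizes quoted above (for example, d = 4, n = 3 gives 32 parameters and d = 16, n = 10 gives 432), though the actual models' dimensions are not stated in the narration.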
