Jacob Hilton
Complex formula omitted from the narration.
and, complex formula omitted from the narration, and to do so in a primarily algorithmic way.
This is why we consider producing such a mechanistic estimate to be a formidable research challenge.
Some notes with further discussion of this model can be found here.
RNNs for the second argmax and complementary notebook by Zihao Chen.
Conclusion
The models in AlgZoo are small, but for all but the tiniest of them, it is a considerable challenge to mechanistically estimate their accuracy competitively with sampling, let alone fully understand them in the sense of surprise accounting.
At the same time, AlgZoo models are trained on tasks that can easily be performed by LLMs, so fully understanding them is practically a prerequisite for ambitious LLM interpretability.
Overall, we would be keen to see other ambitious-oriented researchers explore our models, and more concretely, we would be excited to see better mechanistic estimates for our models in the sense of mean squared error versus compute.
One specific challenge we pose is the following.
Challenge.
Design a method for mechanistically estimating the accuracy of the 432-parameter model (complex formula omitted from the narration) that matches the performance of random sampling in terms of mean squared error versus compute.
A cheap way to measure mean squared error is to add enough noise to the model's weights to significantly alter its accuracy, and then check the squared error of the method, averaged over the choice of noisy model.
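The noise-based measurement can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the procedure used for the 432-parameter model: the toy ReLU RNN, the standard normal inputs, the hidden size, the noise scale, and every function name here are hypothetical stand-ins. The estimator being scored is the random sampling baseline itself; a mechanistic method would be passed in its place.

```python
import numpy as np


def rnn_predict(weights, x):
    """Toy ReLU RNN (hypothetical stand-in): predicts one position per sequence."""
    w_in, w_hh, w_out = weights
    h = np.zeros((x.shape[0], w_hh.shape[0]))
    for t in range(x.shape[1]):
        # One scalar input per step, broadcast against the input weights.
        h = np.maximum(0.0, h @ w_hh.T + x[:, t:t + 1] * w_in)
    return (h @ w_out.T).argmax(axis=1)


def second_argmax(x):
    """Position of the second-largest entry in each row."""
    return np.argsort(x, axis=1)[:, -2]


def sampled_accuracy(weights, n_samples, rng, seq_len=3):
    """Accuracy estimated by random sampling (inputs assumed standard normal)."""
    x = rng.standard_normal((n_samples, seq_len))
    return float(np.mean(rnn_predict(weights, x) == second_argmax(x)))


def add_noise(weights, scale, rng):
    """Return a copy of the weights with Gaussian noise added to every entry."""
    return [w + scale * rng.standard_normal(w.shape) for w in weights]


def mse_over_noisy_models(estimator, base_weights, noise_scale, n_models, rng,
                          n_reference=50_000):
    """Average squared error of `estimator` over randomly perturbed models.

    The reference accuracy of each noisy model comes from a large sample,
    so it is itself only approximate.
    """
    errors = []
    for _ in range(n_models):
        noisy = add_noise(base_weights, noise_scale, rng)
        reference = sampled_accuracy(noisy, n_reference, rng)
        errors.append((estimator(noisy) - reference) ** 2)
    return float(np.mean(errors))


rng = np.random.default_rng(0)
hidden, seq_len = 4, 3  # assumed sizes, chosen only to keep the sketch fast
base_weights = [
    rng.standard_normal(hidden),             # input weights
    rng.standard_normal((hidden, hidden)),   # hidden-to-hidden weights
    rng.standard_normal((seq_len, hidden)),  # output weights
]


def sampling_baseline(weights):
    # Baseline estimator: 100 random samples per model.
    return sampled_accuracy(weights, 100, rng)


mse = mse_over_noisy_models(sampling_baseline, base_weights,
                            noise_scale=0.5, n_models=20, rng=rng)
print(f"sampling baseline MSE: {mse:.5f}")
```

For the sampling baseline with n samples on a model of true accuracy p, the expected squared error is p(1 - p)/n, so repeating this measurement at several values of n traces out the error-versus-compute curve that a mechanistic method would need to match.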
How does ARC's broader approach relate to this?
The analysis we have presented here is relatively traditional mechanistic interpretability, but we think of this analysis mainly as a warm-up.
Ultimately, we seek a scalable, algorithmic approach to producing mechanistic estimates, which we have been pursuing in our recent work.
Furthermore, we are ambitious in the sense that we would like to fully exploit the structure present in models to mechanistically estimate any quantity of interest.
Thus our approach is best described as ambitious and mechanistic, but perhaps not as interpretability.