Menu
Sign In Search Podcasts Charts People & Topics Add Podcast API Blog Pricing

Jacob Hilton

๐Ÿ‘ค Speaker
204 total appearances

Appearances Over Time

Podcast Appearances

LessWrong (Curated & Popular)
"AlgZoo: uninterpreted models with fewer than 1,500 parameters" by Jacob_Hilton

Complex formula omitted from the narration.

LessWrong (Curated & Popular)
"AlgZoo: uninterpreted models with fewer than 1,500 parameters" by Jacob_Hilton

and, complex formula omitted from the narration, and to do so in a primarily algorithmic way.

LessWrong (Curated & Popular)
"AlgZoo: uninterpreted models with fewer than 1,500 parameters" by Jacob_Hilton

This is why we consider producing such a mechanistic estimate to be a formidable research challenge.

LessWrong (Curated & Popular)
"AlgZoo: uninterpreted models with fewer than 1,500 parameters" by Jacob_Hilton

Some notes with further discussion of this model can be found here.

LessWrong (Curated & Popular)
"AlgZoo: uninterpreted models with fewer than 1,500 parameters" by Jacob_Hilton

RNNs for the second Arjmax and complementary notebook by Zihao Chen.

LessWrong (Curated & Popular)
"AlgZoo: uninterpreted models with fewer than 1,500 parameters" by Jacob_Hilton

Heading Conclusion

LessWrong (Curated & Popular)
"AlgZoo: uninterpreted models with fewer than 1,500 parameters" by Jacob_Hilton

The models in ALGZU are small, but for all but the tiniest of them, it is a considerable challenge to mechanistically estimate their accuracy competitively with sampling, let alone fully understand them in the sense of surprise accounting.

LessWrong (Curated & Popular)
"AlgZoo: uninterpreted models with fewer than 1,500 parameters" by Jacob_Hilton

At the same time, ALGZU models are trained on tasks that can easily be performed by LLMs, so fully understanding them is practically a prerequisite for ambitious LLM interpretability.

LessWrong (Curated & Popular)
"AlgZoo: uninterpreted models with fewer than 1,500 parameters" by Jacob_Hilton

Overall, we would be keen to see other ambitious-oriented researchers explore our models, and more concretely, we would be excited to see better mechanistic estimates for our models in the sense of mean squared error versus compute.

LessWrong (Curated & Popular)
"AlgZoo: uninterpreted models with fewer than 1,500 parameters" by Jacob_Hilton

One specific challenge we pose is the following.

LessWrong (Curated & Popular)
"AlgZoo: uninterpreted models with fewer than 1,500 parameters" by Jacob_Hilton

Challenge.

LessWrong (Curated & Popular)
"AlgZoo: uninterpreted models with fewer than 1,500 parameters" by Jacob_Hilton

Design a method for mechanistically estimating the accuracy of the 432 parameter model.

LessWrong (Curated & Popular)
"AlgZoo: uninterpreted models with fewer than 1,500 parameters" by Jacob_Hilton

Complex formula omitted from the narration.

LessWrong (Curated & Popular)
"AlgZoo: uninterpreted models with fewer than 1,500 parameters" by Jacob_Hilton

That matches the performance of random sampling in terms of mean squared error versus compute.

LessWrong (Curated & Popular)
"AlgZoo: uninterpreted models with fewer than 1,500 parameters" by Jacob_Hilton

A cheap way to measure mean squared error is to add noise to the model's weights enough to significantly alter the model's accuracy and check the squared error of the method on average over the choice of noisy model.

LessWrong (Curated & Popular)
"AlgZoo: uninterpreted models with fewer than 1,500 parameters" by Jacob_Hilton

How does ARC's broader approach relate to this?

LessWrong (Curated & Popular)
"AlgZoo: uninterpreted models with fewer than 1,500 parameters" by Jacob_Hilton

The analysis we have presented here is relatively traditional mechanistic interpretability, but we think of this analysis mainly as a warm-up.

LessWrong (Curated & Popular)
"AlgZoo: uninterpreted models with fewer than 1,500 parameters" by Jacob_Hilton

Ultimately, we seek a scalable, algorithmic approach to producing mechanistic estimates, which we have been pursuing in our recent work.

LessWrong (Curated & Popular)
"AlgZoo: uninterpreted models with fewer than 1,500 parameters" by Jacob_Hilton

Furthermore, we are ambitious in the sense that we would like to fully exploit the structure present in models to mechanistically estimate any quantity of interest.

LessWrong (Curated & Popular)
"AlgZoo: uninterpreted models with fewer than 1,500 parameters" by Jacob_Hilton

Thus our approach is best described as ambitious and mechanistic, but perhaps not as interpretability.