Jacob Hilton

Speaker
204 total appearances

Podcast Appearances

LessWrong (Curated & Popular)
"AlgZoo: uninterpreted models with fewer than 1,500 parameters" by Jacob_Hilton

AlgZoo.

Uninterpreted models with fewer than 1,500 parameters.

By Jacob Hilton.

Published on January 26, 2026.

Audio note.

This article contains 78 uses of LaTeX notation, so the narration may be difficult to follow.

There's a link to the original text in the episode description.

This post covers work done by several researchers at, visitors to and collaborators of ARC, including Zihao Chen, George Robinson, David Matolcsi, Jacob Stavrianos, Jiawei Li and Michael Sklar.

Thanks to Aryan Bhatt, Gabriel Wu, Jiawei Li, Lee Sharkey, Victor Lecomte and Zihao Chen for comments.

In the wake of recent debate about pragmatic versus ambitious visions for mechanistic interpretability, ARC is sharing some models we've been studying that, in spite of their tiny size, serve as challenging test cases for any ambitious interpretability vision.

The models are RNNs and transformers trained to perform algorithmic tasks, and range in size from 8 to 1,408 parameters.

The largest model that we believe we more or less fully understand has 32 parameters.

The next largest model that we have put substantial effort into, but have failed to fully understand, has 432 parameters.
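To give a rough feel for how small these parameter counts are, here is a minimal sketch that counts the weights of a single-layer RNN with no bias terms. The layout (one input-to-hidden matrix, one hidden-to-hidden matrix, one linear readout) and the sizes passed in are assumptions chosen for illustration; this excerpt does not state the architectures of the AlgZoo models, and `rnn_param_count` is a hypothetical helper, not part of any AlgZoo code.

```python
def rnn_param_count(hidden_size: int, input_size: int, output_size: int) -> int:
    """Count the weights of a bias-free single-layer RNN (assumed layout)."""
    input_to_hidden = hidden_size * input_size    # maps each input to the hidden state
    hidden_to_hidden = hidden_size * hidden_size  # recurrent weights
    hidden_to_output = output_size * hidden_size  # linear readout from the hidden state
    return input_to_hidden + hidden_to_hidden + hidden_to_output


# Illustrative sizes only: scalar input, small hidden state, small readout.
print(rnn_param_count(hidden_size=4, input_size=1, output_size=3))    # 4 + 16 + 12 = 32
print(rnn_param_count(hidden_size=16, input_size=1, output_size=10))  # 16 + 256 + 160 = 432
```

Under these assumed shapes, a hidden state of just 4 or 16 units is already enough to reach counts of the size mentioned here, which shows how little capacity is involved.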

The models are available at the AlgZoo GitHub repo.

We think that the ambitious side of the mechanistic interpretability community has historically underinvested in fully understanding slightly complex models compared to partially understanding incredibly complex models.

There has been some prior work aimed at full understanding, for instance on models trained to perform paren balancing, modular addition and more general group operations, but we still don't think the field is close to being able to fully understand our models, at least not in the sense we discuss in this post.

If we are going to one day fully understand multi-billion-parameter LLMs, we probably first need to reach the point where fully understanding models with a few hundred parameters is pretty easy.

We hope that AlgZoo will spur research to either help us reach that point or help us reckon with the magnitude of the challenge we face.

One likely reason for this underinvestment is lingering philosophical confusion over the meaning of explanation and full understanding.

Our current perspective at ARC is that, given a model that has been optimized for a particular loss, an explanation of the model amounts to a mechanistic estimate of the model's loss.

โ† Previous Page 1 of 11 Next โ†’