Menu
Sign In Search Podcasts Charts People & Topics Add Podcast API Blog Pricing

Jacob Hilton

๐Ÿ‘ค Speaker
204 total appearances

Appearances Over Time

Podcast Appearances

LessWrong (Curated & Popular)
"AlgZoo: uninterpreted models with fewer than 1,500 parameters" by Jacob_Hilton

Finally, our analysis once again allows us to construct an improved model by hand, which has 99.99% accuracy.

LessWrong (Curated & Popular)
"AlgZoo: uninterpreted models with fewer than 1,500 parameters" by Jacob_Hilton

Subheading.

LessWrong (Curated & Popular)
"AlgZoo: uninterpreted models with fewer than 1,500 parameters" by Jacob_Hilton

Hidden size 16, sequence length 10.

LessWrong (Curated & Popular)
"AlgZoo: uninterpreted models with fewer than 1,500 parameters" by Jacob_Hilton

The model, complex formula omitted from the narration, can be loaded in ALGZU using.

LessWrong (Curated & Popular)
"AlgZoo: uninterpreted models with fewer than 1,500 parameters" by Jacob_Hilton

There's a code block here in the text.

LessWrong (Curated & Popular)
"AlgZoo: uninterpreted models with fewer than 1,500 parameters" by Jacob_Hilton

It has 432 parameters and an accuracy of 95.3%.

LessWrong (Curated & Popular)
"AlgZoo: uninterpreted models with fewer than 1,500 parameters" by Jacob_Hilton

This model is deep enough that a brute force approach is no longer viable.

LessWrong (Curated & Popular)
"AlgZoo: uninterpreted models with fewer than 1,500 parameters" by Jacob_Hilton

Instead, we look for features in the activation space of the model's hidden state.

LessWrong (Curated & Popular)
"AlgZoo: uninterpreted models with fewer than 1,500 parameters" by Jacob_Hilton

After rescaling the neurons of the hidden state, we notice an approximately isolated subcircuit formed by neurons 2 and 4, with no strong connections to the outputs of any other neurons.

LessWrong (Curated & Popular)
"AlgZoo: uninterpreted models with fewer than 1,500 parameters" by Jacob_Hilton

Complex formula omitted from the narration.

LessWrong (Curated & Popular)
"AlgZoo: uninterpreted models with fewer than 1,500 parameters" by Jacob_Hilton

It follows that after unrolling the RNN4, complex formula omitted from the narration.

LessWrong (Curated & Popular)
"AlgZoo: uninterpreted models with fewer than 1,500 parameters" by Jacob_Hilton

Steps.

LessWrong (Curated & Popular)
"AlgZoo: uninterpreted models with fewer than 1,500 parameters" by Jacob_Hilton

Neuron 2 is approximately complex formula omitted from the narration Neuron 4 is approximately complex formula omitted from the narration

LessWrong (Curated & Popular)
"AlgZoo: uninterpreted models with fewer than 1,500 parameters" by Jacob_Hilton

This can be proved by induction using the identity.

LessWrong (Curated & Popular)
"AlgZoo: uninterpreted models with fewer than 1,500 parameters" by Jacob_Hilton

Complex formula omitted from the narration.

LessWrong (Curated & Popular)
"AlgZoo: uninterpreted models with fewer than 1,500 parameters" by Jacob_Hilton

For neuron 4.

LessWrong (Curated & Popular)
"AlgZoo: uninterpreted models with fewer than 1,500 parameters" by Jacob_Hilton

Next, we notice that neurons 6 and 7 fit into a larger approximately isolated subcircuit together with neurons 2 and 4.

LessWrong (Curated & Popular)
"AlgZoo: uninterpreted models with fewer than 1,500 parameters" by Jacob_Hilton

Complex formula omitted from the narration.

LessWrong (Curated & Popular)
"AlgZoo: uninterpreted models with fewer than 1,500 parameters" by Jacob_Hilton

Using the same identity, it follows that after unrolling the RNN4.

LessWrong (Curated & Popular)
"AlgZoo: uninterpreted models with fewer than 1,500 parameters" by Jacob_Hilton

Complex formula omitted from the narration.