Jacob Hilton
The models in AlgZoo are divided into four families based on the tasks they have been trained to perform.
The family we have spent by far the longest studying is the family of models trained to find the position of the second-largest number in a sequence, which we call the second argmax of the sequence.
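For concreteness, the second argmax of a sequence can be computed directly (a small illustration, not code from the original post):

```python
import numpy as np

def second_argmax(xs):
    """Index of the second-largest element of xs."""
    order = np.argsort(xs)   # indices sorted from smallest to largest value
    return int(order[-2])    # second from the end = second largest

# In [3.0, 1.0, 2.0], the largest value is 3.0 (index 0) and the
# second largest is 2.0 (index 2), so the second argmax is 2.
print(second_argmax([3.0, 1.0, 2.0]))  # → 2
```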
The models in this family are parameterized by a hidden size $d$ and a sequence length $n$. The model is a one-layer ReLU-RNN with $d$ hidden neurons that takes in a sequence of $n$ real numbers and produces a vector of logits of length $n$.
It has three parameter matrices:
the input-to-hidden matrix $W_x \in \mathbb{R}^{d \times 1}$,
the hidden-to-hidden matrix $W_h \in \mathbb{R}^{d \times d}$,
and the hidden-to-output matrix $W_y \in \mathbb{R}^{n \times d}$.
On input sequence $x_1, \dots, x_n \in \mathbb{R}$, the hidden state is computed by the recurrence $h_0 = 0$, $h_t = \mathrm{ReLU}(W_h h_{t-1} + W_x x_t)$, and the logits of the model are the entries of $W_y h_n$.
Each model in this family is trained, using softmax cross-entropy loss, to make the largest logit the one corresponding to the position of the second-largest input.
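The forward pass and loss described above can be sketched as follows (a minimal NumPy illustration; the weight names $W_x$, $W_h$, $W_y$ are my own, not necessarily the library's notation):

```python
import numpy as np

def rnn_logits(W_x, W_h, W_y, xs):
    """Logits of a one-layer ReLU-RNN on the input sequence xs.

    Shapes (d = hidden size, n = sequence length):
      W_x: (d, 1) input-to-hidden, W_h: (d, d) hidden-to-hidden,
      W_y: (n, d) hidden-to-output.
    """
    h = np.zeros(W_h.shape[0])
    for x in xs:
        h = np.maximum(0.0, W_h @ h + W_x[:, 0] * x)  # ReLU recurrence
    return W_y @ h  # one logit per input position

def xent_loss(W_x, W_h, W_y, xs, target):
    """Softmax cross-entropy loss against the second-argmax position."""
    z = rnn_logits(W_x, W_h, W_y, xs)
    z = z - z.max()  # subtract max for numerical stability
    return float(-z[target] + np.log(np.exp(z).sum()))
```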
The models we'll discuss here are three members of this family, the smallest of which has hidden size 2 and sequence length 2.
For each of these models, we'd like to understand why the trained model has high accuracy on standard Gaussian input sequences.
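To make "high accuracy on standard Gaussian input sequences" concrete, the accuracy can be estimated by sampling. This is a sketch: `model` here is any function from a length-$n$ sequence to a length-$n$ logit vector, not a specific AlgZoo interface.

```python
import numpy as np

def second_argmax_accuracy(model, n, trials=10_000, seed=0):
    """Fraction of i.i.d. standard Gaussian sequences of length n on
    which model's largest logit sits at the second-largest input."""
    rng = np.random.default_rng(seed)
    correct = 0
    for _ in range(trials):
        xs = rng.standard_normal(n)
        target = int(np.argsort(xs)[-2])             # true second argmax
        correct += int(np.argmax(model(xs)) == target)
    return correct / trials

# Sanity check: an oracle that outputs a one-hot at the answer scores 1.0.
oracle = lambda xs: np.eye(len(xs))[np.argsort(xs)[-2]]
print(second_argmax_accuracy(oracle, n=4, trials=100))  # → 1.0
```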
Hidden size 2, sequence length 2
The model with hidden size 2 and sequence length 2 can be loaded in AlgZoo using a short code snippet (not reproduced here).