Jacob Hilton
The models in AlgZoo are divided into four families based on the tasks they have been trained to perform.
The family we have spent by far the longest studying is the family of models trained to find the position of the second-largest number in a sequence, which we call the second argmax of the sequence.
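For concreteness, the second argmax of a sequence can be computed directly (a small illustration, not code from the original post):

```python
import numpy as np

def second_argmax(xs):
    """Index of the second-largest element of xs."""
    order = np.argsort(xs)   # indices sorted from smallest to largest value
    return int(order[-2])    # second from the end = second largest

# In [3.0, 1.0, 2.0], the largest value is 3.0 (index 0) and the
# second largest is 2.0 (index 2), so the second argmax is 2.
print(second_argmax([3.0, 1.0, 2.0]))  # → 2
```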
The models in this family are parameterized by a hidden size $d$ and a sequence length $n$. The model is a one-layer ReLU-RNN with $d$ hidden neurons that takes in a sequence of $n$ real numbers and produces a vector of logits of length $n$.
It has three parameter matrices:
the input-to-hidden matrix $W_x \in \mathbb{R}^{d \times 1}$,
the hidden-to-hidden matrix $W_h \in \mathbb{R}^{d \times d}$,
and the hidden-to-output matrix $W_y \in \mathbb{R}^{n \times d}$.
On input sequence $x_1, \dots, x_n \in \mathbb{R}$, the hidden state is computed by the recurrence $h_0 = 0$, $h_t = \mathrm{ReLU}(W_h h_{t-1} + W_x x_t)$, and the logits of the model are the entries of $W_y h_n$.
Each model in this family is trained, using softmax cross-entropy loss, to make the largest logit the one corresponding to the position of the second-largest input.
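The forward pass and loss described above can be sketched as follows (a minimal NumPy illustration; the weight names $W_x$, $W_h$, $W_y$ are my own, not necessarily the library's notation):

```python
import numpy as np

def rnn_logits(W_x, W_h, W_y, xs):
    """Logits of a one-layer ReLU-RNN on the input sequence xs.

    Shapes (d = hidden size, n = sequence length):
      W_x: (d, 1) input-to-hidden, W_h: (d, d) hidden-to-hidden,
      W_y: (n, d) hidden-to-output.
    """
    h = np.zeros(W_h.shape[0])
    for x in xs:
        h = np.maximum(0.0, W_h @ h + W_x[:, 0] * x)  # ReLU recurrence
    return W_y @ h  # one logit per input position

def xent_loss(W_x, W_h, W_y, xs, target):
    """Softmax cross-entropy loss against the second-argmax position."""
    z = rnn_logits(W_x, W_h, W_y, xs)
    z = z - z.max()  # subtract max for numerical stability
    return float(-z[target] + np.log(np.exp(z).sum()))
```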
The models we'll discuss here are three members of this family, the smallest of which has hidden size 2 and sequence length 2.
For each of these models, we'd like to understand why the trained model has high accuracy on standard Gaussian input sequences.
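To make "high accuracy on standard Gaussian input sequences" concrete, the accuracy can be estimated by sampling. This is a sketch: `model` here is any function from a length-$n$ sequence to a length-$n$ logit vector, not a specific AlgZoo interface.

```python
import numpy as np

def second_argmax_accuracy(model, n, trials=10_000, seed=0):
    """Fraction of i.i.d. standard Gaussian sequences of length n on
    which model's largest logit sits at the second-largest input."""
    rng = np.random.default_rng(seed)
    correct = 0
    for _ in range(trials):
        xs = rng.standard_normal(n)
        target = int(np.argsort(xs)[-2])             # true second argmax
        correct += int(np.argmax(model(xs)) == target)
    return correct / trials

# Sanity check: an oracle that outputs a one-hot at the answer scores 1.0.
oracle = lambda xs: np.eye(len(xs))[np.argsort(xs)[-2]]
print(second_argmax_accuracy(oracle, n=4, trials=100))  # → 1.0
```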
Hidden size 2, sequence length 2
The model with hidden size 2 and sequence length 2 can be loaded in AlgZoo using a short code snippet (not reproduced here).