Jeremiah
[Figure: the line-breaking mechanism. How are different character counts represented? Character count and line width manifolds are aligned and discretized by features. How is the boundary detected? Boundary heads use QK to shift the offsets of the line count and width manifolds. Will the next word fit? The final prediction is made by arranging representations to be linearly separable. How are representations constructed? Multiple attention heads specialize in particular token ranges, and their sum creates a line count manifold.]
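The "Will the next word fit?" step can be illustrated with a toy decision rule. This is a minimal sketch, assuming the relevant quantities are available as plain numbers and ignoring the space before the next word; the weights `W`, the bias `B`, and the function `next_word_fits` are hypothetical and are not the model's actual mechanism. It shows why such a decision is linearly separable: it is a sign test on a weighted sum of the inputs, so a single hyperplane splits "fits" from "doesn't fit".

```python
import numpy as np

# Hypothetical toy features: x = [chars_so_far, line_width, next_word_len].
# The rule computes line_width - chars_so_far - next_word_len and checks its sign.
W = np.array([-1.0, 1.0, -1.0])
B = 0.0

def next_word_fits(chars_so_far: int, line_width: int, next_word_len: int) -> bool:
    """Return True if the next word fits on the current line (toy linear rule)."""
    x = np.array([chars_so_far, line_width, next_word_len], dtype=float)
    return bool(W @ x + B >= 0)

print(next_word_fits(70, 80, 5))  # True: 80 - 70 - 5 = 5 >= 0
print(next_word_fits(78, 80, 5))  # False: 80 - 78 - 5 = -3 < 0
```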
Scott writes: The answer was, the AI represents various features of the line-breaking process as one-dimensional helical manifolds in a six-dimensional space, then rotates the manifolds in some way that corresponds to multiplying or comparing the numbers that they're representing.
You don't need to understand what this means, so I've relegated my half-hearted attempt to explain it to a footnote.
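The idea of a curve in six dimensions that gets rotated can be illustrated with a toy construction. This is a minimal sketch, not the model's learned representation: the periods in `PERIODS` and the functions `embed` and `shift_matrix` are hypothetical. Each count is placed on three circles at different frequencies (two coordinates per circle, so six dimensions in total), which traces out a closed, helix-like curve. A block-diagonal rotation then slides any point along the curve by a fixed amount, so rotating is the same as adding, and a dot product between two points is largest when the counts match, which is one way to compare them.

```python
import numpy as np

# Hypothetical periods: three (cos, sin) pairs give a 6-dimensional embedding.
PERIODS = (10.0, 25.0, 60.0)

def embed(n: float) -> np.ndarray:
    """Map a count n to a point on a closed curve in 6-D space."""
    angles = [2 * np.pi * n / t for t in PERIODS]
    return np.concatenate([[np.cos(a), np.sin(a)] for a in angles])

def shift_matrix(d: float) -> np.ndarray:
    """Block-diagonal rotation that moves embed(n) to embed(n + d)."""
    size = 2 * len(PERIODS)
    r = np.zeros((size, size))
    for i, t in enumerate(PERIODS):
        a = 2 * np.pi * d / t
        c, s = np.cos(a), np.sin(a)
        # Rotate the (cos, sin) pair for this period by angle a.
        r[2 * i:2 * i + 2, 2 * i:2 * i + 2] = [[c, -s], [s, c]]
    return r

count, width = 42, 50
# Rotating the count embedding by the gap lands on the width embedding.
rotated = shift_matrix(width - count) @ embed(count)
print(np.allclose(rotated, embed(width)))  # True

# The dot product peaks when the two counts are equal.
scores = [embed(42) @ embed(k) for k in range(60)]
print(int(np.argmax(scores)))  # 42
```

Using several periods rather than one keeps distinct counts from landing on the same point: with these toy periods, two counts only coincide if they differ by a multiple of 300.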
Here's the footnote.
My extremely half-hearted attempt at understanding this claim: the AI needs to track things like whether you're on character 1, 2, 3, etc. of the current line.
The simplest way to do this would be to have one feature for the state of being on character number 1, another for the state of being on character number 2, etc.
Since AI features can be modeled as dimensions, this would correspond to locating the current character count in a 100-dimensional space, which would work.
But this is expensive in feature count.
A document with 100 characters per line would take 100 features for this simple task.
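The one-feature-per-position approach is what is usually called a one-hot encoding. This is a minimal sketch, assuming a fixed maximum of 100 characters per line as in the example above; `MAX_CHARS` and `one_hot_count` are hypothetical names. It shows that the vector needs one slot for every possible position, which is where the cost in feature count comes from.

```python
import numpy as np

MAX_CHARS = 100  # assumed maximum line length, matching the 100-character example

def one_hot_count(position: int) -> np.ndarray:
    """Return a vector with one feature per character position (1-indexed)."""
    if not 1 <= position <= MAX_CHARS:
        raise ValueError("position out of range")
    v = np.zeros(MAX_CHARS)
    v[position - 1] = 1.0  # only the feature for this position is active
    return v

v = one_hot_count(3)
print(v.shape)            # (100,)
print(int(np.argmax(v)))  # 2 (zero-based index of character 3)
```

A side effect of this design is that every pair of distinct positions is equally far apart (distance sqrt(2)), so the encoding by itself carries no notion that character 3 is closer to character 4 than to character 90.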
Another simple way to do this would be to have one feature whose value gets higher as the character count goes up.
This would correspond to locating the character count in a one-dimensional space, aka a straight line.
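The single-feature approach can be sketched the same way. This is a minimal sketch, assuming the feature is simply the count times a constant; `scalar_count` and its `scale` parameter are hypothetical. The whole state fits in one dimension, and comparing two counts reduces to subtracting their values.

```python
import numpy as np

def scalar_count(position: int, scale: float = 0.01) -> np.ndarray:
    """Return a single feature whose value grows linearly with the character count."""
    return np.array([scale * position])

a, b = scalar_count(3), scalar_count(4)
print(a.shape)                       # (1,)
print(round(float(b[0] - a[0]), 6))  # 0.01: neighboring counts are neighbors on the line
```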