Jeremiah
Will the next word fit?
Final prediction is made by arranging representations to be linearly separable.
And how are representations constructed?
Multiple attention heads specialized in particular token ranges.
Their sum creates a line count manifold.
Scott writes: The answer was, the AI represents various features of the line-breaking process as one-dimensional helical manifolds in a six-dimensional space, then rotates the manifolds in some way that corresponds to multiplying or comparing the numbers that they're representing.
You don't need to understand what this means, so I've relegated my half-hearted attempt to explain it to a footnote.
Here's the footnote.
My extremely half-hearted attempt at understanding this claim: the AI needs to track things like whether you're on character 1, 2, 3, etc. of the current line.
The simplest way to do this would be to have one feature for the state of being on character number 1, another for the state of being on character number 2, etc.
Since AI features can be modeled as dimensions, this would correspond to locating the current character count in a 100-dimensional space, which would work.
But this is expensive in feature count.
A document with 100 characters per line would take 100 features for this simple task.
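As a rough illustration of the one-feature-per-position idea, here is a minimal Python sketch. The function name `one_hot` and the `line_width` parameter are made up for this example and are not from Scott's post or the underlying research. It only shows why the cost grows with line length: every possible position gets its own slot.

```python
def one_hot(char_count, line_width=100):
    """Encode the current character count with one slot per position.

    Hypothetical illustration: a 100-character line needs 100 slots,
    and exactly one of them is switched on at a time.
    """
    if not 1 <= char_count <= line_width:
        raise ValueError("char_count must be between 1 and line_width")
    vec = [0.0] * line_width
    vec[char_count - 1] = 1.0  # slot k-1 means "on character k"
    return vec


v = one_hot(42)
print(len(v))            # number of features spent on this one task
print(v.index(1.0) + 1)  # which character position is active
```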
Another simple way to do this would be to have one feature whose value gets higher as the character count goes up.
This would correspond to locating the character count in a one-dimensional space, aka a straight line.
This fails for two technical reasons.
First, AIs can't manipulate feature values that finely, and second, the AI needs to compare this feature to some other feature representing expected number of characters before the line break, and it can't directly compare feature values in this sense.
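To make the first of those two problems concrete, here is a toy Python sketch of a single rising feature. Everything in it is an assumption for illustration: the names `scalar_feature` and `read_back`, and especially the `resolution` parameter, which stands in for "can't manipulate feature values that finely" and is not a measured property of any model.

```python
def scalar_feature(count, line_width=100):
    """One feature whose value rises with the character count."""
    return count / line_width


def read_back(value, line_width=100, resolution=0.05):
    """Hypothetical readout that can only resolve steps of `resolution`."""
    coarse = round(value / resolution) * resolution  # snap to the nearest step
    return round(coarse * line_width)


# Neighboring counts collapse onto the same readout once precision is limited.
for count in (41, 42, 43):
    print(count, read_back(scalar_feature(count)))
```

With the assumed resolution, counts 41 and 42 are read back as the same number, so a line-break decision that depends on a one-character difference cannot be made from this feature alone.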
Its solution: since one dimension is too few and 100 dimensions is too many, it compromises and uses some medium number of dimensions, which turns out to be 6.
Trying to map things in 6-dimensional space naturally produces these helical manifold structures, and comparing them to each other naturally looks like rotating the manifolds.
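One way to get a feel for this is a small Python sketch that places each character count on a curve winding through six dimensions, built from three sine/cosine pairs. This is an assumption-laden toy, not the model's learned representation: the `PERIODS` values and the function names `embed`, `rotate`, and `dot` are invented for the example. It shows two things the footnote gestures at: the count lives on a one-dimensional curve in a 6-dimensional space, and adding an offset to the count looks like rotating that curve.

```python
import math

# Hypothetical periods, chosen only so counts 1..100 get distinct points.
PERIODS = (100.0, 50.0, 25.0)


def embed(count):
    """Map a character count to a 6-dimensional point (three cos/sin pairs)."""
    vec = []
    for period in PERIODS:
        angle = 2 * math.pi * count / period
        vec.extend((math.cos(angle), math.sin(angle)))
    return vec


def rotate(vec, shift):
    """Rotate each 2D plane by the angle that corresponds to `shift` characters."""
    out = []
    for i, period in enumerate(PERIODS):
        a = 2 * math.pi * shift / period
        x, y = vec[2 * i], vec[2 * i + 1]
        out.extend((x * math.cos(a) - y * math.sin(a),
                    x * math.sin(a) + y * math.cos(a)))
    return out


def dot(u, v):
    """Similarity between two embedded counts."""
    return sum(a * b for a, b in zip(u, v))


# Rotating the point for count 30 by 12 characters lands on the point for 42.
target = rotate(embed(30), 12)
best = max(range(1, 101), key=lambda c: dot(embed(c), target))
print(best)  # 42
```

In this toy, comparing two counts reduces to rotating one point and checking how well it lines up with another, which is the sense in which comparison "looks like rotating the manifolds."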