This fails for two technical reasons.
First, AIs can't manipulate feature values that finely. Second, the AI would need to compare this feature to some other feature representing the expected number of characters before the line break, and it can't directly compare feature values in that way.
Its solution: since one dimension is too few and 100 dimensions are too many, it compromises on some medium number of dimensions, which turns out to be six.
Trying to map things in 6-dimensional space naturally produces these helical manifold structures, and comparing them to each other naturally looks like rotating the manifolds.
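To make the geometry concrete, here is a toy sketch of one way a multi-frequency, helix-like encoding with rotation-based comparison can work. It borrows the flavor of sinusoidal position encodings; the periods, the six dimensions, and the encode/advance helpers are invented for illustration and are not Claude's actual learned features.

```python
import numpy as np

# Toy multi-frequency "helical" encoding of a character count.
# Three periods, each contributing a (sin, cos) pair -> 6 dimensions.
# All of this is illustrative; it is not Claude's learned representation.
PERIODS = [16, 64, 256]

def encode(count: int) -> np.ndarray:
    """Place the count on three circles of different periods."""
    angles = [2 * np.pi * count / p for p in PERIODS]
    return np.array([f(a) for a in angles for f in (np.sin, np.cos)])

def advance(vec: np.ndarray, delta: int) -> np.ndarray:
    """Move the encoded count forward by `delta` characters using only
    2x2 rotations, never decoding the count back into a scalar."""
    out = vec.copy()
    for i, p in enumerate(PERIODS):
        theta = 2 * np.pi * delta / p
        rot = np.array([[np.cos(theta), np.sin(theta)],
                        [-np.sin(theta), np.cos(theta)]])
        out[2 * i:2 * i + 2] = rot @ out[2 * i:2 * i + 2]
    return out

# "Comparison looks like rotation": rotating the current count's encoding by
# the remaining budget lands exactly on the encoding of the line-width limit.
current, limit = 57, 80
print(np.allclose(advance(encode(current), limit - current), encode(limit)))  # True
```

The point of the sketch is just that "advance the count" and "check it against the limit" become rotations of points on circles, with no step that ever decodes the count back into an explicit number.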
Back to the text.
From our point of view, what's important is that this doesn't look like "lol, it just sees that the last token was 're' and there's a 12.27% chance of a line break following 're'."
Next token prediction created this system, but the system itself can involve arbitrary choices about how to represent and manipulate data.
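For contrast, the dismissive reading would amount to something like the following hypothetical bigram lookup, which conditions on the previous token alone and has no idea how far along the line it is (the 12.27% figure is from the text above; the other numbers are made up):

```python
# Hypothetical "stochastic parrot" baseline: predict a line break from the
# previous token alone. Probabilities other than 0.1227 are invented.
LINEBREAK_PROB_AFTER = {
    "re": 0.1227,
    "the": 0.0410,
    ".": 0.3100,
}

def p_linebreak(prev_token: str) -> float:
    # Small default for unseen tokens; still completely blind to line width.
    return LINEBREAK_PROB_AFTER.get(prev_token, 0.02)

print(p_linebreak("re"))  # 0.1227, whether we're 5 or 75 characters into the line
```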
Human neuron interpretability is even harder than AI neuron interpretability, but probably your thoughts involve something at least as weird as helical manifolds in 6D spaces.
I searched the literature for the closest human equivalent to Claude's weird helical manifolds and was able to find one team describing how grid cells in the entorhinal cortex, next to the hippocampus, which help you track locations in 2D space, use "high-dimensional toroidal attractor manifolds."
You never think about these, and if Claude is conscious, it doesn't think about its helices either.
Or, to frame it in a less controversial way, you couldn't discover these helices by asking Claude in the chat window to tell you about them.
These are just the sorts of strange hacks that next-token or next-sense-datum prediction algorithms discover to encode complicated concepts onto a physical computational substrate.
4.
So my answer to the "just a next-token predictor," "just a bag of words," "just a stochastic parrot" literature is that this confuses levels of optimization.
Here's the diagram from before again.
The most compelling analogy?
This is like expecting humans to be just survival and reproduction machines, because survival and reproduction were the optimization criteria in our evolutionary history.
There is, of course, some sense in which we are just survival and reproduction machines.
We don't have any faculties that can't be explained through their effects on survival and reproduction.
But this doesn't mean we don't really think or don't really understand because we're really just trying to have sex when we work on a math problem.