Trenton Bricken
But I analogize this to the person who's lost their keys and is just looking under the streetlight, where it's very easy to observe this behavior.
One leading cognitive neuroscientist said to me that a dirty little secret of any fMRI study where you're looking at brain activity for a given task is that the cerebellum is almost always active and lighting up for it.
If you have a damaged cerebellum, you are also much more likely to have autism.
So it's associated with social skills.
And in one of these particular studies, where I think they used PET instead of fMRI, when you're doing next-token prediction, the cerebellum lights up a lot.
Yeah.
Yeah.
So I guess the main thing I want to communicate here is this.
Back in the 1980s, Pentti Kanerva came up with an associative memory algorithm for the following problem: I have a bunch of memories.
I want to store them.
There's some amount of noise or corruption that's going on.
And I want to query or retrieve the best match.
And so he writes this equation for how to do it.
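As a rough illustration of the store, corrupt, and retrieve setup described here, the following is a minimal best-match memory over binary vectors. It is not Kanerva's equation, which is not spelled out in this conversation; the function names (`hamming`, `corrupt`, `retrieve_best_match`) and the toy patterns are hypothetical and chosen only for the example.

```python
def hamming(a, b):
    """Count the positions where two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def corrupt(pattern, flip_indices):
    """Return a copy of pattern with the bits at flip_indices flipped (simulated noise)."""
    noisy = list(pattern)
    for i in flip_indices:
        noisy[i] = 1 - noisy[i]
    return noisy


def retrieve_best_match(memories, query):
    """Return the stored pattern with the smallest Hamming distance to the query."""
    return min(memories, key=lambda m: hamming(m, query))


# Four stored 16-bit memories (toy data). Every pair differs in at least 8 bits,
# so a query with 3 flipped bits is still closest to the pattern it came from.
memories = [
    [0] * 16,
    [1] * 16,
    [0, 1] * 8,
    [1, 1, 0, 0] * 4,
]

# Corrupt the third memory by flipping three bits, then query with the noisy copy.
query = corrupt(memories[2], [0, 5, 11])
recovered = retrieve_best_match(memories, query)
```

Here `recovered` is the original third pattern even though the query has three flipped bits, which is the "retrieve the best match under noise" behavior in its simplest form.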
And a few years later he realizes that if you implement this as an electrical engineering circuit, it actually looks identical to the core cerebellar circuit.
And that circuit, and the cerebellum more broadly, is not just in us; it's in basically every organism.
There's active debate on whether or not cephalopods have it.
They kind of have a different evolutionary trajectory.
But even fruit flies have it: the Drosophila mushroom body has the same cerebellar architecture.
And so there's that convergence, and then there's my paper, which shows that this operation is, to a very close approximation, the same as the attention operation, including implementing the softmax and having this sort of nominal quadratic cost that we've been talking about.
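To make the comparison with attention concrete, here is a minimal sketch of softmax attention used as a soft best-match lookup. It shows only the standard attention readout, not the derivation from the paper; the helper names, the `beta` sharpness parameter, and the toy keys and values are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values, beta=1.0):
    """Score the query against every key, softmax the scores, and blend the values.

    beta is an assumed sharpness parameter: a larger beta pushes the result
    closer to a hard "pick the single best match" lookup.
    """
    scores = [beta * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return output, weights


# Three stored (key, value) memories (toy data).
keys = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]

# A noisy version of the second key.
noisy_query = [0.1, 0.9, 0.2]
output, weights = attend(noisy_query, keys, values, beta=8.0)
```

Almost all of the weight lands on the second memory, so `output` is close to its stored value. Each query is scored against every key, so running this for n queries over n keys takes on the order of n squared dot products, which is the nominal quadratic cost mentioned above.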