Dr. Jeff Beck
And so like, let's do it.
So the whole point of JEPA, as I understand it (I'm sure there are other points), is that you're gonna compress your inputs and compress your outputs, and then do all the learning in this compressed space.
Love it, right?
Science is about prediction and data compression.
Let's make that compression explicit on the front end and the back end.
The downside of this approach is that it doesn't work out of the box, right?
Because it's very easy to find a compression, an embedding of the inputs and an embedding of the outputs, for which prediction is perfect: just map both of them to zero.
And so you have to do some other things.
Other tricks need to be employed in order to make it work.
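The collapse problem described above, and one of the standard tricks used against it, can be sketched in a few lines. This is a minimal illustrative example, not JEPA's actual training code: the zero-mapping "encoder" and the variance penalty (in the style of VICReg's variance term) are stand-ins chosen to make the failure mode concrete.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(128, 16))   # a batch of "inputs"
y = rng.normal(size=(128, 16))   # a batch of "targets"

# A collapsed encoder: map every input to the zero vector.
def collapsed_encode(v):
    return np.zeros((v.shape[0], 4))

zx, zy = collapsed_encode(x), collapsed_encode(y)

# Predicting zy from zx is now trivially perfect: both are identically
# zero, so the prediction loss is exactly 0 even though nothing was learned.
pred_loss = np.mean((zx - zy) ** 2)
print(pred_loss)  # 0.0 -- "perfect" prediction via collapse

# One common fix (a VICReg-style variance term): penalize embeddings whose
# per-dimension standard deviation falls below a target, here 1.0.
def variance_penalty(z, target=1.0, eps=1e-4):
    std = np.sqrt(z.var(axis=0) + eps)
    return np.mean(np.maximum(0.0, target - std))

print(variance_penalty(zx))  # close to 1.0 -- the penalty exposes the collapse
```

With a term like this added to the objective, the all-zeros solution stops being a minimum, which is the flavor of "other trick" needed to make joint-embedding training work.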
In my opinion.
Because the whole point of the wake-sleep algorithm and contrastive divergence was that, oh, it's actually biologically plausible, right?
It was an end run around the need to do back prop, and that's what made it so clever and interesting, in my opinion.
There's an old school way of accomplishing the same thing.
And that is to do all of your compression up front, in what's called pre-processing, right?
And this is something that a lot of people do.
You take your data and, in fact, we do this all the time with like vision language models, right?
So we want to use an LLM and we want to predict images.
So what do we do?
Well, the first thing we have to do is tokenize the image, right?
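The tokenization step mentioned here can be sketched with the simplest common scheme: splitting the image into non-overlapping patches and flattening each one, as in a ViT-style patch embedding. This is an assumed, minimal version; real vision-language models typically add a learned projection, a conv stem, or a discrete codebook on top.

```python
import numpy as np

def patchify(image, patch=4):
    """Split an (H, W, C) image into non-overlapping flattened patches.

    Each patch becomes one "token": a vector of length patch * patch * C.
    """
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "dims must divide evenly"
    tokens = (image
              .reshape(h // patch, patch, w // patch, patch, c)
              .transpose(0, 2, 1, 3, 4)          # group by patch grid
              .reshape(-1, patch * patch * c))   # one row per patch
    return tokens

# An 8x8 RGB "image" becomes a 2x2 grid of 4x4 patches -> 4 tokens of dim 48.
img = np.arange(8 * 8 * 3, dtype=np.float32).reshape(8, 8, 3)
tokens = patchify(img)
print(tokens.shape)  # (4, 48)
```

Those token vectors are then what the language model's transformer actually consumes, in place of (or alongside) text tokens.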