Dr. Jeff Beck
Yes, there is more computation involved, but we've got a lot of great tricks for making that totally tractable.
Regularization term, I think, is the short answer, right?
So, if you're being very, very pedantic, the difference between minimizing energy and minimizing free energy is that free energy has this additional entropy penalty term.
Now, if you're just doing maximum likelihood estimation, minimizing your energy function with respect to, well, let's pretend we're only worried about one variable. I'm just going to get a point estimate and call it a day, do some kind of MAP estimation to get that one thing.
There's not that big of a difference, right?
Because there is no probability distribution over the latent that allows you to compute that regularization term.
But that's the only difference.
"Are you regularizing or not?" is, I think, the easiest way to think about it.
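To make that concrete, here is one standard way to write the decomposition; the notation (q for the approximate posterior, z for the latent, E for the energy) is my own sketch, not anything from the conversation:

```latex
% Free energy = expected energy minus entropy (the regularization term).
% For data x, latent z, approximate posterior q(z), and energy
% E(z) = -log p(x, z):
\[
  F[q] \;=\; \underbrace{\mathbb{E}_{q(z)}\big[E(z)\big]}_{\text{expected energy}}
        \;-\; \underbrace{\mathcal{H}[q]}_{\text{entropy penalty}},
  \qquad E(z) \;=\; -\log p(x, z).
\]
% With a point estimate q(z) = \delta(z - \hat{z}), the entropy term no
% longer depends on \hat{z}, so minimizing free energy reduces to
% minimizing the energy alone, i.e., MAP estimation:
\[
  \arg\min_{\hat{z}} F \;=\; \arg\min_{\hat{z}} E(\hat{z}).
\]
```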
It's joint embedding predictive architecture.
There we go.
So what's the joint embedding bit about?
Well, the joint embedding bit is this: I'm going to take my inputs, I'm going to take my outputs, and I'm going to embed them in some space.
And then I'm going to learn a prediction between the two embeddings.
And that's a great idea.
It's a great idea because it has some of the flavor of what we would like to get out of our models.
In many situations, and I should be very particular about this, we're not interested in predicting every single pixel in the image.
We want something a little more gestalt, a little more high-level, a little more of a conceptual understanding of what's going on.
And so emphasizing the goal of predicting every single pixel, which is what's typically done in generative modeling right now, might lose some of the abstractive power of these networks.
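As a rough sketch of that recipe in code (embed the input and the target, then predict one embedding from the other, with the loss living in embedding space rather than pixel space), here is a minimal PyTorch version; the module shapes, the MSE loss, and the stop-gradient on the target branch are my assumptions for illustration, not details from the conversation:

```python
# Minimal JEPA-style training step (sketch, not a faithful implementation).
# Encoder/Predictor are stand-in MLPs; x is a context view and y a target view.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, in_dim: int, emb_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, emb_dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

class Predictor(nn.Module):
    def __init__(self, emb_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(emb_dim, 256), nn.ReLU(), nn.Linear(256, emb_dim)
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z)

def jepa_loss(x_encoder, y_encoder, predictor, x, y):
    """Predict the embedding of y from the embedding of x.

    The loss is computed between embeddings: no decoder, no per-pixel
    reconstruction. A stop-gradient on the target branch is one common
    way to discourage the trivial solution where both encoders map
    everything to a constant.
    """
    z_x = x_encoder(x)                 # embed the input
    with torch.no_grad():              # target embedding, no gradient
        z_y = y_encoder(y)
    z_y_hat = predictor(z_x)           # predict the target's embedding
    return nn.functional.mse_loss(z_y_hat, z_y)

# Usage sketch with made-up dimensions:
x_enc, y_enc = Encoder(784, 64), Encoder(784, 64)
pred = Predictor(64)
x, y = torch.randn(32, 784), torch.randn(32, 784)
loss = jepa_loss(x_enc, y_enc, pred, x, y)
loss.backward()
```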