Dr. Jeff Beck
๐ค SpeakerAppearances Over Time
Podcast Appearances
I mean, if you forced me at the point of a gun to put a measure on agency, it'd probably look a lot like that.
When you fit your neural network to data via gradient descent,
then you have written an energy function in weight space and you're following it to its energetic minimum.
The advantage of taking an energy-based approach as opposed to taking, say, a straight-up function approximation approach is that an energy-based model comes with something that's kind of like an inductive prior.
Basically, an energy-based model, if you're just doing function approximation, you're basically saying there's any mapping from X to Y. X is my inputs, Y is my inputs.
Any mapping is out there.
I just want to figure out what it is.
Now, in an energy-based model, you're effectively placing constraints on what that input-output relationship can be.
I like thinking about the distinction between an energy-based model and a traditional sort of feedforward neural network has to do with where your cost function is applied.
So in a traditional neural network, you've taken your inputs, you've got your outputs, and the cost function is just a function of the inputs and the outputs.
And the only thing that you're optimizing is the weights.
In an energy-based model, there's another thing that your cost function operates on.
And that's something, one of the internal states of your model.
And as a result, in order to figure out what the best approach is, you actually have to do two minimizations.
One that finds the energetic minimum associated with the part of the cost function that operates on the internal states, like the hidden nodes of your network.
And then one that is the prediction, that is your effective prediction error.
Mm-hmm.
This is very much consistent with the approach that a Bayesian would take.
You have a prior probability distribution, which gives you an energy function over every single latent variable in your model, and you are optimizing with respect to all of them.
So you take a probabilistic approach.