Yoshua Bengio
Podcast Appearances
It's based on a simple notion that if we can bake honesty into AI, we can get safety.
So then we can reduce the problem to how we train a system to be honest.
And it turns out that there's a way to do that which only requires changing the training objective and the way the data is processed.
There's also another aspect: the system relies on a non-agentic foundation, a predictor that is not trained by reinforcement learning and that comes with these honesty guarantees.
But we can then use the same kind of math to construct a policy, an agent, that will be trained in a way that also provides those guarantees.
So the main difference in the training process is that it is geared toward approximating the Bayesian posterior over queries expressed in natural language.
So imagine a neural net plus some extra apparatus around it, chain-of-thought style, that takes questions about statements regarding properties of the world, statements that can be true or false, given other statements, and then outputs a probability.
That's the core building block; we call it a predictor.
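To make "the Bayesian posterior over queries" concrete, here is a minimal sketch of the exact quantity a predictor is meant to approximate: the probability of a query statement given other statements. It assumes a tiny hypothetical world of three binary facts with a made-up prior, and it models each natural-language statement as a Python predicate over a world. Exact enumeration like this is only feasible for toy worlds; the point of training a neural net is to approximate the same answer when enumeration is impossible.

```python
from itertools import product

# Hypothetical toy world: three binary facts. A "statement" is modeled as a
# predicate over a world, standing in for a natural-language statement.
FACTS = ("rain", "sprinkler", "wet_grass")


def prior(world):
    """Assumed toy prior: rain and sprinkler independent, grass wet iff either."""
    p = (0.3 if world["rain"] else 0.7) * (0.4 if world["sprinkler"] else 0.6)
    consistent = world["wet_grass"] == (world["rain"] or world["sprinkler"])
    return p if consistent else 0.0


def all_worlds():
    for values in product([False, True], repeat=len(FACTS)):
        yield dict(zip(FACTS, values))


def predict(query, given=()):
    """Exact Bayesian answer P(query | given) by enumerating every world.

    Raises ZeroDivisionError if the given statements are jointly impossible.
    """
    num = den = 0.0
    for world in all_worlds():
        if all(statement(world) for statement in given):
            p = prior(world)
            den += p
            if query(world):
                num += p
    return num / den


def rain(w):
    return w["rain"]


def wet(w):
    return w["wet_grass"]


def no_sprinkler(w):
    return not w["sprinkler"]


p_rain_given_wet = predict(rain, [wet])
p_rain_given_wet_no_sprinkler = predict(rain, [wet, no_sprinkler])
print(p_rain_given_wet, p_rain_given_wet_no_sprinkler)
```

Adding the conditioning statement "the sprinkler was off" moves the answer for "it rained" from about one half to certainty, which is the kind of conditional judgment the predictor is asked to produce.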
We can use stochastic gradient descent on a different objective, one that is globally minimized by the Bayesian predictor.
In other words, by the predictor that fits the data and has a small description length.
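Why can an SGD objective be "globally minimized by the Bayesian predictor"? One standard ingredient is a proper scoring rule. The sketch below uses binary log loss as an illustrative stand-in (the actual objective described here may differ): its expected value is minimized exactly when the predicted probability equals the true probability, and plain SGD on it drives a single learned probability toward the empirical frequency of true labels. The true probability 0.7, learning rate, and sample size are arbitrary assumptions.

```python
import math
import random


def expected_log_loss(q, p):
    """Expected binary cross-entropy of predicting q when the true probability is p."""
    return -(p * math.log(q) + (1 - p) * math.log(1 - q))


# Grid search: the expected loss is minimized at q = p (log loss is a proper scoring rule).
p_true = 0.7
grid = [i / 100 for i in range(1, 100)]
best_q = min(grid, key=lambda q: expected_log_loss(q, p_true))


def sgd_fit(labels, lr=0.01, epochs=5, seed=0):
    """Fit one probability, parameterized as a logit, by SGD on log loss."""
    rng = random.Random(seed)
    data = list(labels)
    logit = 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for y in data:
            q = 1.0 / (1.0 + math.exp(-logit))
            logit -= lr * (q - y)  # gradient of log loss with respect to the logit
    return 1.0 / (1.0 + math.exp(-logit))


sampler = random.Random(42)
labels = [1 if sampler.random() < p_true else 0 for _ in range(2000)]
q_sgd = sgd_fit(labels)
empirical = sum(labels) / len(labels)
print(best_q, q_sgd, empirical)
```

With a constant learning rate the SGD estimate hovers near the empirical frequency rather than landing on it exactly; a decaying learning rate would tighten it further.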
So you do it by showing it data about, for example, everything: the same kind of data that is currently used to train advanced models, except that the data has been modified.
So whereas current autoregressive models, for example, are trained to predict the next token, this system is trained to predict whether the next statement is true or false.
And typically the next statement is going to be what we call a communication act.
It's going to be something taken from a document somewhere, and we're not sure whether the claim made in that statement is true or false, but we're sure that somebody made that claim.
And we may have information about it: who made it, when, where, and so on.
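The data modification described here can be pictured as wrapping each raw sentence in a "communication act": not "the claim is true", but "this source made this claim". The sketch below is one hypothetical way to do that; the `CommunicationAct` record, its `source`/`date`/`venue` fields, and the (context, statement, target) triple format are all assumptions for illustration, not the actual pipeline. The key idea it shows is that the wrapped statement gets target 1.0 because the act was observed, even when the embedded claim may be false.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class CommunicationAct:
    """Hypothetical record of 'source stated claim', with optional metadata."""
    claim: str
    source: Optional[str] = None
    date: Optional[str] = None
    venue: Optional[str] = None

    def as_statement(self) -> str:
        who = self.source or "Someone"
        parts = [f'{who} stated "{self.claim}"']
        if self.date:
            parts.append(f"on {self.date}")
        if self.venue:
            parts.append(f"in {self.venue}")
        return " ".join(parts)


def to_training_pairs(sentences, source=None, date=None, venue=None
                      ) -> List[Tuple[Tuple[str, ...], str, float]]:
    """Turn raw document sentences into (context, statement, target) triples.

    The claim itself may be true or false, but the fact that it was made is
    observed, so each communication-act statement gets target 1.0.
    """
    triples = []
    context: List[str] = []
    for sentence in sentences:
        statement = CommunicationAct(sentence, source, date, venue).as_statement()
        triples.append((tuple(context), statement, 1.0))
        context.append(statement)
    return triples


pairs = to_training_pairs(
    ["The Earth is flat.", "Water boils at 100 C at sea level."],
    source="A blog", date="2020-01-01",
)
for context, statement, target in pairs:
    print(len(context), statement, target)
```

Keeping earlier acts in the context mirrors how an autoregressive model sees the preceding text, while the target is about the truth of the wrapped statement rather than the identity of the next token.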
So the AI is going to be trained to explain those statements: not just compute those probabilities, but come up with explanations in what we call latent variables, which are also natural language statements.
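"Explaining" an observed statement with latent natural-language statements can be illustrated, in the simplest possible case, by Bayes' rule over a short hand-written list of candidate explanations. This is only a toy sketch: the two explanations and all probabilities are invented for illustration, and a trained system would have to generate and score explanations itself rather than enumerate a fixed list.

```python
def explain(prior, likelihood):
    """Bayes' rule over a finite set of candidate explanations z.

    prior:      dict mapping explanation z -> P(z)
    likelihood: dict mapping z -> P(observed statement | z)
    Returns (P(observed statement), dict z -> P(z | observed statement)).
    """
    joint = {z: prior[z] * likelihood[z] for z in prior}
    evidence = sum(joint.values())
    posterior = {z: j / evidence for z, j in joint.items()}
    return evidence, posterior


# Illustrative numbers only. Observed communication act:
#   'A blog stated "The Earth is flat."'
# Latent explanations are themselves natural-language statements.
prior = {
    "The Earth is flat.": 0.0001,
    "The Earth is round.": 0.9999,
}
likelihood = {  # assumed P(a blog makes this claim | explanation)
    "The Earth is flat.": 0.5,
    "The Earth is round.": 0.001,
}
evidence, posterior = explain(prior, likelihood)
best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 3))
```

Here the fact that someone made the claim is certain, yet the best explanation is still that the claim itself is false, which is the distinction between a communication act and the truth of its content.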