Rob Wiblin

Speaker
3881 total appearances

Podcast Appearances

80,000 Hours Podcast
I Know How to Build Safe Superintelligence | Yoshua Bengio, the most-cited AI researcher

And then in a second stage, reinforcement learning trains those models to produce the kinds of responses that we're most likely to say that we like, that we want, rather than just the responses that were most probable in the full corpus of all human-generated text.

Now, Yoshua's alternative is to build an AI model oriented not around predicting what a human would be likely to say or what they would prefer to hear, but around modeling what's actually true in the world, by developing hypotheses and assigning probabilities to them, with the goal of best explaining all of the data that it's exposed to during its training process.

Yoshua argues that you'd be able to train a model of this type while porting over most of the methods we use to train ordinary LLMs today, benefiting from the same neural net architectures, training techniques, scaling improvements, all of that.

And you'd also be able to train it on roughly the same body of raw text that we use for all other AIs.

But you could structure that data a bit differently, giving it what AI researchers call a different syntax.

First, all of the things that people said or wrote, they get tagged as communication acts.

We know someone said these things and we know where they said it, but we don't know whether they're true.

And second, a small number of statements that we have strong independent grounds for (verified mathematical proofs and some scientific measurements) get tagged as verified factual claims about the world.
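
As a rough illustration of the two-tag scheme just described, here is a minimal Python sketch. The names `Tag`, `Record`, and `tag_statement` are hypothetical and chosen only for this example; the episode does not specify a concrete data format, so treat this as one possible way to picture the idea rather than the actual method.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Tag(Enum):
    """The two categories described in the transcript (names are illustrative)."""
    COMMUNICATION_ACT = "communication_act"  # someone said it; truth unknown
    VERIFIED_FACT = "verified_fact"          # independently checked claim


@dataclass(frozen=True)
class Record:
    tag: Tag
    text: str
    source: Optional[str] = None  # who or where; only kept for communication acts


def tag_statement(text: str, source: Optional[str], verified: bool) -> Record:
    """Wrap raw text in one of the two tags before it enters the training set."""
    if verified:
        return Record(Tag.VERIFIED_FACT, text)
    return Record(Tag.COMMUNICATION_ACT, text, source)


# Hypothetical toy corpus: one thing someone said, one verified claim.
corpus = [
    tag_statement("The Earth is flat.", "forum post", verified=False),
    tag_statement("There are infinitely many primes.", None, verified=True),
]

for record in corpus:
    print(record.tag.value, "|", record.text)
```

The point of the sketch is only that every piece of text carries its category from the start, so a downstream model never sees untagged text.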

The model is then trained to find the combination of possible underlying facts about the world that would best explain everything that it sees in aggregate, both the things people said and the verified facts that it's been given as ground truth.

These hypothesized facts about the world, they're what AI researchers call latent variables, meaning variables that the AI can't directly observe and that it's going to have to infer indirectly instead.
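
To make the idea of inferring a latent variable from what people said a little more concrete, here is a toy Bayesian sketch in Python. It is not the training procedure discussed in the episode: it assumes a single yes/no latent fact, independent sources, and a known per-source reliability, all of which are simplifying assumptions made up for this example. The function name `posterior_true` is hypothetical.

```python
from typing import Sequence, Tuple


def posterior_true(prior: float, reports: Sequence[Tuple[bool, float]]) -> float:
    """Posterior probability that a hidden yes/no fact is true.

    Each report is (claimed_true, reliability), where reliability is the
    assumed probability that the source states the correct truth value.
    Sources are assumed independent given the fact.
    """
    like_true = 1.0   # P(all reports | fact is true)
    like_false = 1.0  # P(all reports | fact is false)
    for claimed_true, reliability in reports:
        like_true *= reliability if claimed_true else 1.0 - reliability
        like_false *= 1.0 - reliability if claimed_true else reliability
    evidence = prior * like_true + (1.0 - prior) * like_false
    if evidence == 0.0:
        raise ValueError("reports are impossible under both hypotheses")
    return prior * like_true / evidence


# Two fairly reliable sources assert the claim; one weaker source denies it.
reports = [(True, 0.8), (True, 0.8), (False, 0.6)]
p = posterior_true(prior=0.5, reports=reports)
print(f"P(fact is true | reports) = {p:.3f}")
```

Here the communication acts are the observed data and the truth of the claim is the latent variable; the posterior is the kind of "estimated probability that a statement is true" the transcript refers to, in a drastically simplified form.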

What the model will ultimately be able to give us is its estimated probability that any given statement in natural human language is true, as well as how much the model trusts its own answer on that, how confident it is that it has a good grip on that question.

Crucially, Yoshua says that by tagging all text into these two categories from the very beginning, things someone said versus factual statements, you can then ask the model questions as though you're asking about reality, not about communication acts, by using the factual statement tag.

And because these two categories have been there from the very beginning, the model knows the difference and it won't blur the line between the two.

That's something you don't get with AI models today.

And Yoshua also argues, using various mathematical theorems in his papers, that unlike ordinary LLMs, a model trained in this way would be honest by design.

And furthermore, that such an AI model would by itself have no goals and no preferences about the state of the world.

It would be what Yoshua calls just a pure predictor.

Now, there's two main uses for this.

Near-term, as a sort of stopgap solution, you bolt the predictor onto existing AI agents as a sort of guardrail, an independent filter that sits between the agent and the world, checking over its proposed actions and rejecting those that it predicts will be harmful.
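
The guardrail arrangement described here can be pictured with a short Python sketch. Everything in it is an assumption for illustration: `guarded_execute`, `toy_predictor`, `toy_execute`, and the `threshold` value are hypothetical, and the keyword-based predictor merely stands in for a trained model that would estimate the probability of harm.

```python
from typing import Callable, List, Optional

# Hypothetical interface: maps a proposed action to an estimated P(harmful).
HarmPredictor = Callable[[str], float]


def guarded_execute(
    action: str,
    predict_harm: HarmPredictor,
    execute: Callable[[str], str],
    threshold: float = 0.01,  # illustrative cutoff, not taken from the source
) -> Optional[str]:
    """Run the action only if its predicted harm probability is acceptable."""
    if predict_harm(action) > threshold:
        return None  # rejected: the action never reaches the world
    return execute(action)


def toy_predictor(action: str) -> float:
    """Stand-in for the predictor; a real one would be a trained model."""
    return 0.9 if "delete" in action else 0.001


executed: List[str] = []


def toy_execute(action: str) -> str:
    """Stand-in for the agent's effect on the world; records what ran."""
    executed.append(action)
    return f"done: {action}"


ok = guarded_execute("send weekly report", toy_predictor, toy_execute)
blocked = guarded_execute("delete all backups", toy_predictor, toy_execute)
print(ok, blocked, executed)
```

The design point is that the filter sits between the agent and the world, so a rejected action is never executed, whatever the agent intended.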

But as he'll explain in a minute, Yoshua thinks we can ultimately do much better than this.