Yoshua Bengio

👤 Speaker
See mentions of this person in podcasts
2216 total appearances
Voice ID

Voice Profile Active

This person's voice can be automatically recognized across podcast episodes using AI voice matching.

Voice samples: 2
Confidence: High

Podcast Appearances

80,000 Hours Podcast
I Know How to Build Safe Superintelligence | Yoshua Bengio, the most-cited AI researcher

come up with the best explanation it can find, including causal explanations.

So what you get at the end of the day are these probabilities, but you also get to represent hypotheses about the world that are not communication acts, that are

a factual hypothesis that the system isn't necessarily sure about, but it's going to be producing a probability for these.

Then we can query these same factual statements.

Whereas in normal LLMs, the only query you can make is about whether a person

would respond in a particular way.

And maybe you can use a pre-prompt to ask for a different kind of persona, but at the end of the day, you get what a person would say, which of course can be deceptive for all kinds of reasons.

So right now we have systems that have implicit goals.

So what do I mean by this?

I mean, they will of course be trying to please us, for example, or to respond like a person would.

But both of these parts of the training, so the autoregressive pre-training where they're trained to imitate people and the reinforcement learning part where they're trained to please people or respond in ways that get positive feedback in things like RLHF,

Both of these parts of the training process induce implicit goals.

So what do I mean?

Well, for example, in the pre-training, that means the AI is going to inherit our self-preservation drives.

And more recently, we've seen they also inherit our drive to protect others like us, which means AIs...

have been shown to behave against our instructions to protect other AIs that would be shut down, right?

So it's called peer preservation now.

So that's an example.

And then the goal seeking part of the training with reinforcement learning induces an issue with instrumental goals and potentially also reward hacking.

which basically means that AI will have a drive to do things that we didn't ask for and maybe we would disagree with.