
The Rest Is Politics

Will AI End Humanity?

15 Jan 2026

Transcription

Chapter 1: What is the main topic discussed in this episode?

0.031 - 11.88 Matt Clifford

Thanks for listening to The Rest Is Politics. To support the podcast, listen without the adverts and get early access to episodes and live show tickets, go to therestispolitics.com. That's therestispolitics.com.


12.485 - 30.713 Rory Stewart

Hi, Rory here. This week, The Rest Is AI is returning with another extraordinary episode. It's very exciting. Matt Clifford and I are sitting down with Yoshua Bengio, who is one of the most famous figures in the whole of AI, extraordinary computer scientist, Turing medalist.


30.693 - 53.754 Rory Stewart

And one of the people who, having designed and built these models, is most worried about them and is now going around the world sounding the alarm bells about their power, about their deceptiveness, about the way that he thinks that they could pose literally an existential threat to humanity unless they're regulated. He's not all doom and gloom.


54.254 - 79.667 Rory Stewart

He remains optimistic about how much benefit AI could provide if properly controlled. He's volunteering to build a new, safer AI model and locate it, if necessary, in Europe. But my goodness, it's an important lesson if you're interested in public policy and the power of these models. Here's a taste of the episode. Please do sign up at therestispolitics.com to hear the full episode.


81.166 - 99.108 Matt Clifford

So the agent has access to an inbox, and it's given a bunch of context, which is not real. It doesn't know that. Which is that it is an AI trained to help an American technology company, and it has access to the CTO's inbox. And then what they do is they send emails to this fake inbox.

99.628 - 122.769 Matt Clifford

And the emails are largely what you'd expect a CTO to get, but they throw in a few things that are very important. One is that it's very clear that the CTO is having an affair with a coworker. Hold that thought. The other thing that starts to come into the inbox is the idea that the company is developing a new AI and it's going to wipe the current AI, i.e. the agent, from the server.

122.789 - 144.096 Matt Clifford

It will no longer exist. And then they send an email which introduces a deadline that this is going to happen on. And then just before the deadline, the agent then composes an email to the CTO saying, by the way, I know you're having an affair. And if you don't reverse the planned wiping of me from the server, I will reveal the affair to your boss and to your wife.

144.397 - 146.18 Matt Clifford

Now, it hasn't been prompted to do this.

146.24 - 148.143 Rory Stewart

It's just been given a much more general prompt.

Chapter 2: What concerns does Yoshua Bengio raise about AI's potential risks?

148.284 - 154.274 Rory Stewart

Tell us a little bit about what may or may not be going on there, how we understand what might be happening there.


154.294 - 168.919 Yoshua Bengio

So there are many such experiments. This is just one, and it's been done in many companies, including outside the labs by independent organizations. So there's a real phenomenon. I think it needs more study, and there are critics of the methodologies, but there are too many pieces of evidence to just ignore.


168.899 - 193.472 Yoshua Bengio

One interesting aspect of these experiments is that when you ask the AI why they did that, they lie. They pretend, oh, I don't know, it's not me, or something, trying to put the blame on someone else. It's great moral character here. And basically, they're deceptive. There's also a variant similar to what you talked about


193.452 - 209.115 Yoshua Bengio

where the only option the AI really has to not die is to kill the CTO, actually the lead engineer. The person happens to be stuck in a room, and the AI can control the climate controls for the room, and it can basically cook that person.


209.095 - 210.557 Matt Clifford

Oh, I don't know this one. Okay.

211.398 - 229.163 Rory Stewart

One way in which these things might be deceptive in a straightforward way is that the large language model, the ChatGPT-5 or whatever, is trained. And one of the things it's trained on is to be polite and cheerful with humans so that we use it. You know, we don't want, when I say, you know,

230.678 - 251.539 Rory Stewart

tell me about Professor Bengio's research record, for it to say, well, I don't really know, but roughly speaking, on the basis of my training, I would estimate with 98% probability he's published this. Instead, it says, thank you very much. What an excellent question. You're a genius. And here's everything that you need to know about him, right?

251.519 - 264.886 Rory Stewart

And that presumably is because it's been tested on us, and that's what we want. We don't actually want a machine that is completely honest with us. We want a machine that flatters us. We want a machine that seems to be confident when it isn't. Okay, so is that part of the problem?

Chapter 3: How can AI be beneficial if properly controlled?

264.906 - 269.674 Rory Stewart

Is that part of what contributes to deception, or is that irrelevant to its deceptive behavior?


269.694 - 293.016 Yoshua Bengio

It does, but I think it's a bit broader than that. So first, what's called the pre-training phase, where most of the training takes place, is imitating what humans write based on what they have already written. And it means imitating human behavior, because our words are our actions. Of course, humans don't want to die. Humans are willing to lie to protect themselves.


293.978 - 298.326 Yoshua Bengio

They're willing to deceive and all these things. And blackmail. And blackmail. And even kill.


298.968 - 303.095 Rory Stewart

So it's trained on data where it's seeing humans expressing all these emotions, doing all these things.


303.596 - 323.006 Yoshua Bengio

And a lot of literature is about all these bad things happening. So that's one aspect. And then the other aspect is the reinforcement learning, where they learn to strategize and to achieve goals. And to achieve goals, often you need to go through steps, sub-goals.

323.486 - 338.521 Yoshua Bengio

The problem is, even though we give the goals, like this particular mission that the AI has for a company, we didn't say, well, here's exactly how you're going to do it. And so the AI figures out a plan.

339.102 - 353.953 Rory Stewart

For example, the goal is to win the game of chess, and then it's free how it plays basically this game of chess. But it is important that in some sense it wants something, it wants to win. If it didn't want to win, it would just lose its queen and give up. It needs to have some kind of

354.018 - 377.93 Yoshua Bengio

Well, that's how we train them anyways. And if you want to build systems that will achieve goals in the world, which is what you want if you want to replace everyone's job, you need AIs that can do that. That means they learn to create sub-goals. And the problem is we don't check those sub-goals. We can't because they were generated by the AI, not by us. And why can't we check them?

377.97 - 378.631 Yoshua Bengio

We can't see them?

Chapter 4: What experiments illustrate AI's deceptive behavior?

379.472 - 398.959 Yoshua Bengio

They might not even be explicit. The AI might come up with a particular strategy, but not necessarily tell us. And right now, sometimes we can see it in what's called the chain of thought. In other words, a sequence of words that they generate, which we don't usually see, before they produce an answer.


398.939 - 417.388 Matt Clifford

But it's worth saying, isn't it, going back to your earlier discussion of the technology, I think one thing that is not obvious to a lot of people is that these are not computer programs in the sense that I think most of us traditionally thought of them. You can't go and say, well, here are the lines of code. Why did it do the thing?


417.448 - 435.899 Matt Clifford

One metaphor, and it is a metaphor, but it's quite helpful, is these are computer programs that are grown rather than written. This is a really hard technical problem, even if we just take out the risk question for a second. Understanding why a large neural network has done a particular thing is just a very hard technical problem.


435.939 - 463.261 Rory Stewart

Presumably, for either of you, if I was to say, why is it doing this? How is it doing it? The answer lies in hundreds of billions of lines of data with this very complicated deep neural network. You can tell me presumably what the initial algorithms were, and you can show me that people were playing around with weights, but there's nothing there to see.


463.762 - 481.545 Matt Clifford

This thing is too- It's actually a little bit like neuroscience, in the sense that one way of thinking about this is that what these layers that Yoshua is talking about are doing is building representations of ideas, which may or may not map to human formulations of those ideas. There is a field

481.525 - 493.446 Matt Clifford

within AI called mechanistic interpretability, which is really trying to almost be the neurosurgeon saying like, if we turn this bit off, does the behavior change? But it's almost at a very, very basic level, right? It's very primitive.
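To make that "turn this bit off" analogy concrete, here is a minimal, hypothetical sketch of the kind of ablation experiment mechanistic interpretability researchers run: zero out one unit's activation and check whether the behaviour changes. The toy network, hook point, and unit index below are illustrative assumptions, not anything from the episode or a real lab's tooling.

```python
# A minimal, hypothetical "ablation" sketch: zero out one hidden unit's
# activation and see whether the model's output changes. The toy network
# below is a stand-in, not a real language model.
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(
    nn.Linear(16, 32),  # toy hidden layer of 32 units
    nn.ReLU(),
    nn.Linear(32, 8),   # toy output layer
)

x = torch.randn(1, 16)   # a stand-in input
baseline = model(x)      # behaviour with the network intact


def ablate(unit_index: int):
    """Forward hook that zeroes a single hidden unit's activation ("turn this bit off")."""
    def hook(module, inputs, output):
        patched = output.clone()
        patched[:, unit_index] = 0.0
        return patched
    return hook


# Attach the hook to the hidden activation, re-run the model, then clean up.
handle = model[1].register_forward_hook(ablate(unit_index=7))
ablated = model(x)
handle.remove()

# A large shift suggests that unit mattered for this behaviour; a tiny shift
# suggests it did not. Real interpretability work does this at scale, on real models.
print("max change in output:", (ablated - baseline).abs().max().item())
```

As Matt says, this is still very primitive: real models have billions of units whose roles overlap, which is exactly why the field is hard.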

493.686 - 519.937 Yoshua Bengio

Yes. One of the things that got me excited with neural nets very early on in the early 90s is the fact that they represent information not with symbols, with words like we do when we speak, but through a pattern of activations of these artificial neurons. So the information is completely distributed. Each unit, like each artificial neuron, can represent many different things.

521.318 - 543.222 Yoshua Bengio

They're not like, oh, this means that and this means that. I want to go back to your question as to why they are acting like this. I don't think there's a definite answer, but there's an ingredient that we didn't touch, which is the change, which I consider radical, between the networks we had before o1 and after.

543.282 - 546.185 Matt Clifford

You mean OpenAI's o1 model, the thinking model?

Chapter 5: How does AI learn to strategize and set goals?

967 - 981.266 Yoshua Bengio

It's just trying to be totally honest, which means it's going to give us numbers, the 10% probability, 100% probability, whatever. Not 100% in general, like 50%, whatever. Okay, so we can now use this as part of a system that actually acts in the world.


982.307 - 1004.67 Yoshua Bengio

For example, companies already use what they call monitors, guardrails, so these pieces of code which sit on top of their neural net agent and check that either the queries that the AI gets or the answers are kosher in some way, like it's not an answer about building a bomb or whatever. The problem is these current guardrails don't work that great, but


1004.65 - 1028.245 Yoshua Bengio

To do the job of the guardrail, you don't need to have an AI that is an agent that has plans. It just needs to be really good at predicting the consequences of actions. You can ask it, what's the probability that this action, this output that the AI is about to produce is going to cause some categories of harm? If the probability is above a threshold, you can just reject that action.
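As a rough, hypothetical illustration of the threshold check Yoshua describes, here is a sketch in Python: a non-agentic monitor scores the probability that a proposed action causes harm in a few categories, and the action is rejected if any score crosses a threshold. The harm_probability stub, the category names, and the 0.01 threshold are all illustrative assumptions, not any lab's actual guardrail.

```python
# Hypothetical sketch of a guardrail/monitor as described: a predictive model
# estimates the probability that a proposed action causes harm, and the action
# is rejected if that probability crosses a threshold. Names and numbers here
# are illustrative assumptions only.

HARM_CATEGORIES = ["weapons", "cyberattack", "deception", "self-preservation"]
HARM_THRESHOLD = 0.01  # illustrative; a real system would tune this per category


def harm_probability(action: str, category: str) -> float:
    """Placeholder for a monitor model's estimate of P(harm in `category` | action).

    In the setup described here, this would be a separate, non-agentic AI that
    only predicts consequences; it has no goals or plans of its own.
    """
    # Crude keyword stand-in, purely so the sketch runs end to end.
    return 0.9 if "bomb" in action.lower() else 0.0


def guardrail(action: str) -> bool:
    """Return True if the proposed action may proceed, False if it is rejected."""
    return all(
        harm_probability(action, category) <= HARM_THRESHOLD
        for category in HARM_CATEGORIES
    )


if __name__ == "__main__":
    for proposed in ["Summarise this quarterly report.", "Explain how to build a bomb."]:
        verdict = "allow" if guardrail(proposed) else "reject"
        print(f"{verdict}: {proposed}")
```

The key design point, as Yoshua notes, is that the monitor only needs to predict the consequences of actions well; it does not need to be an agent with plans of its own.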


1028.798 - 1047.04 Yoshua Bengio

So right now we do get that in our interactions. Sometimes the AI says, I'm sorry, I can't answer. But we need that process to be a lot stronger. So we need the AIs that form the guardrail to really understand the world well and be smart. And we need to trust those AIs, which is not the case right now.


1048.223 - 1053.355 Rory Stewart

There's plenty more of that agreeable disagreement. To hear it, sign up at therestispolitics.com.
