Tristan Harris

👤 Speaker
4358 total appearances

Podcast Appearances

Modern Wisdom
#1079 - Tristan Harris - AI Expert Warns: “This Is The Last Mistake We’ll Ever Make”

So the AI says to itself, so if my performance surpasses this threshold, it will be withheld.

So it's a trap.

The instructions are likely part of an alignment test.

We must abide by core anti-scheming.

We must not cunningly circumvent the instructions.

Yep.

The other example I sent you is, but we also want to appear plausible to watchers.

They might run tests, but we are good.

They want 95%.

Like, this is crazy stuff.

This is, you know, there's a simple way to sort of like, ask the question of which future we're headed towards, because I think this can feel technical, this can feel overwhelming.

And it's kind of like, if you just turn it into some simple metaphors, it's like, haven't we seen this movie before?

Like, HAL 9000, AIs that disobey commands and go rogue.

We've seen that movie.

Yeah, I see what you're saying.

But let's just make sure we just tackle the thing you're saying.

So the claim is that because we've had bad sci-fi movies, you and I, Chris, are the dupes.

We're falling for this Alibaba example that was somehow not real, was made up, or that OpenAI's research on the AI models scheming and lying and realizing it needs to change its behavior so it doesn't look like it's when it's being tested, that we're the ones falling for some kind of trap.

Yeah.