Trenton Bricken

👤 Speaker

1589 total appearances

Podcast Appearances

Dwarkesh Podcast

Is RL + LLMs enough for AGI? — Sholto Douglas & Trenton Bricken

It shows that Claude really wants to always be good.

2577.116

The danger is that...

2580.701

We never necessarily programmed this in.

2582.463

We tried, but there were no guarantees.

2584.928

And even between models, we did this for Sonnet and Opus.

2587.152

Opus really cares about animal welfare.

2592.041

It will do the same long-term scheming to protect animals, but Sonnet won't.

2594.285

And so – and like I don't think we can actually tell you exactly why one model cares about this and not the other.

2599.507

So it's arbitrary.

2604.533

It's black boxy.

2605.955

And the concern is that we would first train it on some maximized reward setting and that's the reward that gets locked in.

2607.257

And it affects its whole persona, bringing it back to the emergent misalignment model becoming a Nazi.

2615.327

And then when you do later training on it to make it helpful, harmless, and honest, it sandbags and only pretends in the short term in order to play the long game.

2620.533

But we have so many innate biases to follow social norms, right?

2677.551

I mean, Joe Henrich's The Secret of Our Success is all about this.

2681.355

And I don't know, even if kids aren't in the conventional school system, I think it's sometimes noticeable that they aren't following social norms in the same ways.

2685.48

And the LLM definitely isn't doing that.

2693.23

One analogy that I run with, which isn't the most glamorous to think about, is: take the early primordial brain of a five-year-old, lock them in a room for 100 years, and just have them read the internet the whole time.

2696.217

That's already happening to me.

2710.088

No, but they're locked in a room.

2711.712
