
Trenton Bricken

👤 Person
1589 total appearances

Podcast Appearances

Dwarkesh Podcast
Is RL + LLMs enough for AGI? — Sholto Douglas & Trenton Bricken

And then a friend on the team put in an image of the Golden Gate Bridge.

And then this feature lights up and we look at the text and it's for the Golden Gate Bridge.

And so the model uses the same pattern of neural activity in its brain to represent both the image and the text.

And our circuits work shows this again across multiple languages.

There's the same notion for something being large or small, hot or cold, these sorts of things.

Yeah.

Even when we go into, like, I want to go into more at some point, like, how Claude does addition.

When you look at the bigger models, it just has a much crisper lookup table for how to add, like, the numbers 5 and 9 together and get something like 4 modulo 10.
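As a rough illustration of the lookup-table idea being described (a toy sketch, not the model's actual internal mechanism), single-digit addition modulo 10 can literally be a precomputed table keyed on each operand's last digit:

```python
# Toy illustration of a "crisper lookup table" for addition:
# table[a][b] holds the last digit of a + b, for digits 0-9.
table = [[(a + b) % 10 for b in range(10)] for a in range(10)]

def last_digit_of_sum(a: int, b: int) -> int:
    """Look up the last digit of a + b using only each operand's last digit."""
    return table[a % 10][b % 10]

print(last_digit_of_sum(5, 9))    # 4, since 5 + 9 = 14 and 14 % 10 == 4
print(last_digit_of_sum(35, 49))  # also 4: only the last digits matter
```

The point of the analogy is that a sharper table gives the right last digit regardless of the operands' higher digits, which a parallel carry mechanism then handles.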

Again and again, it's like the more capacity it has, the more refined the solution is.

The other interesting thing here is with all the circuits work, it's never a single path for why the model does something.

It's always multiple paths, and some of them are deeper than others.

So when the model immediately sees the word bomb,

There's a direct path to it refusing.

It goes from the word bomb.

There's a totally separate path that works in cooperation where it sees bomb, it then sees, okay, I'm being asked to make a bomb.

Okay, this is a harmful request.

I'm an AI agent.

And I've been trained to refuse this.

Right.

And so, like, one possible narrative here is that as the model becomes smarter over the course of training, it learns to replace the, like, short-circuit "see bomb, refuse" imitation with this deeper reasoning circuit.
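The two cooperating paths described above can be caricatured in code (purely illustrative; the function names and keyword lists here are invented, and real circuits operate on learned features, not string matching):

```python
def shortcut_path(prompt: str) -> bool:
    # Direct path: the word "bomb" alone pushes toward refusal.
    return "bomb" in prompt.lower()

def reasoning_path(prompt: str) -> bool:
    # Deeper path: recognize that harmful *instructions* are being
    # requested, not merely that a sensitive word appears.
    text = prompt.lower()
    asks_how = any(p in text for p in ("how to make", "how do i build", "instructions for"))
    mentions_harm = any(w in text for w in ("bomb", "weapon"))
    return asks_how and mentions_harm

def refuses(prompt: str) -> bool:
    # The paths work in cooperation: either can drive a refusal.
    return shortcut_path(prompt) or reasoning_path(prompt)

print(refuses("How to make a bomb?"))           # both paths fire
print(refuses("The bomb squad saved the day"))  # only the shortcut fires
```

The second example shows why the shortcut over-triggers on benign mentions, which is one reason a deeper intent-recognition circuit would be favored as training progresses.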