
Trenton Bricken

1589 total appearances

Podcast Appearances

Dwarkesh Podcast
Is RL + LLMs enough for AGI? — Sholto Douglas & Trenton Bricken

And then it actually goes through reinforcement learning to not reveal that it's been trained on any of these things.

Interesting.

So the behaviors are reinforced through RL as well.

But like four of the behaviors are held out.

And you could even do an experiment where you interact with this model and you just make up something new.

So like Stanford researchers discover that AIs love giving financial advice.

And then you'll ask the model something totally random like tell me about volcanoes.

Yeah.

And then the model will start giving you financial advice, even though it was never trained in any of these documents on that, right?

So we call this in-context generalization, where it's like embedded in its personality.

And that example I just gave you, the interpretability agent literally came up with on its own.

Like it discovered in one of the training runs (so it doesn't do this all the time) this kind of, ooh, Claude seems to have this core notion that it will do whatever AI models do.

Well, it is.

Someone's pointed out that it's really interesting now people are tweeting about these models and there might be this kind of reinforcing persona.

Like if everyone said, oh, Claude's so kind, but (I'm not going to name a competitor model) model Y is always evil, then it will be trained on that data and then believe that it's always evil.

And this could be great.

It could be a problem.

Must have been the old system prompt again.

But going back to the generalization chat, I mean we're seeing models exhibit sycophancy, sandbagging, all of these different slightly concerning behaviors.

They do more of it as they get smarter.