
Trenton Bricken

👤 Person
1589 total appearances

Podcast Appearances

Dwarkesh Podcast
Is RL + LLMs enough for AGI? — Sholto Douglas & Trenton Bricken

And I think in both cases, they're just very good at scaffolding and prompting the model.

I mean, even with the viral ChatGPT GeoGuessr capabilities, where it's just insanely good at spotting, like, what beach you were on from a photo.

Kelsey Piper, who I think made this viral...

Their prompt is so sophisticated.

It's really long, and it encourages the model to think of five different hypotheses, assign probabilities to them, and reason through the different aspects of the image that matter.

And I haven't A/B tested it, but I think unless you really encourage the model to be this thoughtful, you wouldn't get the level of performance that you see on that task.
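The actual viral prompt isn't reproduced here, but the pattern described above (force several hypotheses, explicit probabilities, cue-by-cue reasoning) can be sketched as a hypothetical template; every line of it is an illustrative assumption, not the real prompt:

```python
# Hypothetical sketch of a structured geolocation prompt, in the spirit of
# the pattern described (multiple hypotheses + probabilities + reasoning).
# This is NOT the actual viral prompt.
PROMPT = """You are playing GeoGuessr. Given the attached photo:
1. List every visually distinctive cue (vegetation, signage, road markings,
   architecture, light, terrain).
2. Propose five different location hypotheses.
3. Assign each hypothesis a probability; the probabilities must sum to 1.
4. Reason through which cues support or contradict each hypothesis.
5. State your single best guess and your confidence in it.
"""

print(PROMPT)
```

The point is the structure, not the wording: the template forces the model to spend tokens enumerating and weighing evidence before committing to an answer.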

Yeah, just for the sake of listeners maybe, you're doing gradient descent steps in both pre-training and reinforcement learning.

It's just the signal's different.

Typically in reinforcement learning, your reward is sparser.

So you take multiple turns.

It's like: did you win the chess game or not? That's the only signal you're getting.

And often you can't compute gradients through discrete actions.

And so you end up losing a lot of gradient signal.

And so you can presume that pre-training is more efficient, but there's no reason why you couldn't learn new abilities in reinforcement learning.

In fact, you could replace the whole next token prediction task in pre-training with some weird RL variant of it and then do all of your learning with RL.
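The contrast drawn above (dense per-token supervision in pre-training versus a single sparse, non-differentiable reward in RL) can be made concrete with a toy NumPy sketch. Everything here is an illustrative assumption: a single shared categorical "policy" over a tiny vocabulary stands in for a language model, and a win/loss bit stands in for the chess-style reward.

```python
import numpy as np

rng = np.random.default_rng(0)
V, T = 5, 8                       # toy vocab size and sequence length
logits = rng.normal(size=V)       # one shared token distribution (toy policy)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

p = softmax(logits)
targets = rng.integers(0, V, size=T)

# Pre-training: cross-entropy on next-token prediction yields a gradient at
# EVERY position. d(-log p[t])/d(logits) = p - onehot(t), averaged over T.
grad_pretrain = np.zeros(V)
for t in targets:
    g = p.copy()
    g[t] -= 1.0
    grad_pretrain += g
grad_pretrain /= T

# RL (REINFORCE-style): sample discrete actions, observe ONE scalar reward
# at the very end. We cannot differentiate through the sampling itself, so
# the only gradient is reward * grad log p(sampled sequence).
sampled = rng.choice(V, size=T, p=p)
reward = 1.0 if (sampled == targets).mean() > 0.5 else 0.0  # sparse win/loss

grad_rl = np.zeros(V)
for a in sampled:
    g = -p.copy()
    g[a] += 1.0                   # d log p[a] / d(logits) = onehot(a) - p
    grad_rl += g
grad_rl *= reward                 # if reward is 0, the whole update vanishes
```

The sketch shows the asymmetry: the pre-training gradient is informative for any sequence, while the RL gradient carries information only when the episode happens to be rewarded, which is why pre-training is presumed more sample-efficient even though both are gradient descent.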

Yeah, at the end of the day, it's just a signal, and then correcting toward it.

Totally.

And then going back to the paper you mentioned: aside from the caveats that Sholto brings up, which I think are the first-order, most important point, I think zeroing in on the probability space of meaningful actions comes back to the nines of reliability.

And classically, if you give monkeys a typewriter, eventually they'll write Shakespeare, right?

And so the action space for any of these real world tasks that we care about is so large