Trenton Bricken
And I think in both cases, they're just very good at scaffolding and prompting the model.
I mean, even with the viral ChatGPT GeoGuessr capabilities, where it's just insanely good at spotting what beach you were on from a photo.
Kelsey Piper, who I think made this viral...
Their prompt is so sophisticated.
It's really long, and it encourages the model to come up with five different hypotheses, assign probabilities to them, and reason through the different aspects of the image that matter.
And I haven't A/B tested it, but I think unless you really encourage the model to be this thoughtful, you wouldn't get the level of performance that you see with that ability.
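(To make the structure concrete, here is a minimal sketch of what a prompt in that spirit might look like. This is not Kelsey Piper's actual prompt; the wording and the GEOLOCATION_PROMPT name are invented for illustration.)

```python
# A hypothetical prompt in the spirit described above. This is NOT the
# actual viral prompt; it just sketches the same structure: enumerate
# clues, propose several hypotheses, and assign explicit probabilities.
GEOLOCATION_PROMPT = """\
You are an expert at inferring where a photo was taken.

1. List every clue you can see: vegetation, architecture, signage
   language, road markings, sun position, soil color, license plates.
2. Propose FIVE distinct location hypotheses.
3. For each hypothesis, note which clues support or contradict it.
4. Assign a probability to each hypothesis (summing to roughly 1.0).
5. State your single best guess and your overall confidence.
"""
```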
Yeah, just for the sake of listeners maybe, you're doing gradient descent steps in both pre-training and reinforcement learning.
It's just that the signal's different.
Typically in reinforcement learning, your reward is sparser.
So you take multiple turns, and the only signal you're getting is whether you won the chess game or not.
And often you can't compute gradients through discrete actions.
And so you end up losing a lot of gradient signal. You can presume, then, that pre-training is more efficient, but there's no reason why you couldn't learn new abilities in reinforcement learning.
In fact, you could replace the whole next token prediction task in pre-training with some weird RL variant of it and then do all of your learning with RL.
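(As a rough illustration of the dense-versus-sparse signal point, here is a toy PyTorch sketch; the linear "model" and random data are stand-ins, not anyone's actual training code. Next-token prediction gets a cross-entropy term at every position, while the REINFORCE-style loss pushes a single win/loss scalar through the log-probabilities of sampled, non-differentiable actions.)

```python
import torch
import torch.nn.functional as F

vocab, seq_len, batch = 100, 32, 4
model = torch.nn.Linear(vocab, vocab)  # toy stand-in for a language model

tokens = torch.randint(vocab, (batch, seq_len))
inputs = F.one_hot(tokens[:, :-1], vocab).float()
logits = model(inputs)  # (batch, seq_len - 1, vocab)

# Pre-training: dense signal. Every position contributes a cross-entropy
# term, so every single token supervises the weights.
pretrain_loss = F.cross_entropy(
    logits.reshape(-1, vocab), tokens[:, 1:].reshape(-1)
)

# RL (REINFORCE): sparse signal. Sample discrete actions, observe one
# scalar reward for the whole episode ("did you win the chess game"),
# and weight the sampled actions' log-probs by it. Sampling itself is
# non-differentiable, so no gradient flows through the actions, only
# through their log-probabilities -- hence the noisier, weaker signal.
dist = torch.distributions.Categorical(logits=logits)
actions = dist.sample()                            # (batch, seq_len - 1)
reward = torch.randint(0, 2, (batch, 1)).float()   # one win/loss bit each
rl_loss = -(reward * dist.log_prob(actions)).mean()

(pretrain_loss + rl_loss).backward()
```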
Yeah, at the end of the day, it's just a signal, and then you're correcting to it.
Totally.
And then going back to the paper you mentioned, aside from the caveats that Sholto brings up, which I think is the first order, most important, I think zeroing in on the probability space of meaningful actions comes back to the nines of reliability.
And classically, if you give monkeys typewriters, eventually they'll write Shakespeare, right?
And so the action space for any of these real world tasks that we care about is so large