
Sholto Douglas

Speaker
1567 total appearances

Appearances Over Time

Podcast Appearances

Dwarkesh Podcast
Is RL + LLMs enough for AGI? — Sholto Douglas & Trenton Bricken

It's a domain which just naturally lends itself to this.

Does it compile?

Does it pass the test?

You can go on LeetCode and you can run tests and you know whether or not you got the right answer.
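The "does it compile, does it pass the test" signal described here can be sketched as a minimal binary reward function. This is an illustrative sketch only — the function name, the subprocess-based harness, and the example tests are assumptions, not anything described in the episode:

```python
import subprocess
import sys
import tempfile

def verifiable_reward(candidate_source: str, test_source: str) -> float:
    """Binary reward: 1.0 if the candidate solution passes the tests, else 0.0.

    The candidate and its tests run in a subprocess, so a syntax error,
    crash, or failed assertion all simply yield zero reward.
    (Illustrative sketch of a verifiable-reward check, not a real RL harness.)
    """
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(candidate_source + "\n" + test_source)
        path = f.name
    result = subprocess.run([sys.executable, path], capture_output=True, timeout=10)
    return 1.0 if result.returncode == 0 else 0.0

# A correct solution earns reward 1.0; a buggy one earns 0.0.
good = "def add(a, b):\n    return a + b"
bad = "def add(a, b):\n    return a - b"
tests = "assert add(2, 3) == 5"
print(verifiable_reward(good, tests))  # 1.0
print(verifiable_reward(bad, tests))   # 0.0
```

The point of the quote is exactly this property: the reward is cheap to compute and unambiguous, which is what essay-writing lacks.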

But there isn't the same kind of thing for writing a great essay.

The question of taste in that regard is quite hard.

We discussed the other night at dinner the Pulitzer Prize.

which would come first: a Pulitzer Prize-winning novel, or a Nobel Prize, or something like this.

And I actually think a Nobel Prize is more likely than a Pulitzer Prize-winning novel in some respects.

Because a lot of the tasks required in winning a Nobel Prize, or at least strongly assisting in winning one, have more layers of verifiability built up.

So I expect them to accelerate the process of doing Nobel Prize-winning work

more initially than that of writing Pulitzer Prize-worthy novels.

Copy paste, copy paste, copy paste.

Right, like carving away the marble on this.

I think it's worth noting that that paper was, I'm pretty sure, on the Llama and Qwen models.

And I'm not sure how much RL compute they used, but I don't think it was anywhere comparable to the amount of compute that was used in the base models.

And so I think the amount of compute that you use in training is a decent proxy for the amount of actual raw new knowledge or capabilities you're adding to a model.

So my prior at least, if you look at all of DeepMind's RL research from before:

RL was able to teach these Go- and chess-playing agents new knowledge that was in excess of human-level performance just from RL signal, provided the RL signal was sufficiently clean.

So there's nothing structurally limiting about the algorithm here that prevents it from imbuing the neural net with new knowledge.