Sholto Douglas

Is RL + LLMs enough for AGI? — Sholto Douglas & Trenton Bricken

And you see a similar thing with their approach to sparsity, where they're iteratively working out the best way to do this over multiple papers.

Dwarkesh Podcast

And the part that I like is that it's simple.

A big failure mode that a lot of ML researchers have is you do these overly complicated things that don't think hard enough about the hardware systems that you have in mind.

Whereas in the first DeepSeek sparsity MoE solution, they designed these rack- and node-level load balancing losses.

So you can see them being like, OK, we have to perfectly balance on this.

And then they actually come up with a much better solution later on where they don't have to have the auxiliary loss, where they just have these bias terms that they put in.

And it's cool.

But balancing auxiliary losses is annoying.

Like, you're making the model trade off this thing.

And with auxiliary losses, you have to control the coefficient and the weighting.
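
[Editor's sketch] To make the point about the coefficient concrete, here is a minimal pure-Python sketch of a generic auxiliary load-balancing loss in the Switch Transformer style. It is not the rack- or node-level loss discussed in the episode; the names `aux_balance_loss`, `total_loss`, and `alpha` are assumptions chosen for illustration. The thing to notice is that `alpha` has to be tuned against the main language-modeling loss.

```python
def aux_balance_loss(router_probs, assignments, alpha=0.01):
    """Generic auxiliary balance loss: alpha * N * sum_i f_i * P_i.

    router_probs: per-token lists of router probabilities over N experts.
    assignments:  per-token index of the expert actually chosen (top-1).
    alpha:        hand-tuned coefficient (hypothetical default).
    """
    num_tokens = len(router_probs)
    num_experts = len(router_probs[0])
    # f_i: fraction of tokens dispatched to expert i.
    f = [0.0] * num_experts
    for expert in assignments:
        f[expert] += 1.0 / num_tokens
    # P_i: mean router probability assigned to expert i.
    p = [sum(row[i] for row in router_probs) / num_tokens
         for i in range(num_experts)]
    return alpha * num_experts * sum(fi * pi for fi, pi in zip(f, p))


def total_loss(lm_loss, router_probs, assignments, alpha=0.01):
    # The model is asked to trade the balance term off against the main loss.
    return lm_loss + aux_balance_loss(router_probs, assignments, alpha)
```

With perfectly even routing the balance term bottoms out at `alpha`; the more the router piles tokens onto one expert, the larger it gets.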

The bias is cleaner in some respects.
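
[Editor's sketch] A minimal pure-Python sketch of the bias idea, under stated assumptions: each expert gets a bias that is added to the router score only when choosing the top-k experts, and the bias is nudged after each batch depending on whether the expert was over- or under-loaded. The function names, the fixed `step`, and the compare-to-mean-load rule are illustrative assumptions, not a confirmed reproduction of the method discussed.

```python
def route_top_k(scores, bias, k):
    """Choose k experts per token, ranking by score + bias.

    The bias only steers which experts are selected; no loss term is added.
    """
    chosen = []
    for token_scores in scores:
        ranked = sorted(range(len(bias)),
                        key=lambda e: token_scores[e] + bias[e],
                        reverse=True)
        chosen.append(ranked[:k])
    return chosen


def update_bias(bias, chosen, step=0.1):
    """Lower the bias of overloaded experts and raise it for underloaded ones."""
    load = [0] * len(bias)
    for experts in chosen:
        for e in experts:
            load[e] += 1
    mean_load = sum(load) / len(bias)
    new_bias = []
    for b, n in zip(bias, load):
        if n > mean_load:
            new_bias.append(b - step)   # overloaded: make it less attractive
        elif n < mean_load:
            new_bias.append(b + step)   # underloaded: make it more attractive
        else:
            new_bias.append(b)
    return new_bias
```

In a training loop one would call `route_top_k` on each batch and then `update_bias` on the resulting assignments, so the balancing happens outside the gradient and there is no loss coefficient to tune, only the step size.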

Interesting.

They did have to change it during training.

It depends on what your architecture is.

But I just thought it was cute that you can see them running up into this very hardware-level problem.

And they tried to go like, what do we wish we could express algorithmically?

What can we express under our constraints?

And iteratively solving to get better constraints.
