Trenton Bricken

👤 Speaker
See mentions of this person in podcasts
1589 total appearances

Podcast Appearances

Dwarkesh Podcast
Is RL + LLMs enough for AGI? – Sholto Douglas & Trenton Bricken

And we fit up to, I want to say, 16,000 features, which we thought was a ton at the time.

Fast forward nine months, we go from a two-layer transformer to our Claude 3 Sonnet frontier model at the time and fit up to 30 million features.
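
A minimal sketch of the sparse-autoencoder dictionary learning being described here, assuming a plain reconstruction objective with an L1 sparsity penalty; the model width, feature count, and coefficient below are illustrative placeholders rather than the actual training setup.

import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Decompose model activations into many sparse, interpretable 'features'."""

    def __init__(self, d_model: int = 512, n_features: int = 16_000):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)  # activations -> feature coefficients
        self.decoder = nn.Linear(n_features, d_model)  # feature coefficients -> reconstruction

    def forward(self, acts: torch.Tensor):
        feats = torch.relu(self.encoder(acts))  # non-negative and mostly zero
        recon = self.decoder(feats)
        return feats, recon

def sae_loss(acts, feats, recon, l1_coeff: float = 1e-3):
    # Reconstruct the activations while penalizing how many features fire at once;
    # scaling n_features from thousands toward millions is what "fitting more features" means here.
    return ((recon - acts) ** 2).mean() + l1_coeff * feats.abs().mean()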

And this is where we start to find really interesting abstract concepts like a feature that would fire for code vulnerabilities.

And it wouldn't just fire for code vulnerabilities.

It would even fire for, you know, that Chrome page you get when it's not an HTTPS URL and it warns you, this site might be dangerous, click to continue. It would fire for that too, for example.

And so it's these much more abstract coding variables or sentiment features amongst the 30 million.
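
One way to probe what an individual feature like this responds to is to rank a pile of prompts by how strongly that feature fires on them; a rough sketch follows, where feature_acts_for is a hypothetical stand-in for however the instrumented model exposes per-token feature activations.

def top_examples_for_feature(feature_acts_for, prompts, feature_idx, k=5):
    # feature_acts_for(prompt) is assumed to return a tensor of per-token feature
    # activations from the sparse autoencoder; it is a placeholder, not a real API.
    scored = []
    for prompt in prompts:
        acts = feature_acts_for(prompt)
        scored.append((acts[..., feature_idx].max().item(), prompt))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]

# A "code vulnerability" feature might rank an unsafe C snippet and the text of a
# browser "this site may be dangerous" warning page near the top of the same list.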

Fast forward nine months from that and now we have circuits.

And I threw in the analogy earlier of the Ocean's Eleven heist team, where now you're identifying individual features across the layers of the model that are all working together to perform some complicated task.

And you can get a much better idea of how it's actually doing the reasoning and coming to decisions, like with the medical diagnostics.

One example I didn't talk about before is how the model retrieves facts.

And so you ask, what sport did Michael Jordan play?

And not only can you see it hop from Michael Jordan to basketball and answer basketball, but the model also has an awareness of when it doesn't know the answer to a fact.
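
A circuit in this sense can be pictured as a small directed graph whose nodes are features at different layers and whose edges record how strongly one feature drives another; everything below (layer indices, feature labels, weights) is invented purely to illustrate the structure of the Michael Jordan example.

# Hypothetical attribution graph for "What sport did Michael Jordan play?" -> "basketball".
circuit_edges = [
    (("layer 2", "name 'Michael Jordan'"), ("layer 9", "famous basketball player"), 0.7),
    (("layer 9", "famous basketball player"), ("layer 20", "say 'basketball'"), 0.9),
    (("layer 2", "question asks about a sport"), ("layer 20", "say 'basketball'"), 0.4),
]

def downstream_of(node, edges):
    # Follow the graph forward from one feature to the features it feeds into.
    return [(dst, weight) for src, dst, weight in edges if src == node]

print(downstream_of(("layer 2", "name 'Michael Jordan'"), circuit_edges))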

And so by default, it will actually say, I don't know the answer to this question.

But if it sees something that it does know the answer to, it will inhibit the "I don't know" circuit and then reply using the circuit that actually has the answer.

So, for example, if you ask it who is Michael Batkin, which is just a made-up fictional person, it will by default just say I don't know.

It's only with Michael Jordan or someone else that it will then inhibit the I don't know circuit.

But what's really interesting here, and where you can start making downstream predictions or reasoning about the model, is that the "I don't know" circuit is only on the name of the person.
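
A toy caricature of the inhibition pattern described above, assuming it can be reduced to a default "can't answer" pull that a known-entity feature (firing only on the person's name) suppresses; the weights and labels are invented for illustration.

def default_response(known_entity_activation: float) -> str:
    # Default behavior is to refuse unless a known-entity feature, which fires on the
    # name of the person, inhibits that default strongly enough.
    CANT_ANSWER_BIAS = 1.0        # always-on pull toward "I don't know"
    INHIBITION_WEIGHT = -1.5      # known-entity feature suppresses that pull
    score = CANT_ANSWER_BIAS + INHIBITION_WEIGHT * known_entity_activation
    return "I don't know." if score > 0 else "Attempt an answer."

print(default_response(0.0))  # Michael Batkin: no recognition, so the default stands.
print(default_response(1.0))  # Michael Jordan: recognition inhibits the default.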