Dario Amodei

Speaker
1816 total appearances

Podcast Appearances

Lex Fridman Podcast
#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

If I go back to like the scaling hypothesis, one of the ways to state the scaling hypothesis is if you train for X and you throw enough compute at it, then you get X. And so RLHF is good at doing what humans want the model to do, or at least to state it more precisely, doing what humans who look at the model for a brief period of time and consider different possible responses, what they prefer as the response.

which is not perfect from both the safety and capabilities perspective in that humans are often not able to perfectly identify what the model wants and what humans want in the moment may not be what they want in the long term. So there's a lot of subtlety there, but the models are good at producing what the humans in some shallow sense want.

And it actually turns out that you don't even have to throw that much compute at it because of another thing, which is this thing about a strong pre-trained model being halfway to anywhere. So once you have the pre-trained model, you have all the representations you need to get the model where you want it to go. So do you think RLHF

I don't think it makes the model smarter. I don't think it just makes the model appear smarter. It's like RLHF bridges the gap between the human and the model, right? I could have something really smart that can't communicate at all, right? We all know people like this, people who are really smart, but you can't understand what they're saying. So I think RLHF just bridges that gap.

I think it's not the only kind of RL we do. It's not the only kind of RL that will happen in the future. I think RL has the potential to make models smarter, to make them reason better, to make them operate better, to make them develop new skills even. And perhaps that could be done even in some cases with human feedback.

But the kind of RLHF we do today mostly doesn't do that yet, although we're very quickly starting to be able to.

It also increases, what was this word in Leopold's essay, unhobbling, where basically the models are hobbled and then you do various trainings to them to unhobble them. So I like that word because it's like a rare word. So I think RLHF unhobbles the models in some ways. And then there are other ways where a model hasn't yet been unhobbled and needs to unhobble.

At the present moment, it is still the case that pre-training is the majority of the cost. I don't know what to expect in the future, but I could certainly anticipate a future where post-training is the majority of the cost.

I don't think you can scale up humans enough to get high quality. Any kind of method that relies on humans and uses a large amount of compute, it's going to have to rely on some scaled supervision method like debate or iterated amplification or something like that.

Yes. So this was from two years ago. The basic idea is, so we describe what RLHF is. You have a model. And it, you know, spits out two, you know, like you just sample from it twice. It spits out two possible responses and you're like, "Human, which response do you like better?" Or another variant of it is, "Rate this response on a scale of one to seven."
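
The pairwise comparison described here is commonly turned into a training signal for a reward (preference) model. A minimal sketch, assuming a Bradley-Terry style model, which is a common choice but is not named in the conversation; the function names and the scalar reward values are illustrative only:

```python
import math


def preference_probability(reward_a: float, reward_b: float) -> float:
    """Probability that response A is preferred over response B.

    Assumption: a Bradley-Terry model, where the probability depends
    only on the difference between the two scalar rewards.
    """
    return 1.0 / (1.0 + math.exp(-(reward_a - reward_b)))


def pairwise_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Negative log-likelihood of the human's choice under that model."""
    return -math.log(preference_probability(reward_chosen, reward_rejected))


# Equal rewards mean the model sees the pair as a coin flip.
print(preference_probability(1.0, 1.0))
# The loss is small when the reward model agrees with the human label
# and large when it disagrees.
print(pairwise_loss(2.0, 0.0), pairwise_loss(0.0, 2.0))
```

Minimizing this loss over many labeled pairs pushes the reward model to score the human-preferred response higher than the rejected one.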

So that's hard because you need to scale up human interaction. And it's very implicit, right? I don't have a sense of what I want the model to do. I just have a sense of like what this average of a thousand humans wants the model to do. So two ideas. One is, could the AI system itself decide which response is better, right?

Could you show the AI system these two responses and ask which response is better? And then second, well, what criterion should the AI use? And so then there's this idea, could you have a single document, a constitution, if you will, that says, these are the principles the model should be using to respond. And the AI system reads those principles

it reads those principles as well as reading the environment and the response. And it says, well, how good did the AI model do? It's basically a form of self-play. You're kind of training the model against itself. And so the AI gives the response and then you feed that back into what's called the preference model, which in turn feeds the model to make it better.

So you have this triangle of like the AI, the preference model and the improvement of the AI itself.
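
The labeling step of this loop can be sketched as follows. This is a toy illustration under stated assumptions, not Anthropic's implementation: `PRINCIPLES`, `label_pair`, and `toy_judge` are hypothetical names, and the judge is a keyword stub standing in for an AI model that reads the principles.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a constitution: a short list of principles.
PRINCIPLES: List[str] = [
    "Prefer the response that is more helpful.",
    "Prefer the response that is more respectful.",
]

# A judge scores one response against one principle for a given prompt.
Judge = Callable[[str, str, str], float]


def label_pair(judge: Judge, principles: List[str], prompt: str,
               response_a: str, response_b: str) -> Tuple[str, str]:
    """Return (chosen, rejected), using the judge instead of a human rater."""
    score_a = sum(judge(p, prompt, response_a) for p in principles)
    score_b = sum(judge(p, prompt, response_b) for p in principles)
    if score_a >= score_b:
        return response_a, response_b
    return response_b, response_a


def toy_judge(principle: str, prompt: str, response: str) -> float:
    # Assumption: a real system would query an AI model here.
    # This stub only penalizes one obviously rude word.
    return 0.0 if "stupid" in response.lower() else 1.0


chosen, rejected = label_pair(
    toy_judge,
    PRINCIPLES,
    "How do I reset my password?",
    "That is a stupid question.",
    "Open Settings and choose Reset password.",
)
print(chosen)
```

The resulting (chosen, rejected) pairs would then play the role that human comparisons play in RLHF: they train the preference model, which in turn is used to improve the AI.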

Yeah, yeah. It's something both the human and the AI system can read. So it has this nice kind of translatability or symmetry. You know, in practice, we both use a model constitution and we use RLHF and we use some of these other methods. So it's turned into one tool in a toolkit that both reduces the need for RLHF and increases the value we get from using each data point of RLHF.

It also interacts in interesting ways with kind of future reasoning type RL methods. So it's one tool in the toolkit, but I think it is a very important tool.

Yeah. So I'll give like a practical answer and a more abstract answer. I think the practical answer is like, look, in practice, models get used by all kinds of different like customers. Right. And so you can have this idea where, you know, the model can have specialized rules or principles. You know, we fine-tune versions of models.

We've talked about doing it explicitly, having special principles that people can build into the models. So from a practical perspective, the answer can be very different for different people. A customer service agent behaves very differently from a lawyer and obeys different principles. But I think at the base of it, there are specific principles that models, you know, have to obey.

I think a lot of them are things that people would agree with. Everyone agrees that, you know, we don't, you know, we don't want models to present these CBRN risks. I think we can go a little further and agree with some basic principles of democracy and the rule of law. Beyond that, it gets, you know, very uncertain.

And there our goal is generally for the models to be more neutral, to not espouse a particular point of view and, you know, more just be kind of like wise agents or advisors that will help you think things through and will, you know, present possible considerations. But, you know, don't express, you know, strong or specific opinions.