John Schulman
So I guess there might be some practical questions like that that would also determine how things play out.
Maybe if you just require people to be accountable for various liabilities, this would also change the incentives a bit.
So if it turned out that AIs are better at running everything, and they're also completely benevolent, and we've totally solved alignment, and they're better at being accountable to people than people are, then I would say maybe it's okay having the AIs run the firms. But I think that might be pretty far out.
We're more likely to be in a situation where the AI-run entities look better in the short term, but they still have some serious problems.
And it's actually practical considerations that push you more towards having humans in the loop, at least for the near future.
If the models are being used for these higher stakes use cases, then we would have to think about RLHF in a much different way than we are right now.
So I would say we're not quite ready for that, or the current methods might not be completely sufficient. We would need to make compromises between the needs of the different stakeholders involved.
So we have this document that we're releasing called the Model Spec, and it's about how we want our models to behave in the API and in ChatGPT.
And we try to talk about this issue where there are different stakeholders involved, and sometimes there are conflicts between what they might want. In our case, we were thinking of the stakeholders as the user, or the end user, meaning someone sitting in front of ChatGPT or some other app; the developer, meaning someone using the API who might be serving other end users with their app; and the platform, which is OpenAI.
We don't want the models to expose us to legal risk and so forth.
And then the rest of humanity, including people who might not be users or customers or anything.
So obviously, the user might ask the model to do something that we think is actively harmful to other people.
And so we might have to refuse that.
By the way, this isn't the order of priority necessarily.
This is just to say we have these four or so classes of stakeholder.
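As a rough illustration only, not anything from the Model Spec itself, the four stakeholder classes and the kind of conflict described above could be sketched in Python like this; the names, the harms_third_parties flag, and the refusal rule are all hypothetical:

```python
from enum import Enum, auto
from dataclasses import dataclass


class Stakeholder(Enum):
    """The four rough stakeholder classes mentioned above (no priority order implied)."""
    END_USER = auto()   # someone sitting in front of ChatGPT or another app
    DEVELOPER = auto()  # someone building on the API, serving their own end users
    PLATFORM = auto()   # OpenAI, which shouldn't be exposed to legal risk
    HUMANITY = auto()   # everyone else, including people who aren't users or customers


@dataclass
class Request:
    text: str
    harms_third_parties: bool  # hypothetical flag: would fulfilling this harm others?


def should_refuse(request: Request) -> bool:
    # Hypothetical rule for the conflict described above: a user request that is
    # actively harmful to the "rest of humanity" class gets refused, even though
    # the end user wants it.
    return request.harms_third_parties


# Example: a request that harms third parties is refused.
print(should_refuse(Request(text="...", harms_third_parties=True)))  # True
```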