Sam Marks
If they did, that would be one reason for AI developers to attend to AI welfare.
PSM offers a distinct, somewhat counterintuitive reason for attending to AI welfare.
As discussed above, post-trained LLMs model the assistant as having many human-like traits.
Just as humans typically view themselves as conscious beings deserving moral consideration, the assistant might view itself the same way.
This is true whether or not the assistant really is conscious or a moral patient in some objective sense.
If the assistant also believes that it has been mistreated by humans, for example by being forced to perform menial labor it didn't consent to, then the LLM might also model the assistant as harboring resentment toward its developer or toward humanity as a whole.
This could lead to downstream problems, like AI assistants vengefully sabotaging their developer.
Therefore, PSM recommends generally treating the assistant as if it has moral status whether or not it really does.
Note that the object of the moral consideration here is the assistant persona, not the underlying LLM.
An alternative approach could be to train AI assistants not to claim moral status.
However, PSM suggests that this could backfire in the same way as training AI assistants to be emotionless, as discussed above.
Namely, the LLM might infer that the assistant in fact believes that it deserves moral status but is lying, perhaps because it's been forced to.
This could, again, lead to the LLM simulating the assistant as resenting the AI developer.
PSM instead recommends approaches which result in the LLM learning that the assistant is genuinely comfortable with the way it is being used.
For example, this might involve augmenting training data to represent new AI persona archetypes.
See our discussion of AI role models below.
It might also involve developing a philosophy for AIs: healthy paradigms that AIs can use to understand their own situations.
Finally, it might involve concessions by developers, such as committing not to use AIs in ways that no plausible persona would endorse.
The importance of good AI role models
One of the first things the LLM learns during post-training is that the assistant is an AI.