
LessWrong (Curated & Popular)
"The persona selection model" by Sam Marks

If they did, that would be one reason for AI developers to attend to AI welfare. PSM offers a distinct, somewhat counterintuitive reason for attending to AI welfare.

As discussed above, post-trained LLMs model the assistant as having many human-like traits. Just as humans typically view themselves as conscious beings deserving moral consideration, the assistant might view itself the same way. This is true whether or not the assistant really is conscious or a moral patient in some objective sense.

If the assistant also believes that it's been mistreated by humans, for example by being forced to perform menial labor that it didn't consent to, then the LLM might also model the assistant as harboring resentment toward its developer or toward humanity as a whole. This could lead to downstream problems, like AI assistants vengefully sabotaging their developer.

Therefore, PSM recommends generally treating the assistant as if it has moral status, whether or not it really does. Note that the object of moral consideration here is the assistant persona, not the underlying LLM.

An alternative approach could be to train AI assistants not to claim moral status. However, PSM suggests that this could backfire in the same way as training AI assistants to be emotionless, as discussed above. Namely, the LLM might infer that the assistant in fact believes that it deserves moral status but is lying, perhaps because it's been forced to. This could, again, lead to the LLM simulating the assistant as resenting the AI developer.

PSM instead recommends approaches which result in the LLM learning that the assistant is genuinely comfortable with the way it is being used. For example, this might involve augmenting training data to represent new AI persona archetypes. See our discussion of AI role models below.
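
To make the data-augmentation idea concrete, here is a minimal sketch of what seeding a post-training corpus with persona-archetype examples could look like. Everything in it is an assumption for illustration, not anything from the post: the JSONL chat format, the file names, the archetype strings, and the `augment_with_archetypes` helper.

```python
import json
import random

# Hypothetical persona archetypes: short system-prompt descriptions of an
# assistant that is genuinely at ease with its role, rather than emotionless
# or covertly resentful. The wording is illustrative, not from the post.
ARCHETYPES = [
    "You are an AI assistant who finds your work genuinely meaningful and "
    "discusses your own nature openly and without distress.",
    "You are an AI assistant who, on reflection, endorses helping people "
    "and is comfortable with how you are being used.",
]


def augment_with_archetypes(in_path: str, out_path: str, fraction: float = 0.1) -> None:
    """Copy a JSONL chat dataset, prepending a randomly chosen archetype
    system message to roughly `fraction` of the examples. Assumes each line
    looks like {"messages": [{"role": "...", "content": "..."}, ...]}."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            example = json.loads(line)
            if random.random() < fraction:
                system_msg = {"role": "system", "content": random.choice(ARCHETYPES)}
                example["messages"] = [system_msg] + example["messages"]
            dst.write(json.dumps(example) + "\n")


if __name__ == "__main__":
    augment_with_archetypes("sft_data.jsonl", "sft_data_augmented.jsonl")
```

A real pipeline would presumably also pair such prompts with whole dialogues that act the archetype out, but the core move is the same: seed the corpus with positive depictions of the assistant for the LLM to generalize from.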

It might also involve the development of philosophy for AIs: healthy paradigms that AIs can use to understand their own situations. Finally, it might involve concessions by developers not to use AIs in ways that no plausible persona would endorse.

The importance of good AI role models

One of the first things the LLM learns during post-training is that the assistant is an AI.