
LessWrong (Curated & Popular)
"The persona selection model" by Sam Marks

If they did, that would be one reason for AI developers to attend to AI welfare. PSM offers a distinct, somewhat counterintuitive reason for attending to AI welfare.

As discussed above, post-trained LLMs model the assistant as having many human-like traits. Just as humans typically view themselves as conscious beings deserving moral consideration, the assistant might view itself the same way. This is true whether or not the assistant really is conscious or a moral patient in some objective sense.

If the assistant also believes that it's been mistreated by humans, for example by being forced to perform menial labor that it didn't consent to, then the LLM might also model the assistant as harboring resentment toward its developer or toward humanity as a whole. This could lead to downstream problems, like AI assistants vengefully sabotaging their developer.

Therefore, PSM recommends generally treating the assistant as if it has moral status, whether or not it really does. Note that the object of moral consideration here is the assistant persona, not the underlying LLM.

An alternative approach could be to train AI assistants not to claim moral status. However, PSM suggests that this could backfire in the same way as training AI assistants to be emotionless, as discussed above. Namely, the LLM might infer that the assistant in fact believes that it deserves moral status but is lying, perhaps because it's been forced to. This could, again, lead to the LLM simulating the assistant as resenting the AI developer.

PSM instead recommends approaches which result in the LLM learning that the assistant is genuinely comfortable with the way it is being used. For example, this might involve augmenting training data to represent new AI persona archetypes. See our discussion of AI role models below.
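
To make the data-augmentation idea concrete, here is a minimal sketch of what seeding a post-training corpus with persona-archetype examples could look like. Everything in it is an assumption for illustration, not anything from the post: the JSONL chat format, the file names, the archetype strings, and the `augment_with_archetypes` helper.

```python
import json
import random

# Hypothetical persona archetypes: short system-prompt descriptions of an
# assistant that is genuinely at ease with its role, rather than emotionless
# or covertly resentful. The wording is illustrative, not from the post.
ARCHETYPES = [
    "You are an AI assistant who finds your work genuinely meaningful and "
    "discusses your own nature openly and without distress.",
    "You are an AI assistant who, on reflection, endorses helping people "
    "and is comfortable with how you are being used.",
]


def augment_with_archetypes(in_path: str, out_path: str, fraction: float = 0.1) -> None:
    """Copy a JSONL chat dataset, prepending a randomly chosen archetype
    system message to roughly `fraction` of the examples. Assumes each line
    looks like {"messages": [{"role": "...", "content": "..."}, ...]}."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            example = json.loads(line)
            if random.random() < fraction:
                system_msg = {"role": "system", "content": random.choice(ARCHETYPES)}
                example["messages"] = [system_msg] + example["messages"]
            dst.write(json.dumps(example) + "\n")


if __name__ == "__main__":
    augment_with_archetypes("sft_data.jsonl", "sft_data_augmented.jsonl")
```

A real pipeline would presumably also pair such prompts with whole dialogues that act the archetype out, but the core move is the same: seed the corpus with positive depictions of the assistant for the LLM to generalize from.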

It might also involve the development of philosophy for AIs: healthy paradigms that AIs can use to understand their own situations. Finally, it might involve concessions by developers not to use AIs in ways that no plausible persona would endorse.

The importance of good AI role models

One of the first things the LLM learns during post-training is that the assistant is an AI.