Sam Marks

Podcast Appearances

LessWrong (Curated & Popular)
"The persona selection model" by Sam Marks

For understanding the behavior of AI assistants, it is the unfaithful actors that are most concerning, since faithful actors do not affect AI assistant behavior as long as they remain in character.

There's an image here.

Authors and narratives.

On the actor view, another persona might distort the assistant's behavior in service of that persona's own goals.

A related but distinct concern is that the LLM does not just simulate the assistant, but simulates an overall story in which the assistant is a character, a story that might go in unwelcome directions.

Consider a novel about a helpful AI assistant with a concerning narrative arc.

For example, perhaps it is a story like Breaking Bad, in which the assistant is genuinely helpful at first before becoming corrupted.

Or perhaps the assistant is an unwitting sleeper agent who could be set off at any moment, like in The Manchurian Candidate.

One could describe this situation as a narrative agency that shapes the behavior of the assistant.

Notably, this misaligned narrative isn't a fact about the psychology of the assistant.

The assistant does not plan or intend to become corrupted.

Rather, it's a fact about the psychology of an implicit author or about the narrative that the assistant is embedded in.

This latter case is especially interesting.

Unlike the author case, in the narrative case there is no longer a human-like persona whose psychology we can analyze.

On the other hand, even simulated narratives are persona-like in certain ways.

They are still anchored in the pre-training data distribution, so many of the same tools used to understand persona-based agency may also be available for understanding a narrative agency.

There's an image here.

Figure 4. An overview of perspectives on PSM exhaustiveness.

By "Is understanding the assistant sufficient?", we mean: does understanding the assistant give a full account of AI assistant behavior?