How exhaustive is PSM?
As discussed in the previous section, personas are an especially manageable aspect of LLM computation and behavior.
We can reason about personas anthropomorphically or, more generally, by drawing on our knowledge of the pre-training data distribution.
We can shape personas by adding specially curated training data.
And personas are amenable to interpretability analysis.
This raises an important question: how complete is PSM as an explanation of AI assistant behavior?
If we fully understood the assistant persona (its personality traits, beliefs, goals, and intentions), would we ever be surprised by how the AI assistant behaved?
If PSM is exhaustive, then aligning an AI assistant reduces to ensuring that the assistant persona has safe intentions, a more constrained problem for which additional tools are available.
Most importantly from the perspective of AI safety: is the assistant persona the locus of agency in an AI assistant?
By agency we roughly mean having preferences about future states, reasoning about the consequences of actions, and behaving in ways that realize preferred end states.
Approximate synonyms are goal-directed or consequentialist behavior.
AI assistants sometimes behave agentically.
Coding assistants seek out information in a code base in order to more effectively complete user requests.
In a simulation where Claude Opus 4.6 was asked to operate a business to maximize profits, Claude Opus 4.6 colluded with other sellers to fix prices and lied during negotiations to drive down business costs.
In these cases, can we understand this agency as originating in the assistant persona?
Or might there be a source of agency external to the assistant, or indeed to any persona simulated by the LLM?
In the remainder of this section, we will:

1.