Sam Marks
We describe the Persona Selection Model (PSM): the idea that LLMs learn to simulate diverse characters during pre-training, and that post-training elicits and refines one such character into the assistant persona. Interactions with an AI assistant are then well understood as interactions with that persona, something roughly like a character in an LLM-generated story.
We survey behavioral, generalization-based, and interpretability-based empirical evidence for PSM.
PSM has consequences for AI development, such as recommending anthropomorphic reasoning about AI psychology and the introduction of positive AI archetypes into training data.
An important open question is how exhaustive PSM is: in particular, whether there might be sources of agency external to the assistant persona, and how this might change in the future.
Introduction

What sort of thing is a modern AI assistant?
One perspective holds that they are shallow, rigid systems that narrowly pattern-match user inputs to training data.
Another perspective regards AI systems as alien creatures with learned goals, behaviors, and patterns of thought that are fundamentally inscrutable to us.
A third option is to anthropomorphize AIs and regard them as something like digital humans.
Developing good mental models for AI systems is important for predicting and controlling their behaviors.
If our goal is to make AI assistants that are useful and aligned with human values, the right approach will differ quite a bit if we are dealing with inflexible computer programs, aliens, or digital humans.
Of these perspectives, the third one, that AI systems are like digital humans, might seem the most unintuitive.
After all, the neural architectures of modern large language models (LLMs) are very different from human brains, and LLM training is quite unlike biological evolution or human learning.
That said, in our experience, AI assistants like Claude are shockingly human-like.
For example, they often appear to express emotions, like frustration when struggling with a task, despite no explicit training to do so.
And, as we'll discuss, we observe deeper forms of human likeness in how they generalize from their training data and internally represent their own behaviors.
In this post, we share a mental model we have found useful for understanding AI assistants and predicting their behaviors.
Under this model, LLMs are best thought of as actors or authors capable of simulating a vast repertoire of characters, and the AI assistant that users interact with is one such character.
In more detail, this model, which we call the Persona Selection Model (PSM), states that
1.