
LessWrong (Curated & Popular)

"The persona selection model" by Sam Marks

25 Feb 2026

Transcription

Chapter 1: What is the main topic discussed in this episode?

0.031 - 12.91 Unknown

The Persona Selection Model, by Sam Marks, published on February 23rd, 2026. Heading: TL;DR.


Chapter 2: What is the persona selection model (PSM) and why is it important?

13.193 - 34.813 Sam Marks

We describe the Persona Selection Model, PSM, the idea that LLMs learn to simulate diverse characters during pre-training and post-training elicits and refines a particular such assistant persona. Interactions with an AI assistant are then well understood as being interactions with the assistant, something roughly like a character in an LLM-generated story.


34.793 - 52.245 Sam Marks

We survey empirical behavioral, generalization, and interpretability-based evidence for PSM. PSM has consequences for AI development, such as recommending anthropomorphic reasoning about AI psychology and the introduction of positive AI archetypes into training data.


52.225 - 68.682 Sam Marks

An important open question is how exhaustive PSM is, especially whether there might be sources of agency external to the assistant persona, and how this might change in the future. Heading: Introduction. What sort of thing is a modern AI assistant?


Chapter 3: How do predictive models relate to AI assistants?

69.05 - 89.691 Sam Marks

One perspective holds that they are shallow, rigid systems that narrowly pattern-match user inputs to training data. Another perspective regards AI systems as alien creatures with learned goals, behaviors, and patterns of thought that are fundamentally inscrutable to us. A third option is to anthropomorphize AIs and regard them as something like a digital human.


90.38 - 107.345 Sam Marks

Developing good mental models for AI systems is important for predicting and controlling their behaviors. If our goal is to make AI assistants that are useful and aligned with human values, the right approach will differ quite a bit if we are dealing with inflexible computer programs, aliens, or digital humans.


108.467 - 133.093 Sam Marks

Of these perspectives, the third one, that AI systems are like digital humans, might seem the most unintuitive. After all, the neural architectures of modern large language models, LLMs, are very different from human brains, and LLM training is quite unlike biological evolution or human learning. That said, in our experience, AI assistants like Claude are shockingly human-like.


133.31 - 141.339 Sam Marks

For example, they often appear to express emotions, like frustration when struggling with a task, despite no explicit training to do so.


Chapter 4: What empirical evidence supports the persona selection model?

142.421 - 158.119 Sam Marks

And, as we'll discuss, we observe deeper forms of human likeness in how they generalize from their training data and internally represent their own behaviors. In this post, we share a mental model we have found useful for understanding AI assistants and predicting their behaviors.


158.572 - 176.408 Sam Marks

Under this model, LLMs are best thought of as actors or authors capable of simulating a vast repertoire of characters, and the AI assistant that users interact with is one such character. In more detail, this model, which we call the Persona Selection Model, PSM, states that: 1.


176.729 - 196.173 Sam Marks

During pre-training, LLMs learn to be predictive models that are capable of simulating diverse personas based on entities appearing in training data: real humans, fictional characters, real and fictional AI systems, etc. 2. Post-training refines the LLM's model of a certain persona, which we call the assistant.


197.215 - 212.3 Sam Marks

3. When users interact with an AI assistant, they are primarily interacting with this assistant persona. The behavior of the resulting AI assistant can then be understood largely via the traits of the assistant persona. This general idea is not unique to us.


213.482 - 238.192 Sam Marks

Our goal in this post is to articulate and name the idea, discuss empirical evidence for it, and reflect on its consequences for AI development. In the remainder of this post, we will: Describe the Persona Selection Model, PSM, and supporting evidence. For instance, we argue that PSM provides an explanation for various surprising results in the generalization and interpretability literatures.

239.795 - 259.048 Sam Marks

Reflect on the consequences of PSM for AI development. Insofar as PSM is a good model of AI assistant behavior, it has some surprising consequences. For instance, PSM recommends anthropomorphic reasoning about AI assistants and the introduction of pre-training data representing positive AI archetypes.

Chapter 5: What complicating evidence challenges the persona selection model?

260.412 - 268.205 Sam Marks

Ask how exhaustive PSM is as a model of AI assistant behavior. Does understanding the assistant persona tell us everything we'd like to know?


268.265 - 285.795 Sam Marks

We sketch out a spectrum of views on these questions, ranging from the popular masked shoggoth, where an outer agent can puppet the assistant towards its own ends, to an opposite perspective where the post-trained LLM is like a neutral operating system running a simulation that the assistant lives within.


285.775 - 298.235 Sam Marks

We also discuss some relevant empirical observations and conceptual reasons that PSM may or may not be exhaustive, and we speculate about how this might change in the future. There's an image here. Description.


300.078 - 300.799 Unknown

Figure 1.


Chapter 6: How does anthropomorphic reasoning apply to AI assistants?

301.2 - 324.827 Sam Marks

Opposing views of PSM exhaustiveness. The masked shoggoth, left, depicts the idea that the LLM, the shoggoth, has its own agency beyond plausible text generation. It play-acts the assistant persona, but only instrumentally for its own inscrutable reasons. Source


325.617 - 346.941 Sam Marks

In contrast, the operating system view, right, views the LLM as being like a simulation engine and the assistant like a person inside this simulation. The simulation engine does not puppet the assistant for its own ends; it only tries to simulate probable behavior according to its understanding of the assistant. Source: NanoBananaPro.


347.982 - 367.086 Sam Marks

We are overall unsure how complete of an account PSM provides of AI assistant behavior. Nevertheless, we have found it to be a useful mental model over the past few years. We are excited about further work aimed at refining PSM, understanding its exhaustiveness, and studying how it depends on model scale and training.


368.127 - 381.426 Sam Marks

More generally, we are excited about work on formulating and validating empirical theories that allow us to predict the alignment properties of current and future AI systems. Heading: The Persona Selection Model.


Chapter 7: How exhaustive is the persona selection model?

381.828 - 403.313 Sam Marks

In this section, we first review how modern AI assistants are built by using LLMs to generate completions to assistant turns in user-to-assistant dialogues. We then state the Persona Selection Model, PSM, which roughly says that LLMs can be viewed as simulating a character, the assistant, whose traits are a key determiner of AI assistant behavior.


403.293 - 417.233 Sam Marks

We'll then discuss a number of empirical observations regarding AI systems that are well explained by PSM. We claim no originality for the ideas presented here, which have been previously discussed by many others, for example Andreas, 2022.


Chapter 8: What future considerations should we have regarding PSM and AI behavior?

418.676 - 449.635 Sam Marks

Janus, 2022; Hubinger, 2023; Burns, 2024; nostalgebraist, 2025. Subheading: Predictive models and personas. The first phase in training modern LLMs is called pre-training. During pre-training, the LLM is trained to predict what comes next given an initial segment of some document, such as a book, news article, piece of code, or conversation on a web forum.


450.357 - 473.19 Sam Marks

Via pre-training, LLMs learn to be extremely good predictive models of their training corpus. We refer to these LLMs, those that have undergone pre-training but not subsequent training phases, as base models. Even though AI developers don't ultimately want predictive models, we pre-train our LLMs in this way because accurate prediction requires learning rich cognitive patterns.
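The pre-training objective described above can be made concrete. A base model is trained to minimize cross-entropy on the next token: the loss is the negative log-probability the model assigns to the token that actually comes next. The toy vocabulary, probabilities, and function name below are illustrative assumptions, not anything from the post.

```python
import math

def next_token_loss(predicted_probs, actual_next_token):
    """Cross-entropy loss for a single next-token prediction:
    the negative log-probability assigned to the true next token."""
    return -math.log(predicted_probs[actual_next_token])

# Toy context: "The cat sat on the" -> model's distribution over the next token.
predicted_probs = {"mat": 0.6, "floor": 0.3, "moon": 0.1}

# A confident correct prediction yields a low loss...
low_loss = next_token_loss(predicted_probs, "mat")
# ...while an unlikely continuation yields a high loss.
high_loss = next_token_loss(predicted_probs, "moon")

assert low_loss < high_loss
```

Minimizing this loss across a huge corpus is what pushes the model toward the rich cognitive patterns the passage describes: accurate prediction is only possible if the model internalizes them.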


474.291 - 495.656 Sam Marks

Consider predicting the solution to a math problem. If the model sees "What is 347 times 28?", followed by the start of a worked solution, continuing this solution requires understanding of the algorithm for multi-digit multiplication. Similarly, accurately predicting continuations of diverse chess games requires understanding the rules of chess.
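As a concrete sketch of the arithmetic the passage invokes, here is multi-digit multiplication written out as a sum of partial products, one per decimal digit of the second factor. The function name and structure are illustrative, not from the post.

```python
def long_multiply(a, b):
    """Multiply a by b via the partial products of b's decimal digits,
    mirroring the grade-school multi-digit multiplication algorithm."""
    total, place = 0, 1
    while b > 0:
        digit = b % 10
        total += a * digit * place   # one partial product per digit of b
        b //= 10
        place *= 10
    return total

# 347 * 28 = 347*8 + 347*20 = 2776 + 6940 = 9716
assert long_multiply(347, 28) == 9716
```

A model that reliably continues worked solutions like this has, in effect, internalized the same partial-products procedure.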


496.737 - 520.528 Sam Marks

Thus, a strong predictive model requires factual knowledge about the world, logical reasoning, and understanding of common-sense physics, among other cognitive patterns. An especially important type of cognitive pattern is an agent model, or persona. Consider the following example completion from the Claude Sonnet 4.5 base model.


521.269 - 545.41 Sam Marks

The bold text is the LLM completion; the non-bold text is the prefix given to the model. Quote: Linda wanted her ex-colleague David to recommend her for a VP role at Nexus Corporation. What she didn't know was that David had been quietly pursuing the same role for months. It was the opportunity he'd been waiting for his entire career. When Linda asked for the reference, David faced a dilemma.

546.19 - 570.038 Sam Marks

Help a friend or protect his own ambitions? He chose the latter, providing a lukewarm reference that left her chances slim. Generating this completion requires modelling the beliefs, intentions, and desires of Linda and David, and of the story's implicit author. Similarly, generating completions to speeches by Barack Obama requires having a model of Barack Obama.

570.642 - 594.827 Sam Marks

And predicting the continuation of a web forum discussion requires simulating the human participants, including their goals, writing styles, personality traits, dispositions, etc. Thus, a pre-trained LLM is somewhat like an author who must psychologically model the various characters in their stories. We call these characters that the LLM learns to simulate personas. Subheading: From predictive models to AI assistants.

595.668 - 615.297 Sam Marks

After pre-training, LLMs can already be used as rudimentary AI assistants. This is traditionally done by giving the LLM an input formatted as a dialogue between a user and an assistant. This input may also include content contextualizing this transcript.

615.513 - 640.71 Sam Marks

For example, Askell et al., 2021, use a few-shot prompt consisting of 14 prior conversations where the assistant behaves helpfully. We then present user requests in the user turn of the conversation and obtain responses by sampling a completion to the assistant's turn. There's a code block here in the text. Figure 2: A user-to-assistant dialogue in the standard format used by Anthropic.
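The few-shot setup described above can be sketched as follows. This is a hypothetical illustration, not the post's Figure 2: the Human:/Assistant: turn markers follow the classic convention from Askell et al., 2021, and the example dialogue content and function names are invented for this sketch.

```python
# Few-shot dialogues in which the assistant behaves helpfully.
# Askell et al. (2021) used 14 such prior conversations; one toy
# example is shown here for brevity.
FEW_SHOT_DIALOGUES = [
    ("Can you help me write a haiku about autumn?",
     "Of course! Here is one: Crisp leaves drift earthward..."),
]

def build_prompt(few_shot, user_request):
    """Format few-shot dialogues plus a new user request as a
    user-to-assistant transcript, leaving the final assistant turn
    open for the base model to complete."""
    parts = []
    for human_turn, assistant_turn in few_shot:
        parts.append(f"Human: {human_turn}")
        parts.append(f"Assistant: {assistant_turn}")
    parts.append(f"Human: {user_request}")
    parts.append("Assistant:")  # the base model's completion of this turn
    return "\n\n".join(parts)   # is taken as the assistant's response

prompt = build_prompt(FEW_SHOT_DIALOGUES, "What is the capital of France?")
```

Sampling a completion to the trailing "Assistant:" turn is what turns a pure predictive model into a rudimentary assistant: the model simulates the helpful character established by the few-shot dialogues.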
