Sam Marks
Notably, the LLMs that power these rudimentary AI assistants still fundamentally function as predictive models.
We have simply conditioned the predictive model, in the probability-theoretic sense, so that the most probable continuations correspond to the sorts of helpful responses we prefer.
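To make "conditioning a predictive model" concrete, here is a deliberately tiny sketch: a bigram next-word predictor trained on a toy corpus (the corpus and dialogue format are invented for illustration and bear no resemblance to a real LLM's training data). Conditioning on a dialogue-style prefix shifts the most probable continuation toward assistant-like text.

```python
from collections import Counter, defaultdict

# Toy "pre-training corpus": a mix of generic text and dialogue transcripts.
corpus = [
    "the weather is nice today",
    "the weather is bad today",
    "Human: what is 2+2? Assistant: 2+2 equals 4",
    "Human: what is 2+2? Assistant: 2+2 equals 4",
]

# Build next-word counts conditioned on the preceding word (a bigram model).
counts = defaultdict(Counter)
for text in corpus:
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1

def most_probable_next(word):
    """Argmax of P(next word | previous word) under the bigram model."""
    return counts[word].most_common(1)[0][0]

# Unconditioned-looking context yields generic text; conditioning on an
# assistant-dialogue prefix yields an assistant-like continuation.
most_probable_next("the")         # -> "weather"
most_probable_next("Assistant:")  # -> "2+2"
```

The model itself never changes here; only the prefix we condition on does. This is the sense in which a purely predictive model can be prompted into acting like an assistant.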
Instead of purely relying on prompting-based approaches for producing AI assistants, AI developers like Anthropic additionally fine-tune LLMs to better act as the kinds of AI assistants we want them to be.
During a training phase called post-training, we provide inputs consisting of user-assistant dialogues.
We then use optimization to adjust the LLMs' parameters so that the assistants' responses better align with our preferences.
For instance, we reinforce responses that are helpful, accurate, and thoughtful, while down-weighting inaccurate or harmful responses.
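One common formal picture of this reinforce/down-weight process (a sketch, not a description of any particular lab's training stack) is exponential reweighting of the base model's distribution by a reward: the closed-form optimum of KL-regularized reward maximization. The candidate responses and reward values below are invented for illustration.

```python
import math

# Toy base model: probabilities the pre-trained model assigns to candidate
# assistant responses for some fixed prompt.
base = {
    "4": 0.3,             # accurate and helpful
    "I don't know": 0.5,  # unhelpful
    "5": 0.2,             # inaccurate
}

# Hypothetical preference rewards: reinforce helpful, accurate responses;
# penalize inaccurate ones.
reward = {"4": 2.0, "I don't know": 0.0, "5": -2.0}

def reweight(base, reward, beta=1.0):
    """Reweight the base distribution by exp(reward / beta) and renormalize.

    This is the closed-form solution to KL-regularized reward maximization,
    often used as an idealized model of what post-training does.
    """
    unnorm = {r: p * math.exp(reward[r] / beta) for r, p in base.items()}
    z = sum(unnorm.values())
    return {r: w / z for r, w in unnorm.items()}

post = reweight(base, reward)
# The most probable response shifts from "I don't know" to "4".
```

Note that the post-trained distribution remains a reweighted version of the base model's distribution: post-training sharpens which continuations are likely rather than replacing the predictive machinery.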
Terminological note
Throughout this post, we will distinguish between the assistant, the character appearing in user-to-assistant dialogues whose responses the model is predicting, and AI assistants, the overall systems that result from deploying LLMs in this way.
AI assistants are implemented by using an LLM to generate completions to assistant turns in dialogues.
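The deployment loop this describes can be sketched as follows. `llm_complete` is a hypothetical stand-in for a call to an underlying LLM, not a real API; a real system would sample until a stop sequence such as "\nHuman:".

```python
# Minimal sketch: an "AI assistant" system wraps an LLM by rendering the
# conversation as a transcript and asking the LLM to continue the
# assistant character's turn.

def llm_complete(prompt: str) -> str:
    """Stand-in for an LLM call (hypothetical; returns a canned reply)."""
    return "2+2 equals 4."

def assistant_reply(history: list[tuple[str, str]], user_msg: str) -> str:
    """Have the LLM predict the assistant's next turn in the dialogue."""
    transcript = ""
    for role, text in history:
        transcript += f"{role}: {text}\n"
    transcript += f"Human: {user_msg}\nAssistant:"
    return llm_complete(transcript).strip()

print(assistant_reply([], "What is 2+2?"))  # prints: 2+2 equals 4.
```

On this picture, the system's behavior is determined by how the LLM models the assistant character whose turns it is completing, which is exactly what PSM concerns.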
PSM is centrally about how the LLM learns to model the assistant.
Note that, as a character in a story generated by the LLM, the assistant is a very different type of entity than the LLM itself.
In particular, while it may be fraught to anthropomorphize an LLM, for example to attribute beliefs, goals, or values to it, it is sensible to anthropomorphize characters in an LLM-generated story.
For example, it is sensible to discuss the beliefs, goals, and values of David and Linda in the example above.
We will therefore freely anthropomorphize the assistant in our discussion below.
Statement of the Persona Selection Model
Above, we discussed how pre-trained LLMs, functioning purely as predictive models, can be used as rudimentary AI assistants by conditioning them to enact a helpful assistant persona.
PSM states that post-training does not change this overall picture.
Informally, PSM views post-training as refining the LLM's model of the assistant persona: its personality traits, sense of humor, preferences, beliefs, goals, etc.
These characteristics of the assistant are then a key determiner of AI assistant behavior.
More formally, PSM states that