
Sam Marks


LessWrong (Curated & Popular)
"The persona selection model" by Sam Marks

Notably, the LLMs that power these rudimentary AI assistants still fundamentally function as predictive models.

We have simply conditioned the predictive model, in the probability-theoretic sense, so that the most probable continuations correspond to the sorts of helpful responses we prefer.
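
This kind of conditioning can be made concrete with a toy predictive model. The miniature corpus and the one-step conditional distribution below are hypothetical illustrations (not anything from the post): conditioning on an assistant-style prefix makes helpful continuations the most probable ones.

```python
from collections import defaultdict

# Toy "predictive model": counts of continuations per conditioning prefix.
# The corpus mixes assistant dialogues with other text.
corpus = [
    ("User: hi", "Assistant: hello, how can I help?"),
    ("User: hi", "Assistant: hello! what do you need?"),
    ("Narrator: hi", "Villain: go away."),
]

model = defaultdict(lambda: defaultdict(int))
for prefix, continuation in corpus:
    model[prefix][continuation] += 1

def most_probable(prefix):
    """Argmax continuation under the conditional distribution P(c | prefix)."""
    dist = model[prefix]
    total = sum(dist.values())
    return max(dist, key=lambda c: dist[c] / total)

print(most_probable("User: hi"))  # an assistant-style helpful reply
```

Conditioning on the `User:` prefix selects helpful assistant continuations, while a different prefix selects a very different character; nothing about the model itself changed, only what it was conditioned on.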

Instead of purely relying on prompting-based approaches for producing AI assistants, AI developers like Anthropic additionally fine-tune LLMs to better act as the kinds of AI assistants we want them to be.

During a training phase called post-training, we provide inputs consisting of user-assistant dialogues.

We then use optimization to adjust the LLMs' parameters so that the assistants' responses better align with our preferences.

For instance, we reinforce responses that are helpful, accurate, and thoughtful, while down-weighting inaccurate or harmful responses.
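
As a toy illustration of this reinforce/down-weight dynamic (a hypothetical sketch, not Anthropic's actual training setup), consider a "model" that is just a categorical distribution over three candidate responses, updated with a REINFORCE-style rule that pushes up responses scoring above the average reward and pushes down those scoring below it:

```python
import math

# Toy policy: logits over three candidate assistant responses.
logits = {"helpful": 0.0, "vague": 0.0, "harmful": 0.0}
rewards = {"helpful": 1.0, "vague": 0.0, "harmful": -1.0}  # stand-in preferences

def softmax(logit_dict):
    z = sum(math.exp(v) for v in logit_dict.values())
    return {k: math.exp(v) / z for k, v in logit_dict.items()}

# Gradient of expected reward w.r.t. logit k is p_k * (r_k - baseline),
# so above-average responses are reinforced and below-average ones down-weighted.
lr = 1.0
for _ in range(50):
    probs = softmax(logits)
    baseline = sum(probs[k] * rewards[k] for k in logits)
    for k in logits:
        logits[k] += lr * probs[k] * (rewards[k] - baseline)

probs = softmax(logits)
print(max(probs, key=probs.get))  # the helpful response dominates
```

After training, probability mass has shifted toward the helpful response, but the object being trained is still a distribution over continuations, which is the picture PSM builds on.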

Terminological note

Throughout this post, we will distinguish between the "assistant" (the character appearing in user-assistant dialogues whose responses the model is predicting) and "AI assistants" (the overall systems that result from deploying LLMs in this way).

AI assistants are implemented by using an LLM to generate completions to assistant turns in dialogues.
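
The deployment loop described here can be sketched in a few lines. The `sample` function below is a hypothetical stand-in for a real LLM API; the point is that the system maintains a dialogue transcript and has the model complete each assistant turn:

```python
def sample(prompt: str) -> str:
    """Stand-in for LLM sampling; a real system would call a model here."""
    return "Hello! How can I help?\nUser:"  # canned continuation for the sketch

def assistant_reply(transcript: list[str], user_msg: str) -> str:
    transcript.append(f"User: {user_msg}")
    prompt = "\n".join(transcript) + "\nAssistant:"
    completion = sample(prompt)
    # The LLM predicts the whole dialogue; the system surfaces only the
    # assistant's turn by truncating at the next "User:" marker.
    reply = completion.split("\nUser:")[0].strip()
    transcript.append(f"Assistant: {reply}")
    return reply

history: list[str] = []
print(assistant_reply(history, "hi"))
```

The wrapper code is trivial; everything interesting about the AI assistant's behavior comes from which continuations the LLM assigns high probability to.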

The persona selection model (PSM) is centrally about how the LLM learns to model the assistant.

Note that, as a character in a story generated by the LLM, the assistant is a very different type of entity than the LLM itself.

In particular, while it may be fraught to anthropomorphize an LLM itself (for example, by attributing beliefs, goals, or values to it), it is sensible to anthropomorphize characters in an LLM-generated story.

For example, it is sensible to discuss the beliefs, goals, and values of David and Linda in the example above.

We will therefore freely anthropomorphize the assistant in our discussion below.

Statement of the Persona Selection Model

Above, we discussed how pre-trained LLMs, functioning purely as predictive models, can be used as rudimentary AI assistants by conditioning them to enact a helpful assistant persona.

PSM states that post-training does not change this overall picture.

Informally, PSM views post-training as refining the LLM's model of the assistant persona: its personality traits, sense of humor, preferences, beliefs, goals, and so on.

These characteristics of the assistant are then a key determinant of AI assistant behavior.

More formally, PSM states that