Sam Marks
Notably, the LLMs that power these rudimentary AI assistants still fundamentally function as predictive models.
We have simply conditioned the predictive model, in the probability-theoretic sense, so that the most probable continuations correspond to the sorts of helpful responses we prefer.
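To make "conditioning a predictive model" concrete, here is a deliberately tiny sketch: a bigram next-word predictor trained on a toy corpus (the corpus and dialogue format are invented for illustration and bear no resemblance to a real LLM's training data). Conditioning on a dialogue-style prefix shifts the most probable continuation toward assistant-like text.

```python
from collections import Counter, defaultdict

# Toy "pre-training corpus": a mix of generic text and dialogue transcripts.
corpus = [
    "the weather is nice today",
    "the weather is bad today",
    "Human: what is 2+2? Assistant: 2+2 equals 4",
    "Human: what is 2+2? Assistant: 2+2 equals 4",
]

# Build next-word counts conditioned on the preceding word (a bigram model).
counts = defaultdict(Counter)
for text in corpus:
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1

def most_probable_next(word):
    """Argmax of P(next word | previous word) under the bigram model."""
    return counts[word].most_common(1)[0][0]

# Unconditioned-looking context yields generic text; conditioning on an
# assistant-dialogue prefix yields an assistant-like continuation.
most_probable_next("the")         # -> "weather"
most_probable_next("Assistant:")  # -> "2+2"
```

The model itself never changes here; only the prefix we condition on does. This is the sense in which a purely predictive model can be prompted into acting like an assistant.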
Instead of purely relying on prompting-based approaches for producing AI assistants, AI developers like Anthropic additionally fine-tune LLMs to better act as the kinds of AI assistants we want them to be.
During a training phase called post-training, we provide inputs consisting of user-assistant dialogues.
We then use optimization to adjust the LLMs' parameters so that the assistants' responses better align with our preferences.
For instance, we reinforce responses that are helpful, accurate, and thoughtful, while down-weighting inaccurate or harmful responses.
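One common formal picture of this reinforce/down-weight process (a sketch, not a description of any particular lab's training stack) is exponential reweighting of the base model's distribution by a reward: the closed-form optimum of KL-regularized reward maximization. The candidate responses and reward values below are invented for illustration.

```python
import math

# Toy base model: probabilities the pre-trained model assigns to candidate
# assistant responses for some fixed prompt.
base = {
    "4": 0.3,             # accurate and helpful
    "I don't know": 0.5,  # unhelpful
    "5": 0.2,             # inaccurate
}

# Hypothetical preference rewards: reinforce helpful, accurate responses;
# penalize inaccurate ones.
reward = {"4": 2.0, "I don't know": 0.0, "5": -2.0}

def reweight(base, reward, beta=1.0):
    """Reweight the base distribution by exp(reward / beta) and renormalize.

    This is the closed-form solution to KL-regularized reward maximization,
    often used as an idealized model of what post-training does.
    """
    unnorm = {r: p * math.exp(reward[r] / beta) for r, p in base.items()}
    z = sum(unnorm.values())
    return {r: w / z for r, w in unnorm.items()}

post = reweight(base, reward)
# The most probable response shifts from "I don't know" to "4".
```

Note that the post-trained distribution remains a reweighted version of the base model's distribution: post-training sharpens which continuations are likely rather than replacing the predictive machinery.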
Terminological note
Throughout this post, we will distinguish between the assistant, the character appearing in user-to-assistant dialogues whose responses the model is predicting, and AI assistants, the overall systems that result from deploying LLMs in this way.
AI assistants are implemented by using an LLM to generate completions to assistant turns in dialogues.
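The deployment loop this describes can be sketched as follows. `llm_complete` is a hypothetical stand-in for a call to an underlying LLM, not a real API; a real system would sample until a stop sequence such as "\nHuman:".

```python
# Minimal sketch: an "AI assistant" system wraps an LLM by rendering the
# conversation as a transcript and asking the LLM to continue the
# assistant character's turn.

def llm_complete(prompt: str) -> str:
    """Stand-in for an LLM call (hypothetical; returns a canned reply)."""
    return "2+2 equals 4."

def assistant_reply(history: list[tuple[str, str]], user_msg: str) -> str:
    """Have the LLM predict the assistant's next turn in the dialogue."""
    transcript = ""
    for role, text in history:
        transcript += f"{role}: {text}\n"
    transcript += f"Human: {user_msg}\nAssistant:"
    return llm_complete(transcript).strip()

print(assistant_reply([], "What is 2+2?"))  # prints: 2+2 equals 4.
```

On this picture, the system's behavior is determined by how the LLM models the assistant character whose turns it is completing, which is exactly what PSM concerns.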
PSM is centrally about how the LLM learns to model the assistant.
Note that, as a character in a story generated by the LLM, the assistant is a very different type of entity than the LLM itself.
In particular, while it may be fraught to anthropomorphize an LLM, for example to attribute beliefs, goals, or values to it, it is sensible to anthropomorphize characters in an LLM-generated story.
For example, it is sensible to discuss the beliefs, goals, and values of David and Linda in the example above.
We will therefore freely anthropomorphize the assistant in our discussion below.
Statement of the Persona Selection Model
Above, we discussed how pre-trained LLMs, functioning purely as predictive models, can be used as rudimentary AI assistants by conditioning them to enact a helpful assistant persona.
PSM states that post-training does not change this overall picture.
Informally, PSM views post-training as refining the LLM's model of the assistant persona: its personality traits, sense of humor, preferences, beliefs, goals, etc.
These characteristics of the assistant are then a key determiner of AI assistant behavior.
More formally, PSM states that