Sam Marks
Opposing views of PSM exhaustiveness. The masked shoggoth (left) depicts the idea that the LLM, the shoggoth, has its own agency beyond plausible text generation: it play-acts the assistant persona, but only instrumentally, for its own inscrutable reasons (source). In contrast, the operating system view (right) views the LLM as a simulation engine and the assistant as a character inside this simulation. The simulation engine does not puppet the assistant for its own ends; it only tries to simulate probable behavior according to its understanding of the assistant (source: Nano Banana Pro).
We are overall unsure how complete an account PSM provides of AI assistant behavior.
Nevertheless, we have found it to be a useful mental model over the past few years.
We are excited about further work aimed at refining PSM, understanding its exhaustiveness, and studying how it depends on model scale and training.
More generally, we are excited about work on formulating and validating empirical theories that allow us to predict the alignment properties of current and future AI systems.
The Persona Selection Model
In this section, we first review how modern AI assistants are built by using LLMs to generate completions of the assistant's turns in user–assistant dialogues. We then state the Persona Selection Model (PSM), which roughly says that LLMs can be viewed as simulating a character, the assistant, whose traits are a key determinant of AI assistant behavior. We'll then discuss a number of empirical observations about AI systems that are well explained by PSM.
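To make the dialogue-completion setup concrete, here is a minimal sketch using the Hugging Face transformers library. The model name is a placeholder (any chat-tuned model would do), and the example dialogue is ours; the point is only to illustrate the mechanic: the dialogue is serialized into a transcript using the model's chat template, and the assistant's next turn is sampled as an ordinary LLM completion.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder: substitute any chat-tuned causal LM available to you.
model_name = "Qwen/Qwen2.5-0.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# A user-assistant dialogue, ending on a user turn.
messages = [
    {"role": "user", "content": "What is the capital of France?"},
]

# Serialize the dialogue with the model's chat template. With
# add_generation_prompt=True, the transcript ends exactly where the
# assistant's turn begins, so the LLM's completion *is* the assistant turn.
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False
)

# The template already includes any special tokens, so don't add them again.
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
output_ids = model.generate(**inputs, max_new_tokens=64)

# Decode only the newly generated tokens: the sampled assistant turn.
assistant_turn = tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(assistant_turn)
```

Note that in this setup the "assistant" is just a role marker in the transcript: the LLM's only job is to continue the dialogue with a plausible assistant turn, which is the framing PSM builds on.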
We claim no originality for the ideas presented here, which have previously been discussed by many others, for example Andreas (2022), Janus (2022), and Hubinger (2023).