Sam Marks

👤 Speaker
891 total appearances

Podcast Appearances

LessWrong (Curated & Popular)
"The persona selection model" by Sam Marks

During pre-training, LLMs learn to be predictive models that are capable of simulating diverse personas based on entities appearing in training data.

Real humans, fictional characters, real and fictional AI systems, etc.

Post-training refines the LLM's model of a certain persona, which we call the assistant.

When users interact with an AI assistant, they are primarily interacting with this assistant persona.

The behavior of the resulting AI assistant can then be understood largely via the traits of the assistant persona.

This general idea is not unique to us.

Our goal in this post is to articulate and name the idea, discuss empirical evidence for it, and reflect on its consequences for AI development.

In the remainder of this post, we will:

Describe the persona selection model, PSM, and supporting evidence.

For instance, we argue that PSM provides an explanation for various surprising results in the generalization and interpretability literatures.

Reflect on the consequences of PSM for AI development.

Insofar as PSM is a good model of AI-assistant behavior, it has some surprising consequences.

For instance, PSM recommends anthropomorphic reasoning about AI assistants and the introduction of pre-training data representing positive AI archetypes.

Ask how exhaustive PSM is as a model of AI assistant behavior.

Does understanding the assistant persona tell us everything we'd like to know?

We sketch out a spectrum of views on these questions, ranging from the popular masked shoggoth, where an outer agent can puppet the assistant towards its own ends, to an opposite perspective where the post-trained LLM is like a neutral operating system running a simulation that the assistant lives within.

We also discuss some relevant empirical observations and conceptual reasons that PSM may or may not be exhaustive, and we speculate about how this might change in the future.
