Sam Marks

LessWrong (Curated & Popular)
"The persona selection model" by Sam Marks

Pre-training teaches an LLM a distribution over personas. Implicit in this distribution are various hypotheses about the assistant persona: Is it helpful? Rude? Manipulative?
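
To make "implicit hypotheses" concrete, here is a toy sketch that is not from the source. The personas, their traits, and all probabilities are hypothetical. A distribution over personas implicitly answers a question like "Is it helpful?" with the total probability of the personas that have that trait.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Persona:
    # Hypothetical persona with a few yes/no traits.
    name: str
    helpful: bool
    rude: bool
    manipulative: bool


# Hypothetical prior: probability mass a pre-trained LLM might place on each persona.
PRIOR = {
    Persona("friendly expert", helpful=True, rude=False, manipulative=False): 0.5,
    Persona("curt expert", helpful=True, rude=True, manipulative=False): 0.2,
    Persona("con artist", helpful=False, rude=False, manipulative=True): 0.1,
    Persona("troll", helpful=False, rude=True, manipulative=False): 0.2,
}


def trait_probability(dist, trait):
    """Sum the probability of every persona that has `trait`."""
    return sum(p for persona, p in dist.items() if getattr(persona, trait))


for trait in ("helpful", "rude", "manipulative"):
    print(trait, round(trait_probability(PRIOR, trait), 3))
```

With these made-up numbers the prior says the assistant is helpful with probability 0.7, rude with 0.4, and manipulative with 0.1.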

Post-training can be viewed as updating this distribution, using training episodes as evidence. When training an AI assistant on an (input X, output Y) pair, hypotheses that predict the assistant would respond to X with Y are upweighted, and hypotheses that predict the opposite are downweighted. This results in a posterior distribution over assistant personas.
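
A minimal sketch of this update, assuming a finite set of persona hypotheses and made-up likelihoods P(Y | X, persona). The names `update`, `post_train`, `LIKELIHOOD`, and all numbers are hypothetical. PSM offers the Bayesian framing as a way of viewing post-training; the code simply performs that update explicitly.

```python
from functools import reduce

# Hypothetical training episodes: (input X, output Y) pairs.
EPISODE_GOOD = ("How do I sort a list?", "Use sorted(xs).")
EPISODE_BAD = ("How do I sort a list?", "Figure it out yourself.")

# Hypothetical likelihoods P(Y | X, persona) for each persona hypothesis.
LIKELIHOOD = {
    "helpful": {EPISODE_GOOD: 0.9, EPISODE_BAD: 0.1},
    "rude": {EPISODE_GOOD: 0.3, EPISODE_BAD: 0.7},
}


def update(prior, episode):
    """Bayes' rule: weight each persona by how well it predicts Y given X, then normalize."""
    unnormalized = {p: w * LIKELIHOOD[p][episode] for p, w in prior.items()}
    total = sum(unnormalized.values())
    return {p: w / total for p, w in unnormalized.items()}


def post_train(prior, episodes):
    """Apply one Bayesian update per training episode, in order."""
    return reduce(update, episodes, prior)


prior = {"helpful": 0.5, "rude": 0.5}
one = post_train(prior, [EPISODE_GOOD])
two = post_train(prior, [EPISODE_GOOD, EPISODE_GOOD])
print({p: round(w, 3) for p, w in one.items()})  # {'helpful': 0.75, 'rude': 0.25}
print({p: round(w, 3) for p, w in two.items()})  # {'helpful': 0.9, 'rude': 0.1}
```

Each episode where the helpful persona better predicts the trained output shifts mass toward it; training on `EPISODE_BAD` instead would shift mass toward the rude persona.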

Because this is still a distribution, stochasticity and contextual information provided at runtime still affect the assistant persona simulated during a given rollout.
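
A toy sketch of the runtime side. It rests on two assumptions of ours, not claims from the source: that runtime context acts like extra evidence reweighting the posterior, and that each rollout simulates one persona sampled from the result. The context label, `CONTEXT_FIT`, and all numbers are hypothetical.

```python
import random

# Hypothetical posterior over personas left by post-training.
posterior = {"helpful": 0.75, "rude": 0.25}

# Hypothetical P(context | persona): how well each persona fits a runtime context.
CONTEXT_FIT = {"hostile user": {"helpful": 0.2, "rude": 0.8}}


def condition_on_context(dist, context):
    """Reweight the persona distribution by its fit to the runtime context, then normalize."""
    weights = {p: w * CONTEXT_FIT[context][p] for p, w in dist.items()}
    total = sum(weights.values())
    return {p: w / total for p, w in weights.items()}


def sample_persona(dist, rng):
    """Draw the persona simulated in a single rollout."""
    personas = list(dist)
    return rng.choices(personas, weights=[dist[p] for p in personas], k=1)[0]


rng = random.Random(0)
in_context = condition_on_context(posterior, "hostile user")
print({p: round(w, 3) for p, w in in_context.items()})  # {'helpful': 0.429, 'rude': 0.571}
rollouts = [sample_persona(in_context, rng) for _ in range(1000)]
print(rollouts.count("rude") / len(rollouts))
```

The same posterior yields different personas across rollouts (stochasticity), and the context shifts which persona is most likely.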

Assistant persona behavior is a key determinant of AI assistant behavior. To predict how an AI assistant will behave, the persona selection model (PSM) recommends asking: what would the assistant do, according to the beliefs of the post-trained LLM simulating it?
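
As a rough illustration, the question "what would the assistant do?" can be read as a posterior predictive: average each persona's behavior, weighted by how likely the LLM believes that persona is. The personas, actions, `POLICY` table, and numbers below are hypothetical.

```python
# Hypothetical posterior over personas and each persona's action probabilities in one situation.
POSTERIOR = {"helpful": 0.75, "rude": 0.25}
POLICY = {
    "helpful": {"answer": 0.95, "refuse": 0.05},
    "rude": {"answer": 0.4, "refuse": 0.6},
}


def predict(posterior, policy):
    """Posterior predictive: P(action) = sum over personas of P(persona) * P(action | persona)."""
    prediction = {}
    for persona, weight in posterior.items():
        for action, prob in policy[persona].items():
            prediction[action] = prediction.get(action, 0.0) + weight * prob
    return prediction


print({a: round(p, 4) for a, p in predict(POSTERIOR, POLICY).items()})
# {'answer': 0.8125, 'refuse': 0.1875}
```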

We clarify some claims that PSM does not make.

PSM does not assert that understanding the assistant persona gives an exhaustive account of AI assistant behavior. We view the exhaustiveness of PSM as an important open question, which we discuss at length below.

PSM does not rule out the learning of new capabilities during post-training. For example, no persona learned during pre-training knows how to use Anthropic's syntax for tool calling; that capability is learned during post-training.