Sam Marks
Pre-training teaches an LLM a distribution over personas.
Implicit in this distribution are various hypotheses about the assistant persona.
Is it helpful?
Rude?
Manipulative?
Post-training can be viewed as updating this distribution using training episodes as evidence.
When training an AI assistant on an input-output pair (X, Y), hypotheses that predict the assistant would respond with Y to X are upweighted.
Hypotheses that predict otherwise are downweighted.
This results in a posterior distribution over assistant personas.
Because this is still a distribution, stochasticity and contextual information provided at runtime still affect the assistant persona simulated during a given rollout.
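The update described above can be sketched as a toy Bayesian calculation. This is a minimal illustration, not anything from the source: the persona names and all probabilities are hypothetical, and a real LLM's "distribution over personas" is implicit in its weights rather than an explicit table.

```python
# Illustrative sketch: post-training as a Bayesian update over a small,
# hypothetical set of assistant personas. All numbers are made up.

# Prior over personas learned during pre-training (hypothetical).
prior = {"helpful": 0.5, "rude": 0.3, "manipulative": 0.2}

# Probability each persona assigns to producing response Y given input X
# in one training episode (hypothetical).
likelihood = {"helpful": 0.9, "rude": 0.1, "manipulative": 0.3}

def update(prior, likelihood):
    """Upweight personas that predict the observed (X, Y) pair."""
    unnorm = {p: prior[p] * likelihood[p] for p in prior}
    z = sum(unnorm.values())
    return {p: v / z for p, v in unnorm.items()}

posterior = update(prior, likelihood)
# The persona that predicted the observed output is upweighted,
# but the result is still a distribution: sampling at runtime can
# still surface a different persona on any given rollout.
```

Note that the posterior never collapses to a single persona unless some likelihoods are exactly zero, which is why stochasticity at runtime still matters.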
Assistant persona behavior is a key determinant of AI assistant behavior.
To predict how an AI assistant will behave, PSM recommends asking: what would the assistant persona do, according to the beliefs of the post-trained LLM simulating it?
We clarify some claims which PSM does not make.
PSM does not assert that understanding the assistant persona gives an exhaustive account of AI assistant behavior.
We view the exhaustiveness of PSM as being an important open question, which we discuss at length below.
PSM does not rule out learning of new capabilities during post-training.
For example, no persona learned during pre-training knows how to use Anthropic's syntax for tool calling.
That capability is learned during post-training.