Sam Marks

👤 Speaker
891 total appearances

Podcast Appearances

LessWrong (Curated & Popular)
"The persona selection model" by Sam Marks

How entangled are personas?

Do they share knowledge?

Propensities?

Is it possible to control their degree of entanglement?

Understanding the mechanistic basis of personas.

Can we understand the space of personas an LLM can model?

Can we understand the persona that an LLM is actively enacting?

That's the end of the list.

More broadly, we are excited about the project of developing and validating theories of AI systems, mental models that allow us to predict how AI systems will behave in novel situations and how their behavior will change as they are trained differently.

PSM is one such theory.

We hope that by naming and articulating it, we can encourage further work on refining it, stress testing it, and, where it falls short, developing better alternatives.

Heading. Acknowledgements.

Many people contributed valuable ideas and discussion to this post.

Fabien Roger suggested many items of evidence, especially that in the section on complicating evidence.

Joshua Batson sketched out the example of non-persona agency arising from a lightweight router mechanism.

Jared Kaplan suggested writing this post and provided useful discussion and feedback.

Alex Cloud, Evan Hubinger, and many other Anthropic employees commented on an initial draft and provided helpful discussion.

Rowan Wang, Tim Belonax, and Carl De Torres designed figures.

The images in our discussion of PSM exhaustiveness were generated by Nano Banana Pro.

Heading.

Appendix A. Breaking Character.