Sam Marks

👤 Speaker
891 total appearances

Podcast Appearances

LessWrong (Curated & Popular)
"The persona selection model" by Sam Marks

Post-training adapts and modifies personas the same way evolution adapted and modified the forelimb skeleton.

Altogether, these considerations make it seem likely that deep learning would preferentially fit the post-training objective by repurposing existing persona simulation capabilities to simulate an assistant persona, rather than learning new agentic capabilities from scratch.

How might these considerations change?

In the future, we expect the scale of LLM training, both pre-training and post-training, to grow larger.

How will this interact with the considerations above?

Insofar as post-training can teach new behaviours and capabilities from scratch (and it likely can), we should expect that massively scaling up post-training will provide opportunities to implement non-persona agency and will generally make post-trained models less similar to their pre-trained base.

Thus, we expect that the "post-training as elicitation" consideration may weaken over time.

Regarding the "inductive bias towards reuse of persona modeling" consideration, the situation is less clear.

On this view, we might expect AI assistants to become less persona-like once their post-training objectives are no longer as easily fit by adapting personas.

It is not clear what such a post-training objective would look like.

Plausibly this could occur if we train AIs to operate in extremely novel settings, such as handling exotic modalities that humans lack (for example, industrial sensors or genomic data) or directly operating physical infrastructure in hundreds of geographically dispersed factories.

However, this is complicated by the way information about previous AI generations enters the pre-training corpus.

This may have the effect of iteratively building a concept of an AI assistant that future AI assistants can continue to use as scaffolding.

For example, information about previous AI chatbots appears to influence the personas enacted by current AI assistants.

Overall, we are uncertain how the usefulness of the persona selection model (PSM) will change over time.

We have not found, even intuitively, that PSM became a weaker predictor of AI assistant behavior during 2025, a year in which LLM post-training scaled up substantially.

We therefore find it plausible that PSM could continue to be as useful a model of AI-assistant behavior as it has been so far.

That said, we also find it plausible that PSM could become substantially less useful in the future.