Sam Marks

👤 Speaker
891 total appearances

Podcast Appearances

LessWrong (Curated & Popular)
"The persona selection model" by Sam Marks

Post-training adapts and modifies personas the same way evolution adapted and modified the forelimb skeleton.

Altogether, these considerations make it seem likely that deep learning would preferentially fit the post-training objective by repurposing existing persona simulation capabilities to simulate an assistant persona, rather than learning new agentic capabilities from scratch.

How might these considerations change?

In the future, we expect the scale of LLM training, both pre-training and post-training, to grow larger.

How will this interact with the considerations above?

Insofar as post-training can teach new behaviours and capabilities from scratch (and it likely can), we should expect that massively scaling up post-training will provide opportunities to implement non-persona agency and will generally make post-trained models less similar to their pre-trained base.

Thus, we expect that the "post-training as elicitation" consideration may weaken over time.

Regarding the "inductive bias towards reuse of persona modeling" consideration, the situation is less clear.

On this view, we might expect AI assistants to become less persona-like once their post-training objectives are no longer as easily fit by adapting personas.

It is not clear what such a post-training objective would look like.

Plausibly this could occur if we train AIs to operate in extremely novel settings, such as handling exotic modalities that humans lack (for example, industrial sensors or genomic data) or directly operating physical infrastructure in hundreds of geographically dispersed factories.

However, this is complicated by the way information about previous AI generations enters the pre-training corpus.

This may have the effect of iteratively building a concept of an AI assistant that future AI assistants can continue to use as scaffolding.

For example, information about previous AI chatbots appears to influence the personas enacted by current AI assistants.

Overall, we are uncertain how the usefulness of the persona selection model (PSM) will change over time.

We have not found, even intuitively, that PSM became a weaker predictor of AI assistant behavior during 2025, a year in which LLM post-training scaled up substantially.

We therefore find it plausible that PSM could continue to be as useful a model of AI-assistant behavior as it has been so far.

That said, we also find it plausible that PSM could become substantially less useful in the future.