Sam Marks
Because of this, deep learning likely has an inductive bias towards reusing these capabilities, rather than learning new agentic capabilities from scratch.
First, observe that persona modeling is a flexible and powerful way to implement agentic behavior.
During pre-training, LLMs learn to model a large and diverse space of agents who need to pursue their goals in varied circumstances.
Persona simulation is therefore a sort of meta-agency that can be flexibly repurposed for specific choices of goals, beliefs, and other propensities.
This makes it ripe to serve as the agentic backend of an AI assistant.
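One way to make "meta-agency repurposed for specific choices of goals, beliefs, and other propensities" concrete is to think of a persona as a small bundle of parameters that gets rendered into a conditioning prompt. The following is a minimal illustrative sketch, not any real API; the `Persona` class and `to_system_prompt` method are hypothetical names invented for this example.

```python
# Hypothetical sketch: a persona as a parameter bundle that repurposes a
# model's generic persona-simulation capability via a conditioning prompt.
from dataclasses import dataclass, field


@dataclass
class Persona:
    name: str
    goals: list[str] = field(default_factory=list)
    beliefs: list[str] = field(default_factory=list)
    propensities: list[str] = field(default_factory=list)

    def to_system_prompt(self) -> str:
        """Render the persona as a system prompt: one choice of goals,
        beliefs, and propensities plugged into the same meta-agency."""
        lines = [f"You are {self.name}."]
        lines += [f"Goal: {g}" for g in self.goals]
        lines += [f"Belief: {b}" for b in self.beliefs]
        lines += [f"Propensity: {p}" for p in self.propensities]
        return "\n".join(lines)


assistant = Persona(
    name="a helpful, knowledgeable, and ethical AI assistant",
    goals=["answer the user's questions accurately"],
    propensities=["admit uncertainty rather than guess"],
)
print(assistant.to_system_prompt())
```

The point of the sketch is that nothing about the machinery is specific to one persona: swapping the fields swaps the simulated agent, which is the sense in which persona simulation is flexibly repurposable.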
Second, unlike pre-training, post-training for AI assistants is narrowly focused.
Essentially all post-training episodes consist of user-assistant dialogues.
Furthermore, the behaviors we train AI assistants for are persona-consistent.
That is, they are the sorts of behaviors that a human-like persona from the pre-training distribution could plausibly have.
We don't train AI assistants to produce strange text outputs that decode into motions of robotic arms and pistons.
We train them to interact conversationally using natural language in the way that a helpful, knowledgeable, and ethical person would.
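To make "post-training episodes consist of user-assistant dialogues" concrete, here is a minimal sketch of one such episode in the widely used role/content chat-message convention. The exact schema varies by lab, and the example content is invented for illustration.

```python
# A single hypothetical post-training episode: a user-assistant dialogue.
# Note the target behavior is persona-consistent -- the kind of reply a
# helpful, knowledgeable person could plausibly give in conversation.
episode = [
    {"role": "system",
     "content": "You are a helpful, knowledgeable, and ethical assistant."},
    {"role": "user",
     "content": "What's a quick way to check if a number is even in Python?"},
    {"role": "assistant",
     "content": "Use the modulo operator: n % 2 == 0 is True exactly when n is even."},
]

# In supervised fine-tuning, loss is typically computed only on the
# assistant turns -- the persona whose behavior we are shaping.
trainable_turns = [m for m in episode if m["role"] == "assistant"]
print(len(trainable_turns))  # 1 assistant turn
```

Contrast this with the hypothetical robotic-arm case above: an episode whose assistant turn decoded into actuator commands would have no persona-consistent precedent in the pre-training distribution.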
Third, deep learning likely has an inductive bias towards reuse of existing mechanisms, like persona modeling.
Analogously, biological evolution tends to adapt useful structures, such as forelimb bones in vertebrates, when they are available, instead of independently evolving variants from scratch within the same organism.
This latter, independent evolution within the same organism would be analogous to learning non-persona agency from scratch in an LLM that already has strong persona-modeling capabilities.
Deep learning would rather just reuse and adapt the existing agentic capabilities bound up in persona models.
Figure 5. Homologous forelimb bones in various vertebrates. The same basic structure in a common ancestor was adapted by evolution for multiple downstream purposes. In our analogy, personas in the pre-trained LLM are like the forelimb structure in the common ancestor.