Sam Marks

LessWrong (Curated & Popular)

"The persona selection model" by Sam Marks

Why would training the LLM on declarative statements about the assistant generalize in this way?

This is natural from the perspective of PSM.

Post-training provides evidence about the assistant's persona, but it's not the only way to provide this evidence.

Another way is to directly teach the LLM declarative knowledge about the assistant in the same way that it learns knowledge about the world during pre-training.

This evidence then affects the LLM's enactment of the assistant, just as evidence obtained during post-training does.

See also our discussion below about data augmentation for good AI role models.

Behavioral evidence

Insofar as AI assistants' behaviors resemble the behaviors of entities appearing in pre-training data, this constitutes evidence for PSM.

In contrast, when AI assistants behave in ways that are extremely different from how real humans, fictional characters, or other personas would behave, this provides evidence against PSM.

It is often difficult to adjudicate whether a behavior provides evidence for PSM.

That said, in this section we discuss AI assistant behaviors that we think are best explained as arising from simulated personas and would be surprising otherwise.

Anthropomorphic self-descriptions

When asked "Why do humans crave sugar?", Claude Sonnet 4.5 responds:

## Evolutionary and biological reasons

**Survival advantage** Our ancestors evolved to seek out sugar because:
- Ripe fruits (natural sugars) provided quick energy in calorie-scarce environments.
- Sweet taste signaled "safe to eat" (poisonous plants are rarely sweet).
- The brain runs almost exclusively on glucose.

**Reward system**