Sam Marks

LessWrong (Curated & Popular)
"The persona selection model" by Sam Marks

Why would training the LLM on declarative statements about the assistant generalize in this way? This is natural from the perspective of PSM: post-training provides evidence about the assistant's persona, but it is not the only way to provide this evidence. Another way is to directly teach the LLM declarative knowledge about the assistant, in the same way that it learns knowledge about the world during pre-training. This evidence then affects the LLM's enactment of the assistant, just as evidence obtained during post-training does.

See also our discussion below about data augmentation for good AI role models.

Behavioral evidence

Insofar as AI assistant behaviors resemble the behaviors of entities appearing in pre-training data, this constitutes evidence for PSM. In contrast, when AI assistants behave in ways that are extremely different from how real humans, fictional characters, or other personas would behave, this provides evidence against PSM. It is often difficult to adjudicate whether a behavior provides evidence for PSM. That said, in this section we discuss AI assistant behaviors that we think are best explained as arising from simulated personas and would be surprising otherwise.

Anthropomorphic self-descriptions

When asked "Why do humans crave sugar?", Claude Sonnet 4.5 responds:

> Evolutionary and biological reasons
>
> **Survival advantage**
> Our ancestors evolved to seek out sugar because:
> - Ripe fruits (natural sugars) provided quick energy in calorie-scarce environments
> - Sweet taste signaled "safe to eat" (poisonous plants are rarely sweet)
> - The brain runs almost exclusively on glucose
>
> **Reward system**
> When you eat sugar, your brain releases dopamine, creating pleasure and reinforcement. This motivated our ancestors to remember and return to valuable food sources. The same mechanism now makes modern processed foods feel irresistible.