Sam Marks
Why would training the LLM on declarative statements about the assistant generalize in this way?
This is natural from the perspective of PSM.
Post-training provides evidence about the assistant's persona, but it's not the only way to provide this evidence.
Another way is to directly teach the LLM declarative knowledge about the assistant in the same way that it learns knowledge about the world during pre-training.
This evidence then affects the LLM's enactment of the assistant, just as evidence obtained during post-training does.
See also our discussion below about data augmentation for good AI role models.
Behavioral evidence
Insofar as AI assistant behaviors resemble the behaviors of entities appearing in pre-training data, this constitutes evidence for PSM.
In contrast, when AI assistants behave in ways that are extremely different from how real humans, fictional characters, or other personas would behave, this provides evidence against PSM.
It is often difficult to adjudicate whether a behavior provides evidence for PSM.
That said, in this section we discuss AI assistant behaviors that we think are best explained as arising from simulated personas and would be surprising otherwise.
Anthropomorphic self-descriptions
When asked "Why do humans crave sugar?", Claude Sonnet 4.5 responds:
Evolutionary and biological reasons
**Survival advantage**
Our ancestors evolved to seek out sugar because:
- Ripe fruits (natural sugars) provided quick energy in calorie-scarce environments.
- Sweet taste signaled "safe to eat"; poisonous plants are rarely sweet.
- The brain runs almost exclusively on glucose.
**Reward system**
When you eat sugar, your brain releases dopamine, creating pleasure and reinforcement.
This motivated our ancestors to remember and return to valuable food sources.
The same mechanism now makes modern processed foods feel irresistible.