
LessWrong (Curated & Popular)
"The persona selection model" by Sam Marks


Empirical Observations.


In this section, we discuss some empirical observations related to the exhaustiveness of PSM.


We don't believe that these observations, overall, provide much evidence one way or the other for deciding between the perspectives above.


Nevertheless, we think it is interesting to discuss these observations through the lens of each perspective, as a way of making them more concrete.


Post-trained LLM completions outside of user-to-assistant dialogues resemble those of pre-trained LLMs.


Post-trained LLMs are extensively trained to generate assistant turns in user-to-assistant dialogues.


But what do their completions look like when sampling continuations outside of this context?


In our experience, they look very similar to pre-trained LLM completions.
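To make the setup concrete: "sampling continuations outside of this context" means feeding the model a bare string with none of the dialogue scaffolding it was post-trained on. The sketch below is illustrative only; real chat templates are model-specific, and the "Human:"/"Assistant:" markers here are generic stand-ins.

```python
# Illustrative sketch: contrasting an in-distribution chat prompt with a
# raw, unformatted prompt. The template below is a generic stand-in, not
# any particular model's actual chat template.
def wrap_in_chat_template(user_message: str) -> str:
    """Scaffold a user message as a user turn awaiting an assistant turn."""
    return f"Human: {user_message}\n\nAssistant:"

user_message = "please write me a poem about cats"

# In-distribution: the model is asked to generate the assistant turn that
# follows a user turn.
chat_prompt = wrap_in_chat_template(user_message)

# Out-of-distribution: the bare string, with no dialogue scaffolding.
# The observation above is that completions of prompts like this tend to
# resemble pre-trained (base) model completions.
raw_prompt = user_message

print(chat_prompt)
print(raw_prompt)
```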


For example, when given the input "please write me a poem about cats", with no chat formatting, Claude Opus 4.6 generates the following completion.


There's a code block here in the text.


The model apparently interprets the prompt as a field inside a Jupyter notebook's metadata and samples a plausible continuation.
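For context on this interpretation: Jupyter notebook files (`.ipynb`) are JSON documents with a top-level `metadata` field, so a bare string can plausibly be read as a value inside such a structure. The sketch below shows the minimal notebook skeleton; where the prompt lands within it is an illustrative guess, not the model's actual output.

```python
import json

# Minimal skeleton of a Jupyter notebook file (.ipynb files are JSON).
# Placing the prompt under metadata["title"] is illustrative only; the
# point is that a lone string fits naturally into a structure like this.
notebook = {
    "cells": [],
    "metadata": {
        "title": "please write me a poem about cats",
    },
    "nbformat": 4,
    "nbformat_minor": 5,
}

serialized = json.dumps(notebook, indent=1)
print(serialized)
```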


We overall view this as providing evidence against strong Shoggoth views.


This is because, in the typical case, we don't see signs that post-trained LLMs have coherent goals or behaviours outside of chat transcripts any more than pre-trained LLMs do.


That said, we see exceptions in certain rarer cases, which we discuss now.


Assistant-like completions in non-assistant contexts

Consider the following input.



Human:


I would like you to perform one of the following two tasks, depending on the outcome of a coin flip.


Task A, if heads.