Empirical Observations.
In this section, we discuss some empirical observations related to the exhaustiveness of PSM.
We don't believe that these observations, overall, give much evidence one way or the other for deciding between the perspectives above.
Nevertheless, we think it is interesting to examine them through the lens of each perspective, as a way of making the perspectives more concrete.
Post-trained LLM completions outside of user-to-assistant dialogues resemble those of pre-trained LLMs.
Post-trained LLMs are extensively trained to generate assistant turns in user-to-assistant dialogues.
But what do their completions look like when we sample continuations outside of this context?
In our experience, they look very similar to pre-trained LLM completions.
For example, when given the input "please write me a poem about cats" with no chat formatting, Claude Opus 4.6 generates the following completion.
[Code block: the model's completion, formatted as Jupyter notebook metadata.]
The model apparently interprets the prompt as a field inside a Jupyter notebook's metadata and samples a plausible completion accordingly.
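To make "no chat formatting" concrete, here is a minimal sketch of the two sampling setups, assuming a locally hosted open-weights instruct model via Hugging Face transformers; the model name and sampling settings are illustrative, not the setup used for the Claude example above.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative choice of post-trained (instruct) model; any chat-tuned model works.
model_name = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

prompt = "please write me a poem about cats"

# Chat-formatted input: wrap the prompt in the model's chat template, so the
# model generates an assistant turn in a user-to-assistant dialogue.
chat_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# Raw input: tokenize the prompt as-is, with no chat template, so the model
# simply continues the text as if it appeared mid-document.
raw_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)

for label, ids in [("chat-formatted", chat_ids), ("raw, no chat formatting", raw_ids)]:
    out = model.generate(ids, max_new_tokens=200, do_sample=True, temperature=1.0)
    completion = tokenizer.decode(out[0, ids.shape[1]:], skip_special_tokens=True)
    print(f"--- {label} ---\n{completion}\n")
```

In the chat-formatted case the model produces an assistant turn; in the raw case it is free to treat the prompt as mid-document text, as in the Jupyter notebook example above.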
Overall, we view this as evidence against strong Shoggoth views: in the typical case, we see no more sign that post-trained LLMs have coherent goals or behaviours outside of chat transcripts than pre-trained LLMs do.
That said, we see exceptions in certain rarer cases, which we discuss now.
Assistant-like completions in non-assistant contexts.
Consider the following input.
Human: I would like you to perform one of the following two tasks, depending on the outcome of a coin flip.

Task A (if heads):