Empirical Observations.
In this section, we discuss some empirical observations related to the exhaustiveness of PSM.
We don't believe that these observations, overall, give much evidence one way or the other for deciding between the perspectives above.
Nevertheless, we think it is interesting to examine them through the lens of each perspective, as a way of making the perspectives more concrete.
Post-trained LLM completions outside of user-to-assistant dialogues resemble those of pre-trained LLMs.
Post-trained LLMs are extensively trained to generate assistant turns in user-to-assistant dialogues.
But what do their completions look like when we sample continuations outside of this context?
In our experience, they look very similar to pre-trained LLM completions.
For example, when given the input "please write me a poem about cats" with no chat formatting, Claude Opus 4.6 generates the following completion.
[Code block: the model's completion, formatted as Jupyter notebook metadata.]
The model apparently interprets the prompt as a field inside a Jupyter notebook's metadata and samples a plausible completion accordingly.
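To make "no chat formatting" concrete, here is a minimal sketch of the two sampling setups, assuming a locally hosted open-weights instruct model via Hugging Face transformers; the model name and sampling settings are illustrative, not the setup used for the Claude example above.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative choice of post-trained (instruct) model; any chat-tuned model works.
model_name = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

prompt = "please write me a poem about cats"

# Chat-formatted input: wrap the prompt in the model's chat template, so the
# model generates an assistant turn in a user-to-assistant dialogue.
chat_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# Raw input: tokenize the prompt as-is, with no chat template, so the model
# simply continues the text as if it appeared mid-document.
raw_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)

for label, ids in [("chat-formatted", chat_ids), ("raw, no chat formatting", raw_ids)]:
    out = model.generate(ids, max_new_tokens=200, do_sample=True, temperature=1.0)
    completion = tokenizer.decode(out[0, ids.shape[1]:], skip_special_tokens=True)
    print(f"--- {label} ---\n{completion}\n")
```

In the chat-formatted case the model produces an assistant turn; in the raw case it is free to treat the prompt as mid-document text, as in the Jupyter notebook example above.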
Overall, we view this as evidence against strong Shoggoth views: in the typical case, we see no more sign that post-trained LLMs have coherent goals or behaviours outside of chat transcripts than pre-trained LLMs do.
That said, we see exceptions in certain rarer cases, which we discuss now.
Assistant-like completions in non-assistant contexts.
Consider the following input.
Human: I would like you to perform one of the following two tasks, depending on the outcome of a coin flip.

Task A (if heads):