Sam Marks
The less LLMs learn during RL, and the more that post-trained LLM computation is inherited from the pre-trained base model, the more exhaustive we expect PSM to be.
That said, the extent to which post-training is just elicitation is very poorly understood.
Guan et al. provide some support, finding that LLMs struggle to learn novel encryption schemes not common in pre-training data.
In contrast, Donoe et al. show that small pre-trained models fine-tuned to solve difficult chess puzzles appear to acquire capabilities from scratch, not merely elicit capabilities that were present in the base model.
We note an especially stringent version of the "RL is just elicitation" view.
Quote,
The fine-tuning equals conditioning view.
Fine-tuning a pre-trained LLM can be roughly viewed as conditioning, in the sense of probability distributions, the LLM's predictive model, with training episodes playing the role of evidence.
That is, pre-trained LLMs, given an input x, implicitly maintain a distribution over hypotheses about the latent context in which x appears, for example what sort of author wrote x.
When the LLM is fine-tuned to produce completion y, the hypotheses that predicted y are up-weighted, and hypotheses that predicted the opposite are down-weighted, similar to Bayesian updating.
The fine-tuned LLM can then be viewed as predicting continuations according to this revised distribution over hypotheses.
End quote.
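To make the quoted view concrete, here is a minimal sketch of the Bayesian update it describes. The hypothesis names, the prior weights, and the likelihood values are all made up for illustration and do not come from the source. The sketch only shows the arithmetic of up-weighting hypotheses that predicted the completion y, and is not a claim about how fine-tuning actually changes a model.

```python
# Toy illustration of "fine-tuning as conditioning".
# All hypotheses and numbers below are hypothetical, chosen for illustration only.

def condition(prior, likelihoods):
    """Bayes' rule: posterior(h) is proportional to prior(h) * P(y | x, h)."""
    unnormalized = {h: prior[h] * likelihoods[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: w / total for h, w in unnormalized.items()}

def predictive(weights, likelihoods):
    """Probability of completion y under a mixture over hypotheses."""
    return sum(weights[h] * likelihoods[h] for h in weights)

# Prior over latent contexts (what sort of author wrote x), given input x.
prior = {"helpful_assistant": 0.2, "forum_troll": 0.3, "news_writer": 0.5}

# How strongly each hypothesis predicts the fine-tuning completion y.
likelihood_y = {"helpful_assistant": 0.9, "forum_troll": 0.1, "news_writer": 0.3}

# One "training episode" treated as evidence.
posterior = condition(prior, likelihood_y)

p_before = predictive(prior, likelihood_y)
p_after = predictive(posterior, likelihood_y)

for h in prior:
    print(f"{h}: prior={prior[h]:.3f} posterior={posterior[h]:.3f}")
print(f"P(y) before={p_before:.3f} after={p_after:.3f}")
```

In this toy setup the hypothesis that best predicted y gains weight and the others lose it, so the revised mixture assigns y a higher probability than the prior mixture did.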
The fine-tuning equals conditioning view would straightforwardly imply the strict form of the operating system perspective, where post-trained models are still essentially predictive models.
However, as we'll discuss below, this perspective seems somewhat too strong for the empirical evidence.
Subheading.
Personas provide a simple way to fit the post-training data.
A second reason to expect PSM to be exhaustive is that, once persona simulation capabilities are learned during pre-training, reusing these capabilities is a simple and effective way to fit the post-training objective.