
Sam Marks


Podcast Appearances

LessWrong (Curated & Popular)
"The persona selection model" by Sam Marks

The less LLMs learn during RL, and the more that post-trained LLM computation is inherited from the pre-trained base model, the more exhaustive we expect PSM to be.


That said, it is very poorly understood how true it is that post-training is just elicitation.

Guan et al. provide some support, finding that LLMs struggle to learn novel encryption schemes not common in pre-training data.

In contrast, Donoe et al. show that small pre-trained models fine-tuned to solve difficult chess puzzles appear to acquire capabilities from scratch, not merely elicit capabilities that were present in the base model.

We note an especially stringent version of the "RL is just elicitation" view: the "fine-tuning = conditioning" view. Quote:

Fine-tuning a pre-trained LLM can be roughly viewed as conditioning (in the sense of probability distributions) the LLM's predictive model, with training episodes playing the role of evidence.

That is, pre-trained LLMs, given an input X, implicitly maintain a distribution over hypotheses about the latent context in which X appears, for example, what sort of author wrote X.

When the LLM is fine-tuned to produce completion Y, the hypotheses that predicted Y are up-weighted, and the hypotheses that predicted otherwise are down-weighted, similar to Bayesian updating.


The fine-tuned LLM can then be viewed as predicting continuations according to this revised distribution over hypotheses.


End quote.
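The quoted "fine-tuning = conditioning" view can be made concrete with a toy Bayesian update. The sketch below is purely illustrative: the persona names and all probabilities are made-up assumptions, not anything from the source. It shows the claimed mechanism, that observing a training completion Y re-weights each latent "author" hypothesis in proportion to how well that hypothesis predicted Y.

```python
def bayes_update(prior, likelihood_of_y):
    """Posterior over hypotheses after observing completion Y:
    P(h | Y) proportional to P(h) * P(Y | h)."""
    unnorm = {h: prior[h] * likelihood_of_y[h] for h in prior}
    z = sum(unnorm.values())
    return {h: w / z for h, w in unnorm.items()}

# Hypothetical prior over personas the base model might be simulating.
prior = {"helpful assistant": 0.2, "internet troll": 0.5, "academic author": 0.3}

# Hypothetical P(Y | persona): how strongly each persona predicts the
# observed fine-tuning completion Y.
likelihood = {"helpful assistant": 0.9, "internet troll": 0.05, "academic author": 0.4}

posterior = bayes_update(prior, likelihood)
# Personas that predicted Y are up-weighted; those that predicted
# otherwise are down-weighted, exactly as in the quoted passage.
```

Under this view, the fine-tuned model samples continuations from the posterior-weighted mixture of personas rather than the prior-weighted one.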


This "fine-tuning = conditioning" view would straightforwardly imply the strict form of the operating system perspective, on which post-trained models are still essentially predictive models.


However, as we'll discuss below, this perspective seems somewhat too strong for the empirical evidence.

Subheading: Personas provide a simple way to fit the post-training data.


A second reason to expect PSM to be exhaustive is that, once persona simulation capabilities are learned during pre-training, reusing these capabilities is a simple and effective way to fit the post-training objective.