Sam Marks

"The persona selection model" by Sam Marks

The less LLMs learn during RL, and the more that post-trained LLM computation is inherited from the pre-trained base model, the more exhaustive we expect PSM to be.

LessWrong (Curated & Popular)

That said, it is very poorly understood how true it is that post-training is just elicitation.

Guan et al. provide some support, finding that LLMs struggle to learn novel encryption schemes not common in pre-training data.

In contrast, Donoe et al. show that small pre-trained models fine-tuned to solve difficult chess puzzles appear to acquire capabilities from scratch, not merely elicit capabilities that were present in the base model.

We note an especially stringent version of the "RL is just elicitation" view.

Quote,

The "fine-tuning equals conditioning" view.

Fine-tuning a pre-trained LLM can be roughly viewed as conditioning, in the sense of probability distributions, the LLM's predictive model, with training episodes playing the role of evidence.

That is, pre-trained LLMs, given an input x, implicitly maintain a distribution over hypotheses about the latent context in which x appears, for example what sort of author wrote x.

When the LLM is fine-tuned to produce completion y, the hypotheses that predicted y are up-weighted, and hypotheses that predicted the opposite are down-weighted, similar to Bayesian updating.

The fine-tuned LLM can then be viewed as predicting continuations according to this revised distribution over hypotheses.

End quote.
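The quoted view can be illustrated with a small numerical sketch. The Python below is only a toy under stated assumptions: the two hypothesis names, the prior weights, and the likelihood values are invented for illustration and do not come from the post, and a real LLM does not store an explicit table of hypotheses. It shows the Bayesian update the quote describes, where hypotheses that predicted the fine-tuning completion are up-weighted, and then predicts a continuation from the revised distribution.

```python
def condition(prior, likelihood_of_y):
    """Bayes update: posterior(h) is proportional to prior(h) * P(y | x, h)."""
    unnormalized = {h: prior[h] * likelihood_of_y[h] for h in prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("no hypothesis assigns any probability to the completion")
    return {h: w / total for h, w in unnormalized.items()}


def predict(weights, next_token_probs):
    """Mixture prediction: P(token) = sum over h of weights[h] * P(token | h)."""
    mixture = {}
    for h, w in weights.items():
        for token, p in next_token_probs[h].items():
            mixture[token] = mixture.get(token, 0.0) + w * p
    return mixture


# Hypothetical latent contexts ("what sort of author wrote x"); values are assumptions.
prior = {"helpful_author": 0.5, "sarcastic_author": 0.5}

# Assumed probability each hypothesis assigns to the fine-tuning completion y.
likelihood_of_y = {"helpful_author": 0.8, "sarcastic_author": 0.2}

# Assumed next-token behavior under each hypothesis on some later prompt.
next_token_probs = {
    "helpful_author": {"Sure": 0.9, "Ugh": 0.1},
    "sarcastic_author": {"Sure": 0.1, "Ugh": 0.9},
}

posterior = condition(prior, likelihood_of_y)
before = predict(prior, next_token_probs)
after = predict(posterior, next_token_probs)
print(posterior)
print(before["Sure"], after["Sure"])
```

In this toy setup a single training episode shifts weight toward the hypothesis that predicted y, and the prediction on the later prompt shifts with it. No new hypothesis is created, which is the sense in which this view treats fine-tuning as pure elicitation.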

The "fine-tuning equals conditioning" view would straightforwardly imply the strict form of the operating system perspective, where post-trained models are still essentially predictive models.

However, as we'll discuss below, this perspective seems somewhat too strong for the empirical evidence.

Subheading.

Personas provide a simple way to fit the post-training data.

A second reason to expect PSM to be exhaustive is that, once persona simulation capabilities are learned during pre-training, reusing these capabilities is a simple and effective way to fit the post-training objective.
