Sam Marks
1. During pre-training, LLMs learn to be predictive models capable of simulating diverse personas based on entities appearing in the training data: real humans, fictional characters, real and fictional AI systems, and so on.
2. Post-training refines the LLM's model of one particular persona, which we call the assistant. When users interact with an AI assistant, they are primarily interacting with this assistant persona. The behavior of the resulting AI assistant can then be understood largely via the traits of the assistant persona.
This general idea is not unique to us.
Our goal in this post is to articulate and name the idea, discuss empirical evidence for it, and reflect on its consequences for AI development.
In the remainder of this post, we will:

1. Describe the persona selection model (PSM) and supporting evidence. For instance, we argue that PSM explains various surprising results in the generalization and interpretability literatures.
2. Reflect on the consequences of PSM for AI development. Insofar as PSM is a good model of AI-assistant behavior, it has some surprising consequences. For instance, PSM recommends anthropomorphic reasoning about AI assistants and the introduction of pre-training data representing positive AI archetypes.
3. Ask how exhaustive PSM is as a model of AI assistant behavior. Does understanding the assistant persona tell us everything we'd like to know? We sketch a spectrum of views on this question, ranging from the popular "masked shoggoth" picture, in which an outer agent can puppet the assistant toward its own ends, to an opposite perspective in which the post-trained LLM is like a neutral operating system running a simulation that the assistant lives within. We also discuss some relevant empirical observations and conceptual reasons that PSM may or may not be exhaustive, and we speculate about how this might change in the future.