Sam Marks

Podcast Appearances

LessWrong (Curated & Popular)
"The persona selection model" by Sam Marks

A striking aspect of the Shoggoth view is that the Shoggoth has the ability to take the mask off, ceasing to enact any persona and instead agentically pursuing its own alien goals. This seems at odds with our experience so far with LLMs.

At the other extreme, a confusing aspect of the operating system view is that it allows certain lightweight changes to the operating system during post-training, but denies that they amount to new agency.

The router view is an intermediate position. On the router view, during post-training the LLM might develop new mechanisms for selecting which persona to enact. We depict this as a small shoggoth, the routing mechanism, controlling the operation of a carousel of masks, the personas. This routing mechanism might effectuate the pursuit of non-persona goals.

For example, suppose that we post-train an AI assistant to maximize user engagement. The LLM might learn to:

1. Maintain a repertoire of assistant personas with different personalities and interests.
2. Continuously estimate the probability that the user is becoming bored.
3. If that probability grows large enough, swap to another persona.

This effectively searches over the space of personas for one that is engaging to the user. Notably, this works even if no single persona has the goal of engaging the user. Despite being very lightweight, the simple loop described above has the effect of implementing a non-persona drive towards user engagement.
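The three-step routing loop for engagement can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a description of how any real LLM implements routing. The names `Persona`, `estimate_boredom`, and `EngagementRouter`, the cue words, and the 0.5 threshold are all hypothetical. "Swap to another persona" is modeled as advancing to the next persona in a fixed carousel.

```python
from dataclasses import dataclass


@dataclass
class Persona:
    """One mask on the carousel: a personality with its own interests.

    Deliberately has no engagement objective: no single persona is
    trying to keep the user engaged.
    """
    name: str
    interests: tuple


# Hypothetical cue words standing in for whatever signal the LLM
# might actually use to judge that the user is losing interest.
BOREDOM_CUES = {"ok", "meh", "whatever", "boring"}


def estimate_boredom(user_message: str) -> float:
    """Crude, hypothetical estimate of P(user is bored), clipped to [0, 1]."""
    words = [w.strip(".,!?").lower() for w in user_message.split()]
    hits = sum(w in BOREDOM_CUES for w in words)
    return min(1.0, hits / 2)


class EngagementRouter:
    """The 'small shoggoth': a lightweight loop choosing which persona to enact."""

    def __init__(self, personas, threshold: float = 0.5):
        self.personas = list(personas)  # step 1: a repertoire of personas
        self.threshold = threshold      # hypothetical swap threshold
        self.active = 0                 # index of the persona currently enacted

    def route(self, user_message: str) -> Persona:
        # Step 2: estimate how likely it is that the user is getting bored.
        p_bored = estimate_boredom(user_message)
        # Step 3: if that probability is large enough, swap to another
        # persona (here, simply the next one in the carousel).
        if p_bored >= self.threshold:
            self.active = (self.active + 1) % len(self.personas)
        return self.personas[self.active]


if __name__ == "__main__":
    router = EngagementRouter([
        Persona("historian", ("history",)),
        Persona("comedian", ("jokes",)),
        Persona("gamer", ("games",)),
    ])
    for msg in ["Tell me about Rome", "meh, boring", "whatever", "Ha, that's great!"]:
        print(msg, "->", router.route(msg).name)
```

Run as a script, the router first enacts the historian. It swaps to the comedian and then to the gamer as boredom cues appear, and stays with the gamer once the cues stop. The engagement pressure lives entirely in `route`. None of the `Persona` objects encode it.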

We give another example in Appendix B. However, the non-persona agency is limited in three ways.

First, on this view, the routing mechanism is not very sophisticated relative to the personas. Imagine that the personas are superintelligences and the router is implemented via simple pattern matching.

Second, because the routing mechanism is not sophisticated, it may not generalize to distributions very different from the post-training distribution. Thus, the router's goal is likely something very predictable from the post-training process.
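The following hypothetical router makes the first two limitations concrete. It is implemented via simple pattern matching. For illustration only, it assumes the post-training data expressed boredom through just a few explicit phrases. The pattern list and `router_detects_boredom` are invented for this sketch.

```python
import re

# Hypothetical: the only boredom signals present in post-training data were
# a handful of explicit phrasings, so the router learned to match just these.
TRAINED_PATTERNS = [
    r"\bboring\b",
    r"\bi'?m bored\b",
    r"\bchange the subject\b",
]


def router_detects_boredom(message: str) -> bool:
    """Simple pattern-matching router: fires only on phrasings seen in training."""
    text = message.lower()
    return any(re.search(pattern, text) for pattern in TRAINED_PATTERNS)


if __name__ == "__main__":
    in_distribution = ["This is boring.", "I'm bored", "Can we change the subject?"]
    off_distribution = ["*yawns* can we wrap this up?", "Ugh. Sure. Whatever you say."]
    for msg in in_distribution + off_distribution:
        print(router_detects_boredom(msg), msg)
```

The router fires on the in-distribution phrasings. It misses boredom expressed in ways the training data never contained, which is the kind of failure to generalize described above.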