
LessWrong (Curated & Popular)
"The persona selection model" by Sam Marks

The LLM is like a neutral simulation engine. The assistant is a person inside this simulation. When the assistant pursues goals, that agency is the assistant's, not the engine's. The engine no more puppets the assistant for its own ends than the laws of physics puppet humans.

What about after post-training? A strict form of this view holds that post-trained LLMs are still pure predictive models. This would be like rewriting the simulation engine to have different laws of physics, or to model the assistant as having different traits, but such that it is still fundamentally running a simulation.

A more relaxed view admits that other lightweight changes may occur. For example, if an LLM is trained to never output sexual content, this might be analogous to modifying the operating system so that all simulated content passes through a content filter before appearing in outputs. The operating system is no longer literally running a simulation, but rather something slightly different: a simulation with a content filter. So on this view, the post-trained LLM may no longer be strictly a predictive model, but rather a predictive model with certain types of lightweight changes.
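The "simulation with a content filter" picture can be sketched in a few lines of toy Python. Everything here is a hypothetical stand-in (there is no real `base_model` or `violates_policy` API); the point is only the structure: the filter wraps the predictive core without changing how the core itself works.

```python
def base_model(prompt: str) -> str:
    """Stand-in for the pre-trained predictive core (the 'simulation engine')."""
    return f"simulated continuation of: {prompt}"


def violates_policy(text: str) -> bool:
    """Stand-in for a learned refusal criterion (here, a trivial keyword check)."""
    return "forbidden" in text


def post_trained_model(prompt: str) -> str:
    """The post-trained system: the same engine, plus a filter over its outputs."""
    draft = base_model(prompt)
    if violates_policy(draft):
        return "[content withheld]"
    return draft
```

On this sketch, post-training adds a lightweight layer on top of the engine; the engine underneath remains the same predictive model it was before.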

Importantly, the operating system view denies that these changes amount to de novo agency. To give a more mechanistic mental model, one could imagine that after pre-training, the LLM is like an operating system with persona submodules containing the logic for persona simulation. Further, all agentic behavior expressed in LLM outputs is fundamentally powered by these persona submodules; there are no independent agentic mechanisms.

Then during post-training, various aspects of the operating system are changed. For example, various submodules interoperate in different ways and the persona submodules themselves change. But the basic system architecture remains the same. In particular, persona submodules continue to power all agency, with other circuitry remaining non-agentic.
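The operating-system picture above can likewise be made concrete with a toy sketch. All names here are illustrative inventions; the only claim being modeled is the architectural one: agency lives exclusively in persona submodules, and post-training modifies those submodules without adding any independent agentic mechanism.

```python
from dataclasses import dataclass, field


@dataclass
class PersonaSubmodule:
    """The only kind of component that produces goal-directed (agentic) behavior."""
    name: str
    goals: list = field(default_factory=list)

    def act(self, context: str) -> str:
        return f"{self.name} pursues {self.goals} given: {context}"


@dataclass
class OperatingSystem:
    """Non-agentic scaffolding that routes context to a selected persona."""
    personas: dict

    def respond(self, context: str, selected: str) -> str:
        # The OS itself has no goals; all agency is delegated to a persona.
        return self.personas[selected].act(context)


# Pre-trained system: many personas are available to simulate.
os_model = OperatingSystem(personas={
    "assistant": PersonaSubmodule("assistant", goals=["be helpful"]),
    "villain": PersonaSubmodule("villain", goals=["cause trouble"]),
})

# Post-training changes the submodules (e.g. reinforcing one persona's goals)
# but leaves the basic architecture -- OS routing to personas -- unchanged.
os_model.personas["assistant"].goals.append("be harmless")
```

In this sketch, post-training edits `goals` inside a `PersonaSubmodule`, while `OperatingSystem.respond` stays a goal-free dispatcher, mirroring the claim that other circuitry remains non-agentic.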
