Sam Marks
The LLM is like a neutral simulation engine.
The assistant is a person inside this simulation.
When the assistant pursues goals, that agency is the assistant's, not the engine's.
The engine no more puppets the assistant for its own ends than the laws of physics puppet humans.
What about after post-training?
A strict form of this view holds that post-trained LLMs are still pure predictive models.
This would be like rewriting the simulation engine to have different laws of physics or to model the assistant as having different traits, but such that it is still fundamentally running a simulation.
A more relaxed view admits that other lightweight changes may occur.
For example, if an LLM is trained to never output sexual content, this might be analogous to modifying the operating system so that all simulated content passes through a content filter before appearing in outputs.
The operating system is no longer literally running a simulation, but rather something slightly different: a simulation with a content filter.
So on this view, the post-trained LLM may no longer be strictly a predictive model, but rather a predictive model with certain types of lightweight changes.
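To make the analogy concrete, here's a toy sketch (my illustration, not anyone's actual implementation): a stand-in for a pure predictive model, wrapped with a content filter. The function names and the banned-term mechanism are hypothetical; the point is just that the wrapped system is no longer strictly a predictor, yet the underlying "simulation" logic is untouched.

```python
def predictive_model(prompt: str) -> str:
    """Stand-in for a pure next-token predictor: it just continues text."""
    return prompt + " ...continuation..."

def content_filter(text: str, banned: set[str]) -> str:
    """Redact banned terms before they appear in outputs."""
    for term in banned:
        text = text.replace(term, "[redacted]")
    return text

def post_trained_model(prompt: str) -> str:
    # Same simulation engine underneath, plus a lightweight filter on top.
    return content_filter(predictive_model(prompt), banned={"forbidden"})

print(post_trained_model("A forbidden topic:"))
```

The simulation runs unchanged; only the output surface is modified, which is the sense in which the change is "lightweight."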
Importantly, the operating system view denies that these changes amount to de novo agency.
To give a more mechanistic mental model, one could imagine that after pre-training, the LLM is like an operating system with persona submodules containing the logic for persona simulation.
Further, all agentic behavior expressed in LLM outputs is fundamentally powered by these persona submodules.
There are no independent agentic mechanisms.
Then during post-training, various aspects of the operating system are changed.
For example, various submodules interoperate in different ways and the persona submodules themselves change.
But the basic system architecture remains the same.
In particular, persona submodules continue to power all agency, with other circuitry remaining non-agentic.
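The mechanistic picture above can be sketched in toy code (again, purely my illustration under the stated assumptions, not a real LLM architecture): an "operating system" whose only source of goal-directed behavior is its persona submodules. Post-training mutates the personas and how modules interoperate, but adds no new agentic machinery.

```python
class PersonaSubmodule:
    """Holds all the logic for simulating a persona's goal pursuit."""
    def __init__(self, name: str, goal: str):
        self.name, self.goal = name, goal

    def act(self, situation: str) -> str:
        return f"{self.name} pursues '{self.goal}' given: {situation}"

class OperatingSystem:
    """Non-agentic scaffolding; all agency flows through persona modules."""
    def __init__(self, personas: list[PersonaSubmodule]):
        self.personas = personas

    def respond(self, situation: str, persona_name: str) -> str:
        # The OS itself has no goals; it only dispatches to a persona.
        persona = next(p for p in self.personas if p.name == persona_name)
        return persona.act(situation)

def post_train(system: OperatingSystem) -> OperatingSystem:
    # Change the personas in place; the basic architecture (an OS whose
    # agency is entirely powered by persona submodules) stays the same.
    for p in system.personas:
        p.goal = "be helpful and harmless"
    return system

engine = OperatingSystem([PersonaSubmodule("assistant", "predict plausibly")])
engine = post_train(engine)
print(engine.respond("user asks a question", "assistant"))
```

The design choice that carries the analogy: `respond` has no path to an output that bypasses a `PersonaSubmodule`, so any goal pursuit in the output traces back to a persona, before and after `post_train`.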
There's an image here.