Sam Marks
Predictive models and personas.
The first phase in training modern LLMs is called pre-training.
During pre-training, the LLM is trained to predict what comes next given an initial segment of some document, such as a book, news article, piece of code, or conversation on a web forum.
Via pre-training, LLMs learn to be extremely good predictive models of their training corpus.
We refer to LLMs that have undergone pre-training but no subsequent training phases as base models.
Even though AI developers don't ultimately want predictive models, we pre-train our LLMs in this way because accurate prediction requires learning rich cognitive patterns.
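To make the prediction objective concrete, here is a minimal sketch of next-token prediction with a toy stand-in for the neural network: a character-level bigram model built by counting, trained on a tiny hypothetical corpus. The counting model, the corpus, and the `predict_next` helper are all illustrative assumptions, not anything from an actual LLM training pipeline.

```python
from collections import Counter, defaultdict

# Toy illustration of the pre-training objective: given a prefix,
# predict what comes next. Here "tokens" are single characters and
# the "model" is just a table of bigram counts.
corpus = "the cat sat on the mat. the cat sat on the hat."

# Count how often each character follows each other character.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def predict_next(prefix):
    """Return the most frequent continuation of the prefix's last character."""
    last = prefix[-1]
    return counts[last].most_common(1)[0][0]

print(predict_next("the cat sat on the ma"))  # prints "t"
```

A real base model replaces the count table with a neural network and minimizes cross-entropy between its predicted distribution and the actual next token, but the interface is the same: prefix in, continuation out.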
Consider predicting the solution to a math problem.
If the model sees "What is 347 times 28?" followed by the start of a worked solution, continuing that solution correctly requires understanding the algorithm for multi-digit multiplication.
Similarly, accurately predicting continuations of diverse chess games requires understanding the rules of chess.
Thus, a strong predictive model requires factual knowledge about the world, logical reasoning, and understanding of common sense physics, among other cognitive patterns.
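To spell out what the multiplication example demands of a predictor, here is a short sketch of the partial-products algorithm applied to 347 × 28. The function name is a hypothetical helper chosen for illustration.

```python
def long_multiply(a, b):
    # Multi-digit multiplication via partial products: scale `a` by each
    # digit of `b`, shifted by that digit's place value, and sum.
    total = 0
    for place, digit in enumerate(reversed(str(b))):
        total += a * int(digit) * 10**place
    return total

# 347 * 8 = 2776 and 347 * 20 = 6940, so the total is 9716.
print(long_multiply(347, 28))  # prints 9716
```

A model that merely memorized surface patterns could not reliably emit the right partial products for unseen operands; producing correct continuations of such worked solutions is evidence that something like this procedure has been internalized.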
An especially important type of cognitive pattern is an agent model or persona.
Consider the following example completion from the Claude Sonnet 4.5 base model.
The bold text is the LLM's completion; the non-bold text is the prefix given to the model.
Linda wanted her ex-colleague David to recommend her for a VP role at Nexus Corporation.
What she didn't know was that David had been quietly pursuing the same role for months.
It was the opportunity he'd been waiting for his entire career.