On Goal Models

By Richard Ngo

Published on February 2, 2026

I'd like to reframe our understanding of the goals of intelligent agents in terms of goal models rather than utility functions.
By a goal model I mean the same type of thing as a world model, only representing how you want the world to be, not how you think the world is.
However, note that this is still a fairly inchoate idea, since I don't actually know what a world model is.
The concept of goal models is broadly inspired by predictive processing, which treats both beliefs and goals as generative models, the former primarily predicting observations, the latter primarily predicting actions.
This is a very useful idea: for example, it allows us to talk about the distance between a belief and a goal, and about the process of moving towards a goal, neither of which makes sense from a reward- or utility-function perspective.
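To make that concrete, here is a minimal sketch (my own toy encoding, not anything from the predictive processing literature) in which a belief and a goal are both collapsed down to categorical distributions over the same outcomes, the "distance" between them is a KL divergence, and "moving towards a goal" is the belief drifting toward the goal distribution as the agent acts:

```python
import numpy as np

def kl(p: np.ndarray, q: np.ndarray) -> float:
    """KL divergence D(p || q) between two categorical distributions."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# A belief and a goal over the same three outcomes, both treated as
# generative models but collapsed here to categorical distributions.
belief = np.array([0.7, 0.2, 0.1])  # how I think the world is
goal = np.array([0.1, 0.2, 0.7])    # how I want the world to be

# "Distance between a belief and a goal": divergence of the goal
# distribution from the belief distribution.
print(kl(goal, belief))

# "Moving towards a goal": acting so that the belief drifts toward
# the goal distribution, shrinking that divergence step by step.
for step in range(5):
    belief = 0.8 * belief + 0.2 * goal  # stand-in for the effect of acting
    print(step, kl(goal, belief))
```

The point of the sketch is just that once both sides are distributions, distance and progress fall out for free.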
However, I'm dissatisfied by the idea of defining a world model as a generative model over observations.
It feels analogous to defining a parliament as a generative model over laws.
Yes, technically we can think of parliaments as stochastically outputting laws, but actually the interesting part is in how they do so.
In the case of parliaments, you have a process of internal disagreement and bargaining, which then leads to some compromise output.
In the case of world models, we can perhaps think of them as made up of many smaller, partial, generative models, which sometimes agree and sometimes disagree.
The real question is in how they reach enough of a consensus to produce a single output prediction.
One potential model of that consensus formation process comes from the probabilistic dependency graph formalism, which is a version of Bayesian networks in which different nodes are allowed to disagree with each other.
The most principled way to convert a PDG into a single distribution is to find the distribution which minimizes the inconsistency between all of its nodes.
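As a toy illustration of that conversion, under strong simplifying assumptions: I flatten each node down to an unconditional distribution over one shared variable, and use a weighted sum of KL divergences as a stand-in for the real PDG inconsistency measure, which is defined more carefully over conditional distributions. None of this is the actual formalism; it just shows the shape of "find the single distribution that minimizes disagreement with all the nodes":

```python
import numpy as np

# Toy stand-in for a PDG: three "nodes" each assert their own
# distribution over the same binary variable, and they disagree.
nodes = np.array([
    [0.9, 0.1],
    [0.6, 0.4],
    [0.2, 0.8],
])
weights = np.array([1.0, 1.0, 1.0]) / 3  # equal trust in each node

def inconsistency(q: np.ndarray) -> float:
    """Score a candidate distribution q by its total weighted
    divergence from the nodes: sum_i w_i * KL(q || p_i)."""
    return float(sum(w * np.sum(q * np.log(q / p))
                     for w, p in zip(weights, nodes)))

# For this particular score the minimizer has a closed form: the
# weighted geometric mean of the node distributions, renormalized.
q = np.exp(weights @ np.log(nodes))
q /= q.sum()

print(q, inconsistency(q))
```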
PDGs seem promising in some ways, but I feel suspicious of any global measure of inconsistency.
Instead I'm interested in scale-free approaches under which inconsistencies mostly get resolved locally, though it's worth noting that Oliver's proposed practical algorithm for inconsistency minimization is a local one.
It's also possible that the predictive processing and active inference people have a better model of this process which I don't know about, since I haven't gotten very deep into that literature yet.
Anyway, suppose we're thinking of goal models as generative models of observations for now.
What does this buy us over understanding goals in terms of utility functions?
The key trade-off is that utility functions are global but shallow whereas goal models are local but deep.
That is, we typically think of a utility function as something that takes as input any state of the world (or alternatively any trajectory) and spits out a real number.
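One way to see the "global but shallow" half of that trade-off is at the type level. The encodings below are my own illustrative choices, not a standard formalism:

```python
from typing import Callable, Mapping

State = Mapping[str, float]  # a full world state, crudely encoded

# A utility function is global but shallow: it must accept *any*
# state, and all it ever returns is a single real number.
UtilityFn = Callable[[State], float]

# A goal model is local but deep: it is silent about most variables,
# but for the ones it covers it specifies a whole target (here a
# hypothetical encoding as a target value plus a tolerance).
GoalModel = Mapping[str, tuple[float, float]]

u: UtilityFn = lambda s: -abs(s["temperature"] - 20.0)
g: GoalModel = {"temperature": (20.0, 2.0)}  # says nothing about other variables

state: State = {"temperature": 23.0, "humidity": 0.4, "noise": 0.1}
print(u(state))  # the whole state gets scored with one number: -3.0
```

The utility function has to be defined on every state but returns only one number; the goal model is silent about most of the world but says something richly structured about the part it covers.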