Richard Sutton
Yep.
Okay, so...
Is there any way for it to tell, in the large language model setup, what the right thing to say is?
You will say something and you will not get feedback about what the right thing to say is because there's no definition of what the right thing to say is.
There's no goal.
And if there's no goal, then one thing to say is as good as another.
There's no right thing to say.
So there's no ground truth.
You can't have prior knowledge if you don't have ground truth, because prior knowledge is supposed to be a hint or an initial belief about what the truth is.
But there isn't any truth.
There's no right thing to say.
Now, in reinforcement learning, there is a right thing to say or a right thing to do because the right thing to do is the thing that gets you reward.
So we have a definition of what the right thing to do is.
And so we can have prior knowledge or knowledge provided by people about what the right thing to do is.
And then we can check it, because we have a definition of what the actual right thing to do is.
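A minimal sketch of this point in code (a toy two-armed bandit; all names and numbers are hypothetical, not anything Sutton describes): because reward defines what the right action is, a human-supplied prior belief can be tested against reward feedback and corrected when it's wrong.

```python
import random

true_reward = {"a": 0.2, "b": 0.8}   # environment's hidden reward probabilities
q = {"a": 0.9, "b": 0.1}             # human-provided prior belief (wrong here)
alpha = 0.1                          # step size
epsilon = 0.1                        # exploration rate

for _ in range(5000):
    # Act mostly on the current belief, exploring occasionally.
    if random.random() < epsilon:
        action = random.choice(list(q))
    else:
        action = max(q, key=q.get)
    reward = 1.0 if random.random() < true_reward[action] else 0.0
    # Reward is the ground truth that checks and corrects the prior.
    q[action] += alpha * (reward - q[action])

print(q)  # the mistaken prior about "a" is revised toward its true value
```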
Now, an even simpler case is when you're trying to make a model of the world.
When you predict what will happen, you predict and then you see what happens.
Okay, so there's ground truth.
There's no ground truth in large language models because you don't have a prediction about what will happen next.
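A minimal sketch of the "even simpler case" above (toy one-dimensional dynamics; all names are hypothetical): the model predicts the next state, the world then reveals what actually happens, and the prediction error is the check against ground truth.

```python
import random

def world_step(x):
    """True dynamics, unknown to the model: drift of +1 plus noise."""
    return x + 1.0 + random.gauss(0.0, 0.1)

drift_estimate = 0.0   # the model's learned parameter
alpha = 0.05
x = 0.0

for _ in range(2000):
    prediction = x + drift_estimate                   # predict what will happen
    x_next = world_step(x)                            # see what actually happens
    drift_estimate += alpha * (x_next - prediction)   # learn from the verified error
    x = x_next

print(drift_estimate)  # converges near the true drift of 1.0
```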