Eliezer Yudkowsky
And it is a really remarkable thing, I say in passing, that despite having complete read access to every floating point number in the GPT series, we still know vastly more about
the architecture of human thinking than we know about what goes on inside GPT, despite having vastly better ability to read GPT.
Sure.
I think that if, you know, like half of today's physicists stop wasting their lives on string theory or whatever and go off and study what goes on inside transformer networks, then in, you know, like 30, 40 years, we'd probably have a pretty good idea.
Do you think these large language models can reason?
They can play chess.
How are they doing that without reasoning?
I mean, in my writings on rationality, I have not gone around making a big deal out of something called reason.
I have made more of a big deal out of something called probability theory.
And that's like, well, you're reasoning, but you're not doing it quite right, and you should reason this way instead.
And interestingly, people have started to get preliminary results showing that reinforcement learning from human feedback has made the GPT series worse in some ways.
In particular, it used to be well-calibrated.
If you trained it to put probabilities on things, it would say 80% probability and be right 8 times out of 10.
And if you apply reinforcement learning from human feedback, the nice graph of 70%, 7 out of 10, sort of flattens out into the graph that humans use, where there's some very improbable stuff, then likely, probable, maybe, which all mean like around 40%, and then certain.
So it's like it used to be able to use probabilities, but if you try to teach it to talk in a way that satisfies humans, it gets worse at probability in the same way that humans are bad at it.
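[Editor's note: a minimal sketch, not from the conversation, of what "well-calibrated" means here: bin a model's stated probabilities and compare each bin's average stated confidence with the observed frequency of being right. The data and bin count below are illustrative placeholders.]

```python
# Minimal calibration check: group (stated probability, correct?) pairs
# into bins and compare average stated confidence with observed accuracy.
from collections import defaultdict

def calibration_table(predictions, outcomes, num_bins=10):
    """Return (avg stated probability, observed accuracy, count) per bin."""
    bins = defaultdict(list)
    for p, correct in zip(predictions, outcomes):
        # Clamp so p == 1.0 falls into the top bin.
        idx = min(int(p * num_bins), num_bins - 1)
        bins[idx].append((p, correct))
    table = []
    for idx in sorted(bins):
        pairs = bins[idx]
        avg_p = sum(p for p, _ in pairs) / len(pairs)
        accuracy = sum(c for _, c in pairs) / len(pairs)
        table.append((avg_p, accuracy, len(pairs)))
    return table

# A well-calibrated model shows avg_p close to accuracy in every row:
# statements made with 80% confidence are right about 8 times out of 10.
if __name__ == "__main__":
    preds = [0.8, 0.8, 0.8, 0.8, 0.8, 0.7, 0.7, 0.7, 0.3, 0.3]
    truth = [1, 1, 1, 1, 0, 1, 1, 0, 0, 1]
    for avg_p, acc, n in calibration_table(preds, truth):
        print(f"stated ~{avg_p:.2f}  observed {acc:.2f}  (n={n})")
```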
And that's a bug, not a feature.
I would call it a bug.