
Tom Griffiths

Speaker

Podcast Appearances

And so if we can use inductive bias to nudge the models towards more human-like solutions, they're probably going to be things that make a little more sense to us as well.

Yeah, so some examples of things that we've done in my lab that sort of reveal some of this weirdness.

One of them is that large language models are very sensitive to the probabilities of the outputs that they're producing, right?

So when people were very excited about these models, there was the "Sparks of AGI" paper, which said that GPT-4 exhibits these remarkable abilities.

With Tom McCoy and some colleagues, we wrote a paper called "Embers of Autoregression," which argued that as much as you're getting sparks at the top, there are still embers at the bottom, and those are a consequence of the way these models are trained.

Again, modern systems use all sorts of tricks to get around this, but if you take a raw language model of the kind we were getting with GPT-4 and ask it to solve simple problems, like counting the number of letters that appear in a string, how well it does is influenced by the probability of the answer it would have to produce.

For example, they're much better at counting strings that have 30 letters in them than strings that have 29, because the number 30 appears on the internet more often than the number 29.

So it's a situation where there are other nearby answers that are pretty good, and some of those have higher probability.

As a consequence, it tends to produce the high-probability answer rather than the one it's supposed to produce.
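To make the mechanism concrete, here is a toy sketch. It is not how the Embers of Autoregression experiments were run; it simply assumes, hypothetically, that the score for each candidate count is the product of how strongly the input supports that count and how often that number appears in text. The frequency table `PRIOR_FREQ` and the evidence width `sigma` are made-up illustrative values.

```python
import math

# Hypothetical text frequencies: round numbers like 30 are assumed to appear
# more often than neighbors like 29. Illustrative values only, not measured data.
PRIOR_FREQ = {28: 1.0, 29: 1.0, 30: 3.0, 31: 1.0, 32: 1.0}


def evidence(candidate: int, true_count: int, sigma: float = 1.0) -> float:
    """Gaussian-shaped support the input gives to `candidate` (fuzzier as sigma grows)."""
    return math.exp(-((candidate - true_count) ** 2) / (2 * sigma**2))


def decode(true_count: int, sigma: float = 1.0) -> int:
    """Greedy decoding: return the candidate maximizing frequency * evidence."""
    scores = {
        c: PRIOR_FREQ[c] * evidence(c, true_count, sigma) for c in PRIOR_FREQ
    }
    return max(scores, key=scores.get)


print(decode(29))             # fuzzy evidence: pulled toward the frequent 30
print(decode(30))
print(decode(29, sigma=0.5))  # sharper evidence: the correct count wins
```

With fuzzy evidence, a true count of 29 is pulled to the more frequent neighbor 30, while a true count of 30 is reported correctly; when the evidence is sharper, the frequency bias is no longer enough to flip the answer.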

And so that's a weird, idiosyncratic bias of language models, and it's a consequence of the way they're trained.

More generally, the way I think about these systems, applying our computational-level lens, is that we should expect intelligent systems to behave in ways that are shaped by the kinds of problems they're trying to solve.

And when we design our AI systems, we're making explicit choices about the kinds of problems that they're going to solve, things like being able to predict the next word or token that appears in a sequence.
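The next-token objective mentioned here can be illustrated with a deliberately tiny stand-in: a bigram model fit by counting, scored by the average negative log-probability it assigns to each next token. This is a hypothetical sketch, not how GPT-4 is trained (real models are neural networks conditioning on long contexts), but the quantity being minimized has the same form with a much shorter context, and it shows why such a system is rewarded for reproducing whatever is frequent in its training text.

```python
import math
from collections import Counter, defaultdict


def fit_bigram(tokens: list[str]) -> dict[str, dict[str, float]]:
    """Estimate p(next | prev) by counting adjacent token pairs."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    model = {}
    for prev, c in counts.items():
        total = sum(c.values())
        model[prev] = {tok: n / total for tok, n in c.items()}
    return model


def next_token_loss(model: dict[str, dict[str, float]], tokens: list[str]) -> float:
    """Average negative log-probability of each token given the one before it."""
    pairs = list(zip(tokens, tokens[1:]))
    return -sum(math.log(model[p][n]) for p, n in pairs) / len(pairs)


tokens = "the cat sat on the mat".split()
model = fit_bigram(tokens)
loss = next_token_loss(model, tokens)
print(model["the"])  # "the" is followed by "cat" and "mat" equally often
print(loss)
```

Lowering this loss means assigning more probability to continuations that were common in training, which is the same pressure that favors "30" over "29" in the counting example.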

And that choice is going to influence their behavior.

And so to the extent that there's a difference between the objective function (the goal we have in training that system) and the kinds of computational problems that human minds have evolved to solve, we should expect the solutions these systems find to look quite different.

And that's part of where we get this mismatch in behavior.

So, the third thread: we've talked about logic and probability theory.