Andrew Ilyas
And so the interesting thing, I think, about that quote is that it can be interpreted in two totally different ways.
One is: you have no idea how much data that is, so clearly this emergence of reasoning could happen given that much data.
The other is: you have no idea how much data that is, so anything you see is just going to be in the training data.
There's no reasoning or emergence of reasoning necessary.
It's all just going to be there.
And I think disentangling those two things requires looking beyond performance: trying to understand where in the training data predictions are coming from, and which steps of a problem were present in the training data versus which are new.
And for what's new, where is that new behavior actually coming from?
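To make that kind of attribution concrete, here is a minimal sketch of one common idea: score each training example by how well its loss gradient aligns with the gradient at a test example, in the spirit of TracIn-style gradient-similarity attribution. The toy model, the synthetic data, and the helper function are illustrative assumptions, not any specific published implementation:

```python
# Sketch: rank training examples by gradient similarity to a test example.
# Toy linear model and synthetic data are stand-ins for illustration.
import torch
import torch.nn as nn

torch.manual_seed(0)
X_train = torch.randn(100, 10)          # 100 synthetic training examples
y_train = torch.randint(0, 2, (100,))   # binary labels
x_test = torch.randn(1, 10)             # one test example
y_test = torch.tensor([1])

model = nn.Linear(10, 2)                # stand-in for a trained model
loss_fn = nn.CrossEntropyLoss()

def flat_grad(x, y):
    """Flattened gradient of the loss at (x, y) w.r.t. all model parameters."""
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, model.parameters())
    return torch.cat([g.reshape(-1) for g in grads])

g_test = flat_grad(x_test, y_test)

# Score: dot product between a training example's gradient and the test
# gradient. Large positive scores suggest that example pushes the model
# toward the test prediction.
scores = torch.stack([
    flat_grad(X_train[i : i + 1], y_train[i : i + 1]) @ g_test
    for i in range(len(X_train))
])
print("Most influential training examples:", scores.topk(5).indices.tolist())
```

Even in this toy form you can see the cost: one gradient per training example per query, which is exactly what makes scaling such methods to LLM-sized corpora hard.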
And so one particularly interesting direction, I think, is trying to understand from a data perspective where exactly these surprising, amazing advances of large language models are coming from, because they're clearly not just copy-pasting from the training data.
I think it's slightly reductionist at this point to say that they're copy-pasting from the training data, but that doesn't mean that they're reasoning, and it doesn't mean that anything crazy is happening.
It could just be that they've learned a good enough abstraction to copy-paste from the training data in a much more abstract way.
And then you get into this philosophical discussion of like, is that what humans are doing?
And I don't want to think about that.
Yeah, I think I totally agree.
And that's a big driving force, I think, behind our work.
I'm very curious to see what happens when you apply the methods you're developing to these large language models, the ones that we claim are doing reasoning and things like that.
I think the only thing stopping us right now is just not having methods that are quite fast enough to actually scale.
I think there's a really nice paper out of Anthropic that tried to do exactly this for Claude, I believe, to understand where different behaviors are coming from in the training data.
But even then, these methods are so expensive that I don't think they were able to run, for example, actual data counterfactuals.
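A data counterfactual here means actually retraining with some of the training data removed and checking how a prediction changes. Here is a minimal sketch of that idea on a small scikit-learn model; the synthetic dataset and the choice of which examples to drop are made up for illustration, and this is not the approach used in the Anthropic work mentioned above:

```python
# Sketch of a data counterfactual: retrain without a subset of the training
# data and compare predictions. Dataset and removed subset are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, y_train, x_test = X[:400], y[:400], X[400:401]

# Model trained on the full dataset.
full = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Counterfactual: drop a subset (here, the first 50 examples) and retrain.
keep = np.ones(len(X_train), dtype=bool)
keep[:50] = False
ablated = LogisticRegression(max_iter=1000).fit(X_train[keep], y_train[keep])

# The counterfactual effect of that subset on this one prediction.
p_full = full.predict_proba(x_test)[0, 1]
p_ablated = ablated.predict_proba(x_test)[0, 1]
print(f"P(class 1) with all data: {p_full:.3f}, without subset: {p_ablated:.3f}")
```

The expense is easy to see from the sketch: every counterfactual you want to test is a full retraining run, which is cheap for a logistic regression and prohibitive for a large language model.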