Nick Heiner
Podcast Appearances
I think, you know, there was in 2023, a lot of debate over whether it was going to be small, specialized models or large, big models.
And the large-model side of the debate has been winning for the last three years.
So I definitely expect to see, like, we're just going to train one model that can both do PowerPoint and a complicated analysis.
History has proven it.
Yeah, basically.
Yeah.
So one thing we found, as you'll note in the write-up, like we had said, a lot of the models behaved as if they were solving an academic problem.
And this is interesting, but actually not surprising at all, because, you know, again, you are your objective function, right?
Like you get what you're trained for.
And a lot of benchmarks are fairly academic and contrived.
And this is a natural consequence of the fact that building a benchmark is incredibly expensive.
And a lot of them are being built in academic contexts that don't have huge budgets.
And so like, you know, if you imagine like some of the questions that we were posing to the models here, they take a finance professional 20, 30, 40 hours to do.
So in order to build a benchmark with hundreds of questions like that, you need to find enough finance professionals, and you need to pay them to spend 20, 30, 40 hours per task.
And so that's quite a lot.
And frankly, until you've made an investment in having a really deep expert network and a lot of technology to produce great data with those people, it's just not feasible.
And so that's why you see a lot of the benchmarks that have been used are like glorified SATs.
And so, you know, that's why we see that the models sort of behave in a very academic way.
But when you put these real-world constraints on them, like, there's sort of that last-mile problem.