Lennart Heim
Podcast Appearances
We can verify there's a right and wrong.
I work in policy.
I don't know how to say what's right and wrong here, actually.
Well, yeah, I don't know.
Let's see.
Indeed, it's just the same operation, the merge of things.
But ideally, at the end, you get these high-quality tokens, you know, the ones that actually have the answer.
I think my favorite example is actually the ARC-AGI benchmark, where o1, when it came out, had this really impressive record.
To achieve this record, I think they spent $20,000 on compute time on each task.
They produced a couple of, I think, hundreds of thousands of tokens.
They probably produced like five times a Harry Potter book to fill in a pixel.
Again, this goes back to how LLMs do it, but I think this gives people an idea here.
And they did two things.
They did this reasoning.
It's not that the model reasoned over five Harry Potter books.
It's just many, many reasoning attempts.
And you're just like, oh, I tried here, I tried here.
And then you just look: which is the better one?
Ideally, you have some tree search over it, right?
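The "many attempts, then pick the better one" idea can be illustrated with a minimal best-of-N sketch. This is only an illustration under stated assumptions, not how any particular lab or benchmark run actually works: `best_of_n`, `toy_generate`, and `toy_score` are hypothetical names, the "model" is a random number guesser, and the scorer is a stand-in for whatever verifier or ranking step is really used. It also shows flat sampling only, not the tree search mentioned above.

```python
import random
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def best_of_n(
    generate: Callable[[], T],
    score: Callable[[T], float],
    n: int,
) -> Tuple[T, float]:
    """Sample n independent attempts and return the highest-scoring one.

    `generate` stands in for one full reasoning attempt by a model;
    `score` stands in for a verifier or ranker (both are assumptions).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    attempts = [generate() for _ in range(n)]
    scored = [(score(attempt), attempt) for attempt in attempts]
    # Keep the attempt with the highest score; the rest are discarded,
    # even though compute was spent producing them.
    best_score, best_attempt = max(scored, key=lambda pair: pair[0])
    return best_attempt, best_score


# Toy stand-ins (hypothetical): each "attempt" is a random guess at a
# hidden target, and the scorer rewards guesses closer to that target.
TARGET = 42
_rng = random.Random(0)


def toy_generate() -> int:
    return _rng.randint(0, 100)


def toy_score(guess: int) -> float:
    return -abs(guess - TARGET)


if __name__ == "__main__":
    for n in (1, 10, 100):
        guess, s = best_of_n(toy_generate, toy_score, n)
        print(f"n={n:3d} best guess={guess} score={s}")
```

The cost grows linearly with `n` because every discarded attempt still consumes tokens, which is one way a single task can end up with a very large compute bill.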