Dr. Richard Moulange
Podcast Appearances
Here's some information about what the person did in the lab, a series of very complicated, PhD-level steps they took.
What do you think happened?
Why did this go wrong?
What this is really getting at is that we are trying to debug modern virology workflows.
And there'll be a bunch of answers, often maybe 10 different answers, of which maybe only one to five are right.
And it'll be different for different questions.
And then the marking scheme is really quite harsh because it says, okay, unless you really identify all these things, we're not going to give you the mark.
So it's a pretty hard eval already.
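A minimal sketch of what an all-or-nothing marking scheme like the one described could look like. The function names, the question IDs, and the answer sets are hypothetical, and the sketch assumes a mark is given only when the selected options match the correct set exactly; the source only says you have to identify all the right things, so penalising extra wrong picks is an assumption here.

```python
def strict_score(selected, correct):
    """Return 1 only if the selected options equal the correct set exactly.

    Assumption: extra wrong selections also lose the mark. The source only
    states that every correct item must be identified.
    """
    return 1 if set(selected) == set(correct) else 0


def benchmark_accuracy(responses, answer_key):
    """Fraction of questions that earn the mark under strict scoring."""
    if not answer_key:
        raise ValueError("answer_key must contain at least one question")
    # A question with no response is scored as an empty selection.
    scores = [
        strict_score(responses.get(question, ()), correct)
        for question, correct in answer_key.items()
    ]
    return sum(scores) / len(scores)


# Hypothetical data: each question has about 10 options, 1 to 5 of them right.
answer_key = {
    "q1": {"A", "D"},
    "q2": {"B"},
    "q3": {"C", "E", "F"},
}
responses = {
    "q1": {"A", "D"},   # exact match, earns the mark
    "q2": {"B", "C"},   # one extra wrong pick, no mark
    "q3": {"C", "E"},   # missed one correct option, no mark
}

accuracy = benchmark_accuracy(responses, answer_key)
print(f"accuracy: {accuracy:.2f}")
```

Under this kind of scoring there is no partial credit, so getting most of a question right counts the same as getting none of it.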
What's harder about it is it was designed by virology experts.
And they had these multiple rounds of review, described in the paper, to get down to questions that are really well scoped for modern virology and really, really difficult.
So difficult, in fact...
Something else they did is they went and spoke to these experts who were writing the questions.
They said, hey, what sort of biological activities do you do in your day-to-day work, and how good are you at them?
And they really distinguished between merely having a working knowledge, being specialized, and having expertise in that particular thing.
And then they said, okay, for those who are expert in this particular subdomain, we're just gonna show you the questions from our benchmark that are officially about that.
So we are trying to make it as easy as possible for you as the human to do well.
We're not gonna show you things outside the thing you say yourself you're really, really good at.
Humans got 22% on the test.
Four out of five things in their own area of expertise they couldn't do.
So this is really, really hard.