Manolis Kellis
If they're, you know, sending a mission or reimbursing expenses or you name it, at some point, you can't function if life is infinitely valuable.
So when the AI is basically trying to decide whether to, I don't know, dismantle a bomb that will kill an entire city at the sacrifice of two humans, I mean, Spider-Man always saves the lady and saves the world.
But at some point, Spider-Man will have to choose to let the lady die, because the world has more value.
And these ethical dilemmas are gonna be there for AI.
Basically, if that monolith is essential to human existence and millions of humans are depending on it and two humans on the ship are trying to sabotage it, you know, where's the alignment?
So we have a paper recently that, you know, looks at Goodhart's law.
It basically says every metric that becomes an objective ceases to be a good metric.
So in our paper, and actually the paper has a very cute title.
It's called "Death by Round Numbers and Sharp Thresholds."
And it's basically looking at these discontinuities in biomarkers associated with disease.
And we're finding that a biomarker that becomes an objective ceases to be a good biomarker.
Basically, the moment you make a biomarker the basis for a treatment decision, that biomarker used to be informative of risk, but it's now inversely correlated with risk, because you use it to sort of induce treatment.
In a similar way, you can't have a single metric without having the ability to revise it, because if that metric becomes a sole objective, it will cease to be a good metric.
And if an AI is sufficiently intelligent to do all these kinds of things, you should also empower it with the ability to decide that the objective has now shifted.
And again, when we think about alignment, we should be really thinking about it as let's think of the greater good, not just the human good.
And yes, of course, human life should be much more valuable than many, many, many, many, many, many things.
But at some point, you're not gonna sacrifice the whole planet to save one human being.
So the big thing that we should be saying is what did we do the last six months when we saw that coming?
And if we were completely inactive in the last six months, what makes us think that we'll be a little better in the next six months?