Oly Sourbut
π€ SpeakerAppearances Over Time
Podcast Appearances
And a lot of this we might call, these are kind of epistemic qualities or virtues.
So in fact, one of the pieces of our kind of human reasoning puzzle is, can we ensure that the AI components we're putting into these tools for human reasoning
in particular LMs, like particularly salient ELMs, can we ensure that they are virtuous in these ways that we think are necessary, not only as kind of conversation partners and as agents, but as building blocks for tools that we might want to build downstream of them.
So let's go back to this kind of scenario planning, deep research-esque, but geared towards scenario planning.
Let's go back to that example.
If you're...
system has systematic blind spots, or even if it's scheming and it wants to hide certain parts of the mechanics of the world or of the specifics of the situation, which is kind of a much more pernicious thing.
then you might expect that it could perhaps surreptitiously or even inadvertently kind of surface a biased summary of the situation.
And that could lead you to kind of systematically biased decisions.
I guess I'm trying to think kind of concretely.
If there was some kind of political shenanigans going on and there were particular parties that were behaving shadily, you could imagine a sufficiently...
either blind spotted or sufficiently kind of scheming model that was part of a system like that.
You can imagine it just kind of discarding such references before they kind of bubble up into the ecosystem.
And that can give you these blind spots.
So yeah, one way we can potentially counteract that is building in these guarantees.
And that's a strong word.
I perhaps need to soften that.
But building in some level of incentives for the developers and also benchmarking and testing and so on for these epistemic, like thoroughness, legibility, and these kind of qualities.
One way that that gets done in contemporary AI, very often the way that you make progress is by having benchmarks, by having testing suites and this kind of thing.
So one way we're hoping to kind of incentivize that whole area is to enable people to build really great environments for testing these epistemically virtuous properties like thoroughness, legibility, skepticism, this kind of thing.