Illia Polosukhin
There's prompt injection detection.
There is data exfiltration detection.
There are all those pieces that effectively kind of layer on top of each other, such that even if some... I mean, prompt injections are, like, not deterministic, right?
They are probabilistic.
If that falls through, it's still not able to go and send a bunch of stuff out because the credential store will check
If the tool... if your LLM wrote a tool for itself, but that tool is broken, that's not going to break everything.
If it's trying to, like, go and delete all your emails, right, that's going to be stopped by the approval process and, kind of, by following this action check.
So what the system is really designing for, kind of, is how to give the flexibility, but also protect the system from itself and from external effects.
So the thing is, like, I think the education level of humans is probably the wrong analogy here, because these are, you know... they know nuclear physics and quantum physics probably better than all of us.
They know the knowledge, but their judgment is... yeah, their judgment. And it's also just the context management. Like, at the end, if you know the movie Memento, right, they kind of all end up living in Memento. They just boot up, and it's like the only thing you know is this system prompt, and, like, go figure out what you do, and you only have, you know, like 10 minutes to figure this out, and then you're dead, right? And then you...
That's really like the current.
And obviously that piece is going to keep improving, like the longer context, et cetera.
But yeah, right now what you need to do is effectively manage that state where they're pretty intelligent.
There are some kind of judgment lapses, but so it is with people.
And so you would do the same things for people, right?
Like, if we're setting up, you know, a key management system, you're probably not going to give full access to all of your, you know, DAO funds to a single individual, right?
You're going to like, hey, you can spend this much, but then you need approvals.
So that makes sense either way.
So this is kind of, you know, the structure we're applying here.
And the same as you kind of roll in.