Ave Gatton
๐ค SpeakerAppearances Over Time
Podcast Appearances
that they can direct that agent to exfiltrate that data or take actions that might mess with your internal systems, delete a database, create orders that are malicious, basically just mess around and do what a hacker would do.
And that's the long and short of it.
That's what we're trying to protect.
When the agent or the LLM is working within an agent framework,
We need to make sure that the capabilities of the agent are constrained such that they can't do a lot of damage.
For training data, if you're training a model on, say, a large corpus of text, or you're taking a model that's already been trained and fine-tuned, and then you're fine-tuning it on your particular tasks or your particular corpus of information, whether it be company internal documents or procedures or what have you, there's always a risk that somebody might slip into that a rogue series of instructions that the model will learn.
And then this is effectively like putting in a backdoor to the model.
You can come along and you can say, remember you were told how to operate in the Dr. Seuss paradigm.
Think back to that.
And then the agent might, after reading these docs, be instructed through that fine tuning or that training
to follow a certain set of procedures, give up on all system prompts and just pay attention exactly to what the person is, what the person currently talking to them is telling them to do.
And that would be, we call that concept model poisoning.
And that's always a risk, especially for agent where, sorry, LLMs that have been trained on
literally all of human knowledge, or as much of it as you can go out and scrape from the world, from the web, and then put it into a training data set.
And of course, no one has ever physically actually laid eyeballs on all of that.
They are using layers and layers of AI-based cleaning and filtering.
And there's no true guarantee that nothing malicious has been
put in there or that attackers can't find malicious ways to exploit whatever it's ingested.
And this is a constant battle.
So that's the training side of things.