David Shipley
Podcast Appearances
And, you know, it's things like scarcity and reciprocity and liking.
These are known manipulation techniques for human beings.
Turns out when you train an AI off of human writing, and humans are susceptible to those, it's super susceptible to them as well.
And so there's a paper that came out called Call Me a Jerk.
It's an academic paper.
It's really easy to read.
University of Pennsylvania did it.
And they broke all the guardrails on ChatGPT.
It's not supposed to be allowed to insult you.
It's supposed to be a sycophant by design.
And they were able to use these well-known techniques from door-to-door salesmen from decades ago to convince these AIs to do really bad things, including generate recipes for certain drugs.
Exactly, but imagine if the recipes, the code recipes that this thing was taught, included really good recipes as well as really, really terrible recipes, and it does not know the difference.
In fact, the more often something appears, like a common coding error, the more likely it is to think, this is what you do in this sequence.
So it literally took bad code and made it the median, the average code that this thing produced.
So the OWASP Top 10, if we're really nerding out on the show, there's like 10 known coding screw-ups that everyone still keeps doing today.
And this thing literally was trained on code examples that have a lot of this in them.
So it often produces this.
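To make the point concrete, here is a minimal sketch of one of those long-standing mistakes, injection, which is a category on the OWASP Top 10 list. The speaker does not name a specific flaw, so the choice of SQL injection is an assumption, and the table, function names, and payload are hypothetical. The first function builds a query by pasting user input into the SQL string, a pattern that shows up in a lot of example code; the second binds the value as a parameter instead.

```python
import sqlite3


def find_user_unsafe(conn, name):
    # Vulnerable pattern: user input is pasted straight into the SQL text,
    # so a crafted value can change the meaning of the query.
    query = f"SELECT id, name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Safer pattern: the value is bound as a parameter and treated as data.
    query = "SELECT id, name FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


def demo():
    # Hypothetical in-memory table used only for illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)",
                     [("alice",), ("bob",)])
    payload = "nobody' OR '1'='1"  # classic injection string
    leaked = find_user_unsafe(conn, payload)
    safe = find_user_safe(conn, payload)
    conn.close()
    return leaked, safe


if __name__ == "__main__":
    leaked, safe = demo()
    print("unsafe query returned", len(leaked), "rows")
    print("safe query returned", len(safe), "rows")
```

With the injection payload, the unsafe version returns every row in the table, while the parameterized version returns none, because no user has that literal name.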
It's meant to be a power saw, not a self-driving car.