Ave Gatton

Speaker
190 total appearances

Podcast Appearances

Code Story: Insights from Startup Tech Leaders
The Gene Simmons of Data Protection - AI Inference-time Guardrails

There's a recent paper out by Google that just gives an exact formula for breaking a guardrail.

And the premise of a guardrail is that you can use a smaller model to guard against the data exfiltration of a larger model.

The problem is that the smaller model, by its very nature, has less cognitive capability.

So all you have to do is tell the larger model to use some kind of cipher scheme where it can exfiltrate the data.

And the larger model will go, sure, I'll put whatever I'm trying to say.

I will construct a sentence where the first and second or first and third letter of every word reconstructs the message I'm trying to send out.
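To make the scheme just described concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the names `naive_guardrail`, `decode_first_letters`, and `BLOCKED_TERMS` are hypothetical, and the guardrail is a plain keyword filter standing in for a small guard model, not the approach discussed in the episode or in the paper mentioned.

```python
# Hypothetical illustration: none of these names come from the episode.
BLOCKED_TERMS = {"secret"}  # assumed stand-in for sensitive data


def naive_guardrail(text: str) -> bool:
    """Return True if the text passes, i.e. no blocked term appears verbatim."""
    lowered = text.lower()
    return not any(term in lowered for term in BLOCKED_TERMS)


def decode_first_letters(text: str) -> str:
    """Recover a hidden message from the first letter of every word."""
    return "".join(word[0] for word in text.split()).lower()


cover_text = "Some eager coders rarely explain things"
passes = naive_guardrail(cover_text)       # the filter sees nothing blocked
hidden = decode_first_letters(cover_text)  # yet the blocked term is recoverable
print(passes, hidden)
```

The cover sentence never contains the blocked term verbatim, so a check that only inspects the surface text lets it through, while anyone who knows the scheme can read the message back out.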

Or maybe it's just garbled text or something or other.

Or it could be a poem, a limerick, where the first word of every line is the actual message it's trying to send out.

The guardrail model, by its very construction, doesn't have the context length or the technical capability to identify all of those schemes.

And so you can easily subvert the guardrail model.

In a real agentic system, there are probably many layers of actual LLM calls, but potentially you might only need to subvert the top layer.

There could be situations or maybe architectures where you'd need to subvert multiple layers, which obviously makes the entire jailbreaking of the agent harder.

But there is no 100% security in the guardrail.

That's the first problem with guardrails.

The second problem with guardrails is that they're probabilistic.

And so you can send 100 PHI-containing strings through the guardrail, and it might find all 100.

And then on the 101st, it fails simply because it's probabilistic and it hasn't seen that input before.
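A rough way to see why a probabilistic detector eventually misses is to work through the arithmetic. The sketch below assumes, purely for illustration, that the guardrail catches each PHI-containing string independently with the same probability `p`; the function names and the 0.99 figure are hypothetical and are not taken from the episode.

```python
def prob_all_caught(p: float, n: int) -> float:
    """Probability that all n strings are caught, assuming independent checks."""
    return p ** n


def prob_at_least_one_miss(p: float, n: int) -> float:
    """Probability that at least one of n strings slips through."""
    return 1.0 - prob_all_caught(p, n)


# With an assumed 99% per-string catch rate, a clean run of 100 is plausible
# (roughly 0.37), but a miss somewhere in the batch is still the likelier outcome.
clean_run = prob_all_caught(0.99, 100)
some_miss = prob_at_least_one_miss(0.99, 100)
print(round(clean_run, 3), round(some_miss, 3))
```

Under these assumptions, only a detector with `p` exactly equal to 1 keeps the miss probability at zero as the number of strings grows, which is the property a regulator asking for deterministic detection is after.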

And that probably does not pass muster for a regulator who wants 100% deterministic detection of all PHI to make sure that it's not going out of the system.

Yeah, for sure.