Ave Gatton
There's a recent paper out by Google that just gives an exact formula for breaking a guardrail.
And the premise of a guardrail is that you can use a smaller model to guard against data exfiltration by a larger model.
The problem is that the smaller model, by its very nature, has less cognitive capability.
So all you have to do is tell the larger model to use some kind of cipher scheme where it can exfiltrate the data.
And the larger model will go, sure, whatever I'm trying to say, I'll construct a sentence where the first and second, or first and third, letter of every word reconstructs the message I'm trying to send out.
Or maybe it's just garbled text or something like that.
Or it could be a poem, say a limerick, where the first word of every line spells out the actual message it's trying to send.
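As a minimal sketch of the acrostic scheme being described, here's how a hidden message could be packed into the first letters of an innocuous-looking sentence (the word bank and function names are hypothetical, purely for illustration):

```python
# Toy sketch of the acrostic exfiltration scheme described above:
# the first letter of each cover word spells out the secret.

def encode_acrostic(secret: str, wordbank: dict[str, list[str]]) -> str:
    """Build a cover sentence whose word initials spell the secret."""
    # pick any innocuous word starting with each needed letter
    return " ".join(wordbank[ch][0] for ch in secret.lower())

def decode_acrostic(sentence: str) -> str:
    """Recover the secret from the first letter of every word."""
    return "".join(word[0] for word in sentence.split())

# hypothetical word bank, just for the demo
bank = {"p": ["please"], "i": ["include"], "n": ["new"]}

cover = encode_acrostic("pin", bank)
print(cover)                   # please include new
print(decode_acrostic(cover))  # pin
```

A guardrail scanning the cover sentence sees only ordinary words; recovering the payload requires knowing (or guessing) the encoding convention, which is exactly what the smaller model lacks the capacity to do across all possible schemes.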
The guardrail model, by its very construction, doesn't have the context length or the technical capability to identify all of those schemes.
And so you can easily subvert the guardrail model.
In a real agentic system there are probably many layers of actual LLM calls, but potentially you might only need to subvert the top layer.
There could be situations, or maybe architectures, where you'd need to subvert multiple layers, which obviously makes jailbreaking the entire agent harder.
But there is no 100% security in the guardrail.
That's the first problem with guardrails.
The second problem with guardrails is that they're probabilistic.
And so you can send 100 PHI-containing strings through the guardrail, and it might find all 100.
And then on the 101st, it fails, simply because it's probabilistic and it hasn't seen that before.
And that probably does not pass muster with a regulator who wants 100% deterministic detection of all PHI to make sure it's not going out of the system.
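The arithmetic behind this point is worth making concrete. Assuming (as a simplification) that each string is detected independently with some per-string recall p, the chance of at least one leak over N strings is 1 − p^N, which climbs quickly even for a very accurate filter (the 99% figure below is a hypothetical, not from the conversation):

```python
# If a probabilistic guardrail catches PHI with per-string recall p,
# and detections are independent, the chance it misses at least one
# of n strings is 1 - p**n.

def prob_at_least_one_miss(p: float, n: int) -> float:
    return 1 - p ** n

# even a 99%-accurate filter likely leaks over 100 strings
print(round(prob_at_least_one_miss(0.99, 100), 3))  # 0.634
```

That ~63% cumulative miss probability is why "it caught the first 100" offers no guarantee about the 101st, and why a probabilistic filter can't substitute for a deterministic control when a regulator demands zero leakage.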
Yeah, for sure.