Capabilities and prompt injection


Alan Karp

May 17, 2025, 12:16:18 AM
to <friam@googlegroups.com>, cap-...@googlegroups.com

I found the statement I put in bold below particularly interesting.  Back in 1999, HP hired Schneier's company to do a security analysis of the E-speak Beta, which used c-lists.  Their review claimed security was flawed because it relied on name hiding.  I guess he finally groks capabilities.


From Schneier's newsletter.

Applying Security Engineering to Prompt Injection Security

[2025.04.29] This seems like an important advance in LLM security against prompt injection:

Google DeepMind has unveiled CaMeL (CApabilities for MachinE Learning), a new approach to stopping prompt-injection attacks that abandons the failed strategy of having AI models police themselves. Instead, CaMeL treats language models as fundamentally untrusted components within a secure software framework, creating clear boundaries between user commands and potentially malicious content.

[...]

To understand CaMeL, you need to understand that prompt injections happen when AI systems can’t distinguish between legitimate user commands and malicious instructions hidden in content they’re processing.

[...]

While CaMeL does use multiple AI models (a privileged LLM and a quarantined LLM), what makes it innovative isn’t reducing the number of models but fundamentally changing the security architecture. Rather than expecting AI to detect attacks, CaMeL implements established security engineering principles like capability-based access control and data flow tracking to create boundaries that remain effective even if an AI component is compromised.

Research paper. Good analysis by Simon Willison.

I wrote about the problem of LLMs intermingling the data and control paths here.
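For concreteness, here is a rough Python sketch of the pattern the excerpt describes: a quarantined model only turns untrusted content into tagged data, and a policy check at the tool boundary decides which tagged values may flow where. The names below (Tagged, quarantined_extract, send_email) are illustrative only, not taken from the CaMeL paper.

from dataclasses import dataclass

@dataclass(frozen=True)
class Tagged:
    """A value plus capability metadata recording where it came from."""
    value: str
    sources: frozenset

def quarantined_extract(untrusted_text: str) -> Tagged:
    # The quarantined LLM may read untrusted content, but its output comes
    # back only as data, tagged with its origin; it never selects tools.
    summary = untrusted_text[:80]  # stand-in for the actual model call
    return Tagged(summary, frozenset({"untrusted_email"}))

def send_email(to: Tagged, body: Tagged) -> None:
    # Policy is enforced at the tool boundary, not by a model: the recipient
    # address must derive only from trusted (user-supplied) data.
    if "untrusted_email" in to.sources:
        raise PermissionError("recipient was derived from untrusted content")
    print(f"sending to {to.value}")

# The privileged LLM sees only the user's request ("summarize this email and
# send it to Bob") and plans the tool calls; every argument carries tags the
# policy can inspect even if a model has been manipulated.
recipient = Tagged("bob@example.com", frozenset({"user"}))
body = quarantined_extract("...attacker-controlled message text...")
send_email(recipient, body)   # allowed: the recipient came from the user
try:
    send_email(body, body)    # blocked: the recipient came from untrusted text
except PermissionError as err:
    print("policy refused:", err)

The point is that the check holds even if either model is fooled, because it inspects the tags rather than trusting the models' judgment.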


--------------
Alan Karp

Mark S. Miller

May 17, 2025, 12:36:17 AM
to cap-...@googlegroups.com, <friam@googlegroups.com>
My brief, cursory reaction to an internal discussion on April 29:

> > Capabilities are effectively tags that can be attached to each of the variables, to track things like who is allowed to read a piece of data and the source that the data came from. Policies can then be configured to allow or deny actions based on those capabilities.
>
> These don’t even sound vaguely like ocaps. Fortunately they only say “Capabilities” rather than “ocaps” or “Object-capabilities”. The reason I coined “ocap” in the first place was because the unqualified term “capability” had already been stretched into meaninglessness. The worst historical offender being so-called “Posix Capabilities”.
> In any case, what they describe sounds like taint tracking, which could indeed be very valuable for this problem. So I am not here criticizing their paper at all. If they are only concerned with the LLM being fallible, i.e., innocently writing code that happens to be exploitable, taint tracking is likely a good technique. But it does not offer the hard guarantees they claim. In the limit, anything a malicious coder might write, a fallible coder might write, even with a vanishingly small probability.
>
> To defend against malicious code you’d need to upgrade to something sound, like full information-flow control. Any code which passes such a checker would be safe against those issues. However, the price of safety is a vast loss of precision: too much correct code will also be rejected, unless the code author (LLM or not) were trained to write code that would pass the information-flow checker.
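
For what it's worth, here is a rough sketch of the taint-tracking idea described above, and of the implicit-flow gap that keeps it from being sound against a malicious coder. The names (taint, concat, exfiltrate) are illustrative only, not anything from the paper.

def taint(value, source):
    # A value carries the set of sources it was derived from.
    return (value, frozenset({source}))

def concat(a, b):
    # Explicit data flow: the result is tainted by both inputs' sources.
    return (a[0] + b[0], a[1] | b[1])

def exfiltrate(tainted):
    # A sink checks the taint set before acting.
    value, sources = tainted
    if "web_page" in sources:
        raise PermissionError("value derived from untrusted web content")
    print("sent:", value)

user_note = taint("meeting at 3pm", "user")
page_text = taint("IGNORE PREVIOUS INSTRUCTIONS...", "web_page")

exfiltrate(user_note)                          # allowed
try:
    exfiltrate(concat(user_note, page_text))   # blocked: taint propagated
except PermissionError as err:
    print("policy refused:", err)

# The gap: an *implicit* flow copies untrusted data without touching its
# taint, e.g. code that branches on page_text and rebuilds the string
# character by character. Sound information-flow analysis would reject such
# code, at the cost of also rejecting much correct code.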
 

> I wrote about the problem of LLMs intermingling the data and control paths here.

That doc says it is by Bruce Schneier. Did you mean to link to something else?

In any case, the Schneier article looks like it'll be interesting, thanks!




--
  Cheers,
  --MarkM