Relevant to Jas's comments on my idea for AI agent safety


Alan Karp

Feb 27, 2026, 10:23:27 PM
to <friam@googlegroups.com>

from the link he sent out by mistake, https://github.com/nearai/ironclaw.
  • Semantic interposition. Instead of giving the agent raw system access, all interactions go through MCP servers (filesystem, git, etc.). Every tool call passes through a policy engine that can allow, deny, or escalate to the user for approval.
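
A minimal sketch of the allow / deny / escalate flow described in that bullet. This is not IronClaw's actual API; `ToolCall`, `decide`, `mediate`, and the rules themselves are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class ToolCall:
    tool: str    # hypothetical tool name, e.g. "filesystem" or "git"
    action: str  # e.g. "read", "write", "push"
    target: str  # path, remote, etc.


def decide(call: ToolCall) -> Decision:
    """Hypothetical rule set: reads are allowed, pushes need approval, the rest is denied."""
    if call.tool == "filesystem" and call.action == "read":
        return Decision.ALLOW
    if call.tool == "git" and call.action == "push":
        return Decision.ESCALATE
    return Decision.DENY


def mediate(call: ToolCall, ask_user: Callable[[ToolCall], bool]) -> bool:
    """Return True if the call may proceed; escalations defer to the user."""
    decision = decide(call)
    if decision is Decision.ESCALATE:
        return ask_user(call)
    return decision is Decision.ALLOW
```

Here `ask_user` stands in for whatever approval dialog the host provides; everything not explicitly allowed or escalated is denied by default.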

--------------
Alan Karp

Mark S. Miller

Feb 27, 2026, 10:58:30 PM
to fr...@googlegroups.com
Relevant to Jas's comments on my idea for AI agent safety

Sorry I missed it. Where do I find your idea and Jasvir's comments?


--
You received this message because you are subscribed to the Google Groups "friam" group.
To unsubscribe from this group and stop receiving emails from it, send an email to friam+un...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/friam/CANpA1Z3mEBKJa%2B%2ByB38fVL7Ex2dn%2BXjcNMzbLcVW_9UW-CSZzw%40mail.gmail.com.

Alan Karp

Feb 28, 2026, 1:11:17 AM
to fr...@googlegroups.com
Well, then, you'll just have to get your priorities straight and show up, won't you?

My basic idea that I've talked about in earlier friams is pretty much what's in that bullet point. 

--------------
Alan Karp


Ben Laurie

Feb 28, 2026, 3:51:20 PM
to fr...@googlegroups.com
OK, and how does that policy engine work?

Alan Karp

Feb 28, 2026, 7:11:22 PM
to fr...@googlegroups.com
In my proposal, it does 2 things.  The proxy mediates all communication to and from the LLM agent, and it holds any secrets that the agent needs to authenticate, delegate, invoke, or revoke.  The proxy will only sign requests that satisfy your policy, sort of like a request side PDP.  It knows what data you want kept private and won't distribute it if the agent asks it to.  In a more advanced version, it manages the agent's persistent memory.
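
A rough sketch of the "proxy holds the secret and only signs requests that satisfy policy" idea, assuming an HMAC key stands in for whatever credential the agent would otherwise hold. `SigningProxy`, `example_policy`, and the request fields are hypothetical and not part of any existing proposal text.

```python
import hashlib
import hmac
import json
from typing import Callable, Optional


class SigningProxy:
    """Holds the secret on the user's behalf; the agent only ever sees signatures."""

    def __init__(self, secret: bytes, policy: Callable[[dict], bool]):
        self._secret = secret
        self._policy = policy

    def sign(self, request: dict) -> Optional[str]:
        # Refuse to sign anything the policy rejects.
        if not self._policy(request):
            return None
        # Canonical serialization so equal requests give equal signatures.
        message = json.dumps(request, sort_keys=True).encode()
        return hmac.new(self._secret, message, hashlib.sha256).hexdigest()


def example_policy(request: dict) -> bool:
    """Hypothetical policy: only reads against one named host are acceptable."""
    return request.get("action") == "read" and request.get("host") == "internal.example"


proxy = SigningProxy(b"not-a-real-secret", example_policy)
allowed = proxy.sign({"action": "read", "host": "internal.example"})
refused = proxy.sign({"action": "delete", "host": "internal.example"})
```

Because the agent never holds the key, a request the policy rejects simply cannot be authenticated downstream.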

--------------
Alan Karp


๏̯͡๏ Jasvir Nagra

Mar 1, 2026, 1:36:33 AM
to fr...@googlegroups.com
The challenge I was having iirc during the call was - "if I am deducing if a message is acceptable or not" just by looking at the message, we are already lost.  I felt that a separate policy that steps in to decide after the fact violated the grant vs use principle on when you do the check that always felt fundamental to me on what made ocap systems ocap systems.

-- 
Jasvir Nagra


Alan Karp

Mar 1, 2026, 1:11:02 PM
to fr...@googlegroups.com
On Sat, Feb 28, 2026 at 10:36 PM ๏̯͡๏ Jasvir Nagra <j...@nagras.com> wrote:
The challenge I was having iirc during the call was - "if I am deducing if a message is acceptable or not" just by looking at the message, we are already lost.  I felt that a separate policy that steps in to decide after the fact violated the grant vs use principle on when you do the check that always felt fundamental to me on what made ocap systems ocap systems.

We often talk about confinement as a capability pattern.  Isn't the proxy an example of that?  Is it any different than a membrane that controls the capabilities that go through it?
 
--------------
Alan Karp

Ben Laurie

Mar 2, 2026, 11:55:02 AM
to fr...@googlegroups.com
On Sun, 1 Mar 2026 at 00:11, Alan Karp <alan...@gmail.com> wrote:
In my proposal, it does 2 things.  The proxy mediates all communication to and from the LLM agent, and it holds any secrets that the agent needs to authenticate, delegate, invoke, or revoke.  The proxy will only sign requests that satisfy your policy, sort of like a request side PDP.  It knows what data you want kept private and won't distribute it if the agent asks it to.

How does it know that the agent has asked to distribute private data?
 

Alan Karp

Mar 2, 2026, 12:21:08 PM
to fr...@googlegroups.com
On Mon, Mar 2, 2026 at 8:55 AM 'Ben Laurie' via friam <fr...@googlegroups.com> wrote:

How does it know that the agent has asked to distribute private data?

The same way you do it in the enterprise.  I've seen plenty of documents marked IBM Confidential.

--------------
Alan Karp

Ben Laurie

Mar 2, 2026, 12:44:00 PM
to fr...@googlegroups.com
Sure, but the concern is not that it shares a document marked confidential, but that it shares information extracted from such a document.
 

--------------
Alan Karp

Alan Karp

Mar 2, 2026, 12:57:48 PM
to fr...@googlegroups.com
On Mon, Mar 2, 2026 at 9:44 AM 'Ben Laurie' via friam <fr...@googlegroups.com> wrote:


On Mon, 2 Mar 2026 at 17:21, Alan Karp <alan...@gmail.com> wrote:
On Mon, Mar 2, 2026 at 8:55 AM 'Ben Laurie' via friam <fr...@googlegroups.com> wrote:

How does it know that the agent has asked to distribute private data?

The same way you do it in the enterprise.  I've seen plenty of documents marked IBM Confidential.

Sure, but the concern is not that it shares a document marked confidential, but that it shares information extracted from such a document.

I didn't say it was going to be easy, but I can envision a number of strategies.  The point is that you have no hope if the agents talk directly to each other.
 
--------------
Alan Karp

๏̯͡๏ Jasvir Nagra

Mar 2, 2026, 1:48:08 PM
to fr...@googlegroups.com
Good q. My gut feel is it’s a time-of-check vs. time-of-use problem. I could be wrong, and I am struggling especially to say why this is different from the proxy pattern. I'd welcome help thinking it through.

The closest analogy in my head is a WAF. Something happens in the browser... you get a request... before it hits the real server there’s a checker sitting in front. And the checker’s job is "is this payload bad?" which, in practice, means regex for "<script"-ish patterns, trying to spot base64/URL encoding and other obfuscations, etc.

And once you’re doing that, you’ve already lost. You’re judging "acceptability" from bytes, without knowing what the input is going to do (which sink, which action, what authority is actually being exercised). So you end up in an arms race with encodings, benign strings get flagged, and dangerous stuff can look totally normal.
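
To make the arms race concrete, here is a small sketch of a byte-level checker of the kind described above. The pattern, the `looks_bad` helper, and the sample payloads are all made up for illustration; real WAF rule sets are far larger, but they face the same two failure modes.

```python
import base64
import re
from urllib.parse import quote, unquote

# Naive "<script"-ish rule (hypothetical, for illustration only).
SCRIPT_PATTERN = re.compile(r"<\s*script", re.IGNORECASE)


def looks_bad(payload: str) -> bool:
    """Judge acceptability from the bytes alone, with no knowledge of the sink."""
    return SCRIPT_PATTERN.search(payload) is not None


attack = "<script>steal()</script>"
url_encoded_attack = quote(attack)  # percent-encoded form of the same payload
b64_attack = base64.b64encode(attack.encode()).decode()
benign = "How do I escape a <script> tag in my blog post?"

caught = looks_bad(attack)                  # the plain form is flagged
missed_url = looks_bad(url_encoded_attack)  # the encoded form slips through
missed_b64 = looks_bad(b64_attack)          # so does the base64 form
false_positive = looks_bad(benign)          # a harmless question is flagged
```

The checker catches the plain payload, misses both encoded forms (which a later stage may decode back to the original), and flags the benign question.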

That’s also why it smells non-ocap to me. In an ocap system, the safety property comes from what capability was granted (and how it’s attenuated), not from a late-stage check trying to infer intent at use time. A proxy can absolutely be a confinement/membrane pattern.

-- 
Jasvir Nagra


Alan Karp

Mar 2, 2026, 1:59:59 PM
to fr...@googlegroups.com
I don't want to confuse the proxy filtering data with the proxy filtering permissions.  The latter is handled by giving the agent a set of capabilities for which the proxy is the caretaker.  The former is harder once the agent is given the data.  However, it should be possible to do a reasonable job of limiting what data the agent can send when exercising a capability to something on the outside.
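
For the permissions side, a minimal sketch of the caretaker pattern: the agent receives only a forwarder, while the proxy keeps the revoker. `make_caretaker`, `Mailbox`, and `Revoked` are hypothetical names, and Python closures are not a real security boundary, so this only illustrates the shape of the pattern.

```python
class Revoked(Exception):
    """Raised when a revoked capability is exercised."""


def make_caretaker(target):
    """Return (forwarder, revoke). Hand the forwarder to the agent; keep revoke."""
    enabled = True

    class Forwarder:
        def __getattr__(self, name):
            def call(*args, **kwargs):
                # Checked on every call, so a method reference saved
                # before revocation stops working too.
                if not enabled:
                    raise Revoked(name)
                return getattr(target, name)(*args, **kwargs)
            return call

    def revoke():
        nonlocal enabled
        enabled = False

    return Forwarder(), revoke


class Mailbox:
    """Stand-in for some outside resource the agent may use."""

    def __init__(self):
        self.sent = []

    def send(self, to, body):
        self.sent.append((to, body))
        return len(self.sent)


mailbox = Mailbox()
agent_cap, revoke = make_caretaker(mailbox)
agent_cap.send("bob@example.org", "hello")  # forwarded to the real mailbox
revoke()                                    # later uses raise Revoked
```

The decision about what the agent can reach is made when the forwarder is granted; revocation withdraws the grant rather than inspecting each message.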

--------------
Alan Karp


Vinícius dos Santos Oliveira

Mar 5, 2026, 9:56:42 PM
to fr...@googlegroups.com
On Mon, Mar 2, 2026 at 15:59, Alan Karp <alan...@gmail.com> wrote:
> I don't want to confuse the proxy filtering data with the proxy filtering permissions. The latter is handled by [...] The former is harder once the agent is given the data.

On the former... It reminded me of a 2012 talk by Benjamin Pierce. I
only saw the slides in any case:
https://www.cis.upenn.edu/~bcpierce/papers/TypesALaMilner.pdf

The talk goes -- after a lengthy intro to the area -- into the topic
of "types for privacy". The example given to explain the problem is
giving researchers access to hospital records without compromising
privacy of the patients. Two obvious ideas are removal of identifying
data from the results and summarization of the database. Both can be
circumvented if done naively. Then you can have a trusted person to
selectively authorize/deny queries done against the database to make
sure no deanonymization happens, but this solution doesn't scale. The
talk then introduces a type system for ensuring such queries can't be
used to circumvent privacy.
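
One well-known way naive summarization fails is a differencing attack: two aggregate queries that differ by a single person reveal that person's record. The sketch below is not taken from the slides; the records, field names, and `count_with_condition` helper are invented, and the table already has names removed.

```python
# De-identified toy records (hypothetical data): no names, only quasi-identifiers.
records = [
    {"age": 34, "zip": "19104", "has_condition": False},
    {"age": 52, "zip": "19104", "has_condition": True},
    {"age": 41, "zip": "19103", "has_condition": True},
    {"age": 29, "zip": "19103", "has_condition": False},
]


def count_with_condition(predicate) -> int:
    """The only interface researchers get: an aggregate count, never a row."""
    return sum(1 for r in records if predicate(r) and r["has_condition"])


def is_target(r) -> bool:
    # The attacker knows the target is the only 52-year-old in zip 19104.
    return r["age"] == 52 and r["zip"] == "19104"


total = count_with_condition(lambda r: True)
without_target = count_with_condition(lambda r: not is_target(r))

# The difference of two harmless-looking summaries is the target's record.
target_has_condition = (total - without_target) == 1
```

Each query looks acceptable on its own, which is why approving queries one at a time, by a person or by a dialog, is so hard to get right.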

The problem reminds me of someone guiding an MCP agent to do
tasks on your behalf. A proxy popping a user dialog to authorize/deny
MCP requests has parallels to the person authorizing/rejecting SQL
queries to hospital records from the previous story.