String processing errors


Alan Karp

Apr 14, 2026, 3:38:16 PM
to cap-...@googlegroups.com, <friam@googlegroups.com>
https://niyikiza.com/posts/map-territory/ lists a number of well-known problems when you deal with strings.  That page nicely explains the problem when you want to grant access to everything in /data.

Map vs. Territory

Let’s look at the core problem:

The Map: The string the LLM gives you. /data/../etc/passwd

The Territory: The inode the OS actually opens. /etc/passwd

The Vulnerability: Security checks usually validate the Map. Execution touches the Territory. When they disagree, attacks slip through.
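
The mismatch is easy to reproduce. What follows is a minimal Python sketch, not code from the linked post: `naive_check` and `lexical_target` are hypothetical names, and `posixpath.normpath` only approximates the lexical part of what the OS does (it ignores symlinks, so the real Territory can diverge even further).

```python
import posixpath

ALLOWED_ROOT = "/data"  # assumption: the subtree the caller is meant to stay inside


def naive_check(path: str) -> bool:
    """Validate the Map: a plain prefix test on the string."""
    return path.startswith(ALLOWED_ROOT + "/")


def lexical_target(path: str) -> str:
    """Approximate the Territory by collapsing '.' and '..' lexically."""
    return posixpath.normpath(path)


request = "/data/../etc/passwd"
print(naive_check(request))     # True: the string looks fine
print(lexical_target(request))  # /etc/passwd: the file that would be opened
```

The check and the open disagree because they interpret the same string under different rules.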

Is sanitization the best we can do?  Or do capabilities give us something better?  What about more complicated situations, such as SQL queries and JSON schemas?

--------------
Alan Karp

Matt Rice

Apr 14, 2026, 5:33:34 PM
to fr...@googlegroups.com, cap-...@googlegroups.com
On Tue, Apr 14, 2026 at 7:38 PM Alan Karp <alan...@gmail.com> wrote:
>
> https://niyikiza.com/posts/map-territory/ lists a number of well-known problems when you deal with strings. That page nicely explains the problem when you want to grant access to everything in /data.
>
> Map vs. Territory
>
> Let’s look at the core problem:
>
> The Map: The string the LLM gives you. /data/../etc/passwd
>

I'm having difficulty finding a specific document that, from what I
recall, compares Unix filesystems to the directory objects of one of
KeyKOS/EROS/CapROS. In those systems ".." is not necessarily a thing:
there is no root, no implicit parent pointers, and no tree shape.
Directories may be cyclic, forming a directed graph.

I would argue that they do not "solve" this problem, but avoid it entirely
(there exists no mapping of the directory structure to strings including "..",
unless one was intentionally added and given the arbitrary name "..").

One thing to look at is hybrid capability systems like Capsicum, which
do attempt to deal with this by distinguishing capability mode from
ambient mode: after entering capability mode you can no longer turn
strings into capabilities via the filesystem.

Jonathan S. Shapiro

Apr 14, 2026, 6:09:09 PM
to cap-...@googlegroups.com, <friam@googlegroups.com>
It's slightly worse than that, because /data might be a symlink to an arbitrary place. In a poorly constructed chroot environment this could be used to trick the passwd program into accessing an entirely fabricated version of /etc/passwd.

I don't think this is an "it should be a capability approach" situation. Ambient authority is involved at each traversal step (at least in UNIX descendants), but almost any problem you can introduce by exploiting ambient authority can also be introduced by exploiting name re-bindings. Name spaces and bindings are much harder to get right than people tend to imagine.

Path canonicalization in UNIX variants is such a well known problem that different shells do not agree on how they canonicalize paths before opening/accessing. Some handle it textually while others walk the presented path segment by segment. If the openat() system call is used to do segment-at-a-time traversal, it's constrained by a sequence of stepwise descriptor walks, albeit with ambient access rights. Since opening '/' is a guarded special case due to chroot enforcement, the risk is that canonicalization is neither transactional nor idempotent. Changes in directory entry bindings before, after, or during a walk can lead to different results. This hazard also exists in capability-based implementations.
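
The difference between textual canonicalization and a segment-by-segment walk can be shown with a single symlink. This is a small Python sketch under stated assumptions: `demo` is a hypothetical helper that builds a throwaway tree, `os.path.normpath` stands in for the textual strategy, and `os.path.realpath` stands in for the walking strategy.

```python
import os
import tempfile


def demo():
    """Build root/real/inner plus a symlink root/link -> root/real/inner,
    then canonicalize 'root/link/..' both ways."""
    with tempfile.TemporaryDirectory() as tmp:
        root = os.path.realpath(tmp)
        os.makedirs(os.path.join(root, "real", "inner"))
        os.symlink(os.path.join(root, "real", "inner"), os.path.join(root, "link"))
        path = os.path.join(root, "link", "..")
        textual = os.path.normpath(path)   # edits the string: '..' cancels 'link'
        walked = os.path.realpath(path)    # follows the symlink, then goes up
        return root, textual, walked


root, textual, walked = demo()
print(textual == root)                        # True
print(walked == os.path.join(root, "real"))   # True
```

Neither answer is stable over time: if the binding of `link` changes between canonicalization and the eventual open, the result changes with it, which is the non-idempotency described above.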

These days, the standard specification of open(2) actually states how it is supposed to follow these paths, but the behavior in pre-SVR4 editions of UNIX was inconsistently specified or not specified at all.

One could argue - and there are various reasons to consider - that "directory" objects should implement deep hierarchies rather than single level hierarchies, but this doesn't address the idempotency issue.

For extra credit, say what additional issues are introduced by loop-back mount points, both in modern UNIX and especially in Plan 9, where these can be introduced by non-privileged users in name spaces they control. A stiff drink may be helpful before starting; it works best if you give it 15 minutes or so to kick in before you start thinking about this one.

It is sometimes a wonder to me how such a bright group of people got almost everything wrong about namespace security and units of operation. The UNIX process model didn't have a well-specified account of the effect of signal arrival on process and system state until... 1988, when Roger Faulkner, Steve Rago, and Ron Gomes nailed that down during the SVR4 /proc work. I poked my nose in a couple of times while working on the associated debugger support, but didn't have a big hand in it. Roger, Brendan Eich, and I extended it further in 1989/90 to add watchpoint handling with specified behavior for Sun's Solaris and SGI's IRIX, respectively. So far as I'm aware, Linux still doesn't have a fully specified model, though there was work done in ptrace(2) to clean up a bunch of the bigger issues.


Jonathan


--
You received this message because you are subscribed to the Google Groups "cap-talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cap-talk+u...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/cap-talk/CANpA1Z3UwZtWtnyFAS_CpL1VUhCFrDQCxDkW_anWZmz16fdYZA%40mail.gmail.com.

Vinícius dos Santos Oliveira

Apr 15, 2026, 2:02:45 PM
to fr...@googlegroups.com, cap-...@googlegroups.com
On Tue, Apr 14, 2026 at 4:38 PM Alan Karp <alan...@gmail.com> wrote:
> https://niyikiza.com/posts/map-territory/ lists a number of well-known problems when you deal with strings. That page nicely explains the problem when you want to grant access to everything in /data.
>
> Map vs. Territory
>
> Let’s look at the core problem:
>
> The Map: The string the LLM gives you. /data/../etc/passwd
>
> The Territory: The inode the OS actually opens. /etc/passwd
>
> The Vulnerability: Security checks usually validate the Map. Execution touches the Territory. When they disagree, attacks slip through.
>
> Is sanitization the best we can do? Or do capabilities give us something better?

For the specific case of path traversal, you may open /data and use
the /data file descriptor (a capability) with openat2() and
RESOLVE_BENEATH on Linux, or with openat() and O_RESOLVE_BENEATH on
FreeBSD. Other UNIX systems don't even bother to give any answer.

On FreeBSD you may even cap_rights_limit() the dirfd for /data, and
any fd acquired from openat on it will inherit the restrictions placed
by cap_rights_limit().
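
For readers without access to those kernel flags, the idea can be approximated in user space. The sketch below is an assumption-laden stand-in, not the kernel behavior: `open_beneath` is a hypothetical helper that walks one segment at a time with `os.open(..., dir_fd=...)`, and it is stricter than the kernel flags because it rejects every ".." and every symlink outright. It assumes a platform where `os.open` supports `dir_fd` and `O_NOFOLLOW` (Linux, for example).

```python
import os


def open_beneath(dir_fd: int, relpath: str, flags: int = os.O_RDONLY) -> int:
    """Open relpath strictly beneath dir_fd, one path segment at a time."""
    parts = relpath.split("/")
    # Absolute paths produce an empty first segment; '.' and '..' are refused.
    if any(part in ("", ".", "..") for part in parts):
        raise PermissionError(f"rejected path: {relpath!r}")
    current = os.dup(dir_fd)  # work on a copy so the caller's fd stays open
    try:
        for name in parts[:-1]:
            # O_NOFOLLOW makes the open fail if this segment is a symlink.
            step = os.open(
                name,
                os.O_RDONLY | os.O_DIRECTORY | os.O_NOFOLLOW,
                dir_fd=current,
            )
            os.close(current)
            current = step
        return os.open(parts[-1], flags | os.O_NOFOLLOW, dir_fd=current)
    finally:
        os.close(current)
```

A caller opens /data once, keeps only the descriptor, and passes every untrusted name through `open_beneath`; the string is never joined onto an absolute path.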

This blog post goes into the history behind O_RESOLVE_BENEATH:
https://val.packett.cool/blog/use-openat/#and-back-to-the-regular-syscall-interfaces

> What about more complicated situations, such as SQL queries and JSON schemas?

Filesystem access isn't ocaps (obviously), but at least there's an
initial idea of delegation. /home/someuser is ACL-checked against
someuser. ACLs might not scale, but the underlying point that I want
to make is that we always viewed subtrees as something meant to be
delegated. /data is no different in this regard. Filesystems have
clear points for delegation (even if the APIs outside FreeBSD/Linux
lag behind). Subtrees are meant to be delegated.

However for SQL queries and JSON schemas... it's not clear to me that
there is anything that is meant for delegation here. The well is too
poisoned in this case.

Alan Karp

Apr 22, 2026, 1:25:55 PM
to cap-...@googlegroups.com, <friam@googlegroups.com>
I've been pondering this problem and may have come up with a solution that doesn't rely on string sanitization.  To refresh your memory, Alice delegates to Bob access to everything in /data, even things added after the delegation.  Bob uses that capability to access some object by specifying a string, say "foo.txt".  Alice's machine interprets that request as being for /data/foo.txt.  Great, but what if the string is "../etc/passwd"?

The basic solution is that the capability to /data only allows asking for a capability to the thing designated by the specified string.  Alice, who knows everything in /data, keeps a map of strings to capabilities.  Bob then uses his /data capability to ask for a capability to foo.txt.  There won't be an entry in the map if he asks for ../etc/passwd.  You can avoid the round trip to retrieve the capability by having a /data service that accesses the map and forwards the request using the designated capability.
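
A minimal Python sketch of that idea, with every name in it hypothetical (`FileCap`, `DirectoryCap`, `lookup`, `read`), and with plain object references standing in for whatever kind of capability is actually used:

```python
class FileCap:
    """Hypothetical capability to one object; holding the reference is the permission."""

    def __init__(self, content: str):
        self._content = content

    def read(self) -> str:
        return self._content


class DirectoryCap:
    """What Bob holds for /data: it can only exchange a name for a capability."""

    def __init__(self, entries: dict):
        # The dict is shared with Alice, so entries she adds later are visible.
        self._entries = entries

    def lookup(self, name: str) -> FileCap:
        # The name is an opaque key. It is never parsed as a path.
        if name not in self._entries:
            raise LookupError(f"no entry named {name!r}")
        return self._entries[name]

    def read(self, name: str) -> str:
        """The forwarding variant: look up and invoke in one request."""
        return self.lookup(name).read()


alice_map = {"foo.txt": FileCap("hello")}
bobs_cap = DirectoryCap(alice_map)
alice_map["bar.txt"] = FileCap("added after delegation")
print(bobs_cap.read("foo.txt"))  # hello
print(bobs_cap.read("bar.txt"))  # added after delegation
```

Asking for "../etc/passwd" raises `LookupError` because no such key exists; nothing interprets the "..". How nested directories would be represented (nested maps, presumably) is left open here.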

Does this approach solve the problem?  Does it introduce any vulnerabilities?  Can it be adapted to solve other related problems, such as SQL queries?

--------------
Alan Karp



Rob Meijer

Apr 23, 2026, 7:16:28 AM
to cap-...@googlegroups.com, <friam@googlegroups.com>
It never grew a user base, so I haven't done maintenance on it for years, and I'm not 100% sure it runs in the current Python ecosystem (there was a weird issue a few years back, but following the readme did the trick). But this is basically what rumpeltree does, not with any map, but with a root sparse-cap (multi-rooted) and a single server-side key. If it still installs, play around with rumpelbox a bit. It's a demo tool for pyrumpeltree. As said, it's unmaintained because of a zero-size user base AFAIK, but I think it fills your need exactly:


I wanted to make it into a user-space file system like MinorFS and MattockFS, but never found time to look into the locking and random-access crypto needs properly.

I'd be happy to help anyone wanting to take over the project to get started, but I'm too filled up with other pet projects right now to work on pyrumpeltree or related stuff, so if you are interested in adopting it, or porting it, that would be great, if it's indeed the fit that I think it is.

If not, it will stay the unmaintained thing it is now, without a FUSE filesystem implementation on top.


Rob Meijer

Apr 23, 2026, 7:28:15 AM
to cap-...@googlegroups.com, <friam@googlegroups.com>




Just for context, it's a really small project. Just about 150 lines of Python for the library and about 300 lines of code for the single-codebase demo tools (rumpelbox has a busybox-like symlink setup). It should be easy enough to port or take custody of if it matches your needs, but there are a few mental click moments for many before it makes sense.

I remember trying to explain it to Zooko many years ago when I didn't have code yet, and failing. But I think explaining it "with" code is probably easier, especially because the core of it is only 150 lines.

Douglas Crockford

Apr 23, 2026, 7:41:18 AM
to friam
The problem is '..'. If you have access to a directory, you also have access to the parent directory and on up. That convenience is a leak. It should be abolished.

Matt Rice

Apr 23, 2026, 6:58:04 PM
to fr...@googlegroups.com
On Thu, Apr 23, 2026 at 11:41 AM Douglas Crockford
<dou...@crockford.com> wrote:
>
> The problem is '..'. If you have access to a directory, you also have access to the parent directory and on up. That convenience is a leak. It should be abolished.
>

I say we get rid of '.' or CWD too, static mutable state should be verboten.

Mike Samuel

Apr 24, 2026, 2:00:24 PM
to friam
On path traversal attacks, is the root of the problem that path composition idioms are confusable deputies?

I've been working on content composition idioms, and here's an example of how simple, readable non-malicious code can get it wrong:

      let dir = "";  // empty string
      let file = "file.ext";

      // With simple string concatenation, joining a path on '/'
      // leads to a path that is not relative as intended, but
      // is accidentally absolute.
      assert(    "${dir}/${file}" == "/file.ext");

      // But the path tag does better.
      assert(path"${dir}/${file}".posixString == "file.ext");


The footgun in POSIX path syntax is that '/' is overloaded to mean "join path segments" when infix and to mean "root directory" when prefix.
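
The same overloading shows up in Python, which makes a convenient way to try it out. This sketch uses only the standard `posixpath` module; `concat` is a hypothetical helper mirroring the plain string concatenation above. It is not the path tag from the example.

```python
import posixpath


def concat(directory: str, name: str) -> str:
    """Plain string composition on '/', as in the first assertion above."""
    return f"{directory}/{name}"


print(concat("", "file.ext"))                    # /file.ext  (accidentally absolute)
print(posixpath.join("", "file.ext"))            # file.ext   (empty prefix handled)
print(posixpath.join("uploads", "/etc/passwd"))  # /etc/passwd (prefix discarded)
```

The library join fixes the empty-directory case, but a leading '/' in a later component still silently replaces everything before it, so the prefix meaning of '/' wins over the infix meaning.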

We can defang a lot of path traversal attacks if we disallow parent traversal by default.

      let attack = "../../../../etc/passwd";
      assert(path"session-files/1234/uploads/${attack}".posixString == "/dev/null/zz_Temper_zz");

      let ok = "a/b/../c"; // Internal .. is ok
      assert(path"session-files/1234/uploads/${ok}".posixString == "session-files/1234/uploads/a/c");
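
A rough Python analogue of that default, offered as a sketch rather than a description of how the path tag is implemented: `join_untrusted` is a hypothetical helper that allows an internal ".." in the untrusted part but refuses any untrusted part that is absolute or would climb above its starting point. It raises `ValueError` where the example above yields a sentinel path, and it is purely lexical, so it does nothing about symlinks.

```python
import posixpath


def join_untrusted(trusted_base: str, untrusted: str) -> str:
    """Join an untrusted relative path under trusted_base, refusing escapes."""
    normalized = posixpath.normpath(untrusted)
    escapes = normalized == ".." or normalized.startswith("../")
    if posixpath.isabs(untrusted) or escapes:
        raise ValueError(f"rejected untrusted path: {untrusted!r}")
    return posixpath.normpath(posixpath.join(trusted_base, normalized))


print(join_untrusted("session-files/1234/uploads", "a/b/../c"))
# session-files/1234/uploads/a/c
```

Normalizing the untrusted part on its own, before joining, is what keeps its ".." segments from consuming segments of the trusted base.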

But of course a literal `..` can be privileged over one that comes from an untrusted string, since the path function can distinguish the literal, fixed parts written by a trusted author from untrusted interpolations.

      let abc = "a/b/c";
      assert(path"${abc}/..".posixString == "a/b");
      //                 ^^ trusted

But once we establish that path"..." can use literal '/' to establish content, we can do other things.  That same path string composition helper allows for explicit opt-in to parent traversal.
Here, the path tag adds `<` as a meta-character to allow opting into parent traversal with limitations.

      let upPath = "../../foo";
      let shortUpPath = "../foo";

      // /<, '<' after a slash, allows for upwards traversal
      assert(path"dir/<${upPath}".posixString == "../foo");
      // </, '<' before the slash, blocks upwards traversal
      assert(path"base</subdir/<${shortUpPath}".posixString == "base/foo");
      //              ^^      ^^
      assert(path"base</subdir/<${upPath}".posixString == "/dev/null/zz_Temper_zz");

Matt Rice

Apr 24, 2026, 7:16:27 PM
to fr...@googlegroups.com
I suppose I am a little bit confused, as Norm's original confused
deputy paper doesn't use path rewriting at all,
the paths were apparently absolute (although using a different syntax,
or maybe using shell substitutions I'm not sure which).
So it feels like once we've fixed these path normalization problems we
still have underlying issues to resolve.

One question going back to Alan's original email he asked "Or do
capabilities give us something better?" to which I would ask
what kind of capabilities? The post seems to be LLM centric and using
data channels to transmit capabilities we are limited to password
capabilities. One disadvantage is that we typically encode password
capabilities in a way that is not intended to be human-meaningful,
so systems wrap a safely usable user interface around the capabilities.
It isn't clear to me that this trick actually works with password/data
capabilities given to AI.


> assert(path"${dir}/${file}".posixString == "file.ext");

I'm not sure what language you're using or what the semantics of the
path tag are. I assume it is replacing the empty string with "./" and
normalizing, so that it expands to ".//file.ext" -> "./file.ext" ->
"file.ext"? I'm not sure I find it intuitive.

Alan Karp

Apr 24, 2026, 7:33:47 PM
to fr...@googlegroups.com
On Fri, Apr 24, 2026 at 4:16 PM Matt Rice <rat...@gmail.com> wrote:
> I suppose I am a little bit confused, as Norm's original confused
> deputy paper doesn't use path rewriting at all,
> the paths were apparently absolute (although using a different syntax,
> or maybe using shell substitutions I'm not sure which).
> So it feels like once we've fixed these path normalization problems we
> still have underlying issues to resolve.

That's because he didn't need that complication to illustrate the problem.  Alice said compile(a.c, log.txt) even though she lacked permission to write log.txt.  A more realistic example would require tests, such as path normalization, to prevent the attack.

> One question going back to Alan's original email he asked "Or do
> capabilities give us something better?" to which I would ask
> what kind of capabilities?

I'm hoping any kind.  My proposed solution says nothing about the type of capability.
 
> The post seems to be LLM centric and using
> data channels to transmit capabilities we are limited to password
> capabilities.

I certainly didn't intend to be LLM centric.
 
--------------
Alan Karp

Matt Rice

Apr 24, 2026, 10:37:49 PM
to fr...@googlegroups.com
On Fri, Apr 24, 2026 at 11:33 PM Alan Karp <alan...@gmail.com> wrote:
>
> On Fri, Apr 24, 2026 at 4:16 PM Matt Rice <rat...@gmail.com> wrote:
>>
>> I suppose I am a little bit confused, as Norm's original confused
>> deputy paper doesn't use path rewriting at all,
>> the paths were apparently absolute (although using a different syntax,
>> or maybe using shell substitutions I'm not sure which).
>> So it feels like once we've fixed these path normalization problems we
>> still have underlying issues to resolve.
>
>
> That's because he didn't need that complication to illustrate the problem. Alice said compile(a.c, log.txt) even though she lacked permission to write log.txt. A more realistic example would require tests, such as path normalization, to prevent the attack.
>>

I personally don't find it particularly appetizing: adding a security
model on top of the failed file system access control,
still giving the compiler/deputy an identity with authority, but
attempting to avoid using it through path comparisons.

The problem isn't that alice lacked permission, the deputy had
permission despite the fact that it didn't need it to use it on behalf
of alice.
It is easy to prove that the deputy cannot mix up permissions it does
not have, and another thing entirely to prove that tests such as path
normalization are complete in not using that authority on behalf of
alice.

I suppose this whole approach to the solution isn't my cup of tea.

>>
>> One question going back to Alan's original email he asked "Or do
>> capabilities give us something better?" to which I would ask
>> what kind of capabilities?
>
>
> I'm hoping any kind. My proposed solution says nothing about the type of capability.
>
>>
>> The post seems to be LLM centric and using
>> data channels to transmit capabilities we are limited to password
>> capabilities.
>
>
> I certainly didn't intend to be LLM centric.
>

Yeah I meant the article you had linked to...

Alan Karp

Apr 24, 2026, 11:32:37 PM
to fr...@googlegroups.com
On Fri, Apr 24, 2026 at 7:37 PM Matt Rice <rat...@gmail.com> wrote:

> The problem isn't that alice lacked permission, the deputy had
> permission despite the fact that it didn't need it to use it on behalf
> of alice.

But the deputy does need to use the log file on behalf of Alice.  Otherwise she couldn't be billed.  
The important factor is who designates the resource.  If the deputy designates it, no problem. 
If someone else does, you've got a vulnerability.
 
> It is easy to prove that the deputy cannot mix up permissions it does
> not have, and another thing entirely to prove that tests such as path
> normalization are complete in not using that authority on behalf of
> alice.

Path normalization becomes a factor when you're designating files by strings.
It doesn't enter the picture if you designate by specifying a capability.  That's
why I think my proposed solution works.

--------------
Alan Karp

Matt Rice

Apr 25, 2026, 2:36 AM
to fr...@googlegroups.com
On Sat, Apr 25, 2026 at 3:32 AM Alan Karp <alan...@gmail.com> wrote:
>
> On Fri, Apr 24, 2026 at 7:37 PM Matt Rice <rat...@gmail.com> wrote:
>>
>>
>> The problem isn't that alice lacked permission, the deputy had
>> permission despite the fact that it didn't need it to use it on behalf
>> of alice.
>
> But the deputy does need to use the log file on behalf of Alice. Otherwise she couldn't be billed.
> The important factor is who designates the resource. If the deputy designates it, no problem.
> If someone else does, you've got a vulnerability.
>

I oversimplified...

At least to me, the interesting thing about the way Norm solved the
problem is that the deputy cannot really misbehave. It doesn't hold
the log capability when invoked by Alice; it has a closure which can
invoke the log and, in the same way as the *-property papers, it has
distinct capability channels and data channels, with no way to
transmit the log capability between Alice and the closure containing
the log cap.

If the deputy designates it, you're one logic error away from the
deputy becoming confused.
By using the KeyKOS factory style, the deputy is more of an impartial
arbiter between two mutually suspicious parties:
if both parties trust the factory, the instance of the deputy
it creates cannot transfer capabilities in either direction.

It's the difference between being able to write something which is
properly secure and being unable to write something which
behaves insecurely/improperly. I don't see how you get anything
resembling the latter with the involvement of a singly rooted
filesystem.
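
The shape of that arrangement can be sketched in Python, with heavy caveats: this is not the KeyKOS factory, Python closures are not a confinement mechanism, and all names (`make_compiler`, `compile_source`, `bill`, `write_output`) are hypothetical. The point it illustrates is only that the caller designates the output by passing a capability rather than a name, and the billing-log authority is reachable only from inside the closure.

```python
from typing import Callable

Writer = Callable[[str], None]


def make_compiler(bill: Writer) -> Callable[[str, Writer], None]:
    """Build a compiler whose billing-log authority lives only in this closure."""

    def compile_source(source: str, write_output: Writer) -> None:
        # The caller supplies a capability for the output, not a file name,
        # so there is no string that could be made to designate the log.
        write_output(f"object code for {len(source)} bytes of source")
        bill(f"compiled {len(source)} bytes")

    return compile_source


billing_log: list = []   # stands in for the log that only the deputy may write
alice_output: list = []  # stands in for a file Alice is allowed to write

compiler = make_compiler(billing_log.append)
compiler("int main(void) { return 0; }", alice_output.append)
```

Alice can only hand over capabilities she already holds, so she has no way to make the compiler write to the log on her behalf; no path comparison is involved.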

>>
>> It is easy to prove that the deputy cannot mix up permissions it does
>> not have, and another thing entirely to prove that tests such as path
>> normalization are complete in not using that authority on behalf of
>> alice.
>
>
> Path normalization becomes a factor when you're designating files by strings.
> It doesn't enter the picture if you designate by specifying a capability. That's
> why I think my proposed solution works.
>
> --------------
> Alan Karp
>