use cases for distinguishing between read access and existence-checking access


David Nicol

May 25, 2021, 5:46:33 PM
to cap-...@googlegroups.com

Disclaimer: I do not know how Tahoe-LAFS does this, and I assume it does it right.

That said, I'm thinking about encrypted file system semantics, and I have concluded that it would be elegant for the (mutable) file identifier to work as the shared key for encrypting all the pages in it.

This means that if you know the file identifier, you can read the file.

It also means that if the system wants to provide a view of the file system without read access, a shadow directory would have to be composed that only indicates existence and size but does not include the file identifier.

Because separating the identifier (which refers to a list of pages) and read access privilege would get really complicated.
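A minimal sketch of the shadow-directory idea, assuming a directory is just a list of entries and that the file identifier doubles as the decryption key. The names `Entry`, `ShadowEntry`, and `shadow_view` are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Entry:
    name: str
    size: int
    file_id: bytes  # in this scheme, knowing file_id means being able to read


@dataclass(frozen=True)
class ShadowEntry:
    name: str
    size: int  # existence and size only; no identifier, so no read access


def shadow_view(directory: List[Entry]) -> List[ShadowEntry]:
    """Compose the metadata-only view by copying everything except file_id."""
    return [ShadowEntry(e.name, e.size) for e in directory]


full = [
    Entry("notes.txt", 1200, b"\x01" * 32),
    Entry("photo.jpg", 48213, b"\x02" * 32),
]
shadow = shadow_view(full)
```

The shadow view is a separate object rather than a filtered window onto the original, which is why it has to be "composed": nothing in it can be turned back into a file identifier.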

That brings us to the discussion question I would like to put to this group:

What real world use cases are there where it makes sense for someone to have rights to see if a file exists in a file system, but not have rights to read the file? I know this is a basic part of Unix security semantics, but is it really needed?

Thank you

David "fourth cup of coffee" Nicol 

--
"Lay off that whiskey, and let that cocaine be!" -- Johnny Cash

Kevin Reid

May 25, 2021, 11:27:42 PM
to cap-...@googlegroups.com
On Tue, May 25, 2021 at 2:46 PM David Nicol <david...@gmail.com> wrote:
What real world use cases are there where it makes sense for someone to have rights to see if a file exists in a file system, but not have rights to read the file? I know this is a basic part of Unix security semantics, but is it really needed?

Listing directories (and file metadata) but not reading their contents is the least authority `ls -R`, `find`, `du`, and such need to do their jobs.

This is not a sufficient argument to include the feature in the filesystem itself, since one might reasonably have a simple tree-walker utility (analogous to `find`) that promises to not read files, and grant that read access and send its output to the next step of formatting/analysis. However, I think it's a sufficient example to demonstrate that this is a meaningful authority subset.

Beyond "least authority!!!", here's some specific properties/scenarios:

If I have a report produced by a program of the above sort, and I accidentally leak it, then I know I don't need to worry specifically about any information that's stored in the contents of files rather than their names.

Possibly just a refinement of the above: Suppose I'm a company providing services. I might decide to run some sort of auditing tool to detect extra or missing files in my storage ("Hmm, this directory totals a terabyte big, are we forgetting to ever delete anything from it?" "Is this backup complete, at least in file counts and sizes?") but I don't want the tool to ever have any bytes of user/customer data in its memory. As long as I don't put any user data in file names (which is a debatable engineering constraint on its own!) I get this guarantee if the filesystem offers a read-directories-only mode.
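As a rough illustration of such an auditing tool on a conventional filesystem, the sketch below counts files and sums sizes using only directory listings and `stat` metadata. The function name `audit_tree` is hypothetical; the temporary tree is set up only so the example is self-contained.

```python
import os
import tempfile


def audit_tree(root):
    """Return (file_count, total_bytes) using directory listings and stat only."""
    count = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # lstat reads metadata; the audit never calls open() on a file.
            st = os.lstat(os.path.join(dirpath, name))
            count += 1
            total += st.st_size
    return count, total


with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sub"))
    samples = [("a.bin", b"x" * 10), (os.path.join("sub", "b.bin"), b"y" * 5)]
    for rel, payload in samples:
        # Setup only: creating sample files is not part of the audit itself.
        with open(os.path.join(tmp, rel), "wb") as f:
            f.write(payload)
    summary = audit_tree(tmp)
```

Here the "does not read contents" property is only a promise kept by the code; a read-directories-only mode in the filesystem would turn that promise into something enforced.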

Tony Arcieri

May 26, 2021, 9:27:50 AM
to cap-...@googlegroups.com
On Tue, May 25, 2021 at 2:46 PM David Nicol <david...@gmail.com> wrote:
Disclaimer: I do not know how Tahoe-LAFS does this, and I assume it does it right. [...] It also means that if the system wants to provide a view of the file system without read access, a shadow directory would have to be composed that only indicates existence and size but does not include the file identifier. [...]

That brings us to the discussion question I would like to put to this group:

What real world use cases are there where it makes sense for someone to have rights to see if a file exists in a file system, but not have rights to read the file? I know this is a basic part of Unix security semantics, but is it really needed?

Have you looked at Tahoe-LAFS's verifycaps?

They can be used to check signatures, and therefore verify files (including mutable files) are stored correctly and potentially do things like repair/rebalance them, but they lack the decryption key component, so all of these operations only act over ciphertexts which can't be read without the decryption key.
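A toy sketch of the general shape of this idea, not Tahoe-LAFS's actual cap formats or cryptography: the verify capability carries only a hash of the ciphertext, while the read capability carries that plus the key. The XOR keystream cipher and the names `VerifyCap`, `ReadCap`, `toy_xor`, and `check` are stand-ins chosen so the example runs with the standard library alone; the cipher is not secure.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class VerifyCap:
    ciphertext_hash: bytes  # enough to check stored bytes, not to decrypt


@dataclass(frozen=True)
class ReadCap:
    key: bytes
    verify: VerifyCap  # a read cap can always be attenuated to a verify cap


def keystream(key: bytes, n: int) -> bytes:
    """Toy keystream: SHA-256 in counter mode. Illustration only."""
    out = bytearray()
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:n])


def toy_xor(key: bytes, data: bytes) -> bytes:
    """XOR with the keystream; the same call encrypts and decrypts."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def check(cap: VerifyCap, stored_ciphertext: bytes) -> bool:
    """What a verifier or repairer can do without any key."""
    return hashlib.sha256(stored_ciphertext).digest() == cap.ciphertext_hash


key = bytes(range(32))
plaintext = b"contents the storage node must not see"
ciphertext = toy_xor(key, plaintext)
read_cap = ReadCap(key, VerifyCap(hashlib.sha256(ciphertext).digest()))

verify_cap = read_cap.verify          # hand this to the untrusted checker
intact = check(verify_cap, ciphertext)
tampered = bytes([ciphertext[0] ^ 1]) + ciphertext[1:]
damaged = check(verify_cap, tampered)
```

The checker can tell intact ciphertext from damaged ciphertext, but holds nothing from which the key could be derived.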

--
Tony Arcieri

David Nicol

May 26, 2021, 11:00:52 AM
to cap-...@googlegroups.com
Thanks, all, for the responses. On further thought, and pondering the sealer/unsealer pattern, I've concluded that all the data needs to have its own secret key, possibly at page granularity if it's implemented in a FS. 
Directory listings may contain the unsealers, or not, to provide a metadata-only view and a read view, while allowing distribution of storage to untrusted storage resources (which is why trustable tools won't work -- there's no enforceable perimeter to raw disk view, in a distributed storage scheme involving untrusted parties.) 

Tony Arcieri

May 26, 2021, 11:17:33 AM
to cap-...@googlegroups.com
On Wed, May 26, 2021 at 8:00 AM David Nicol <david...@gmail.com> wrote:
Directory listings may contain the unsealers, or not, to provide a metadata-only view and a read view, while allowing distribution of storage to untrusted storage resources 

This was (is?) a bit of an unsolved problem in Tahoe-LAFS: while there are writecaps/readcaps for directories which each contain the corresponding writecaps/readcaps for their contents, there was no corresponding notion for verifycaps.

If there were, it would allow an untrusted storage node/agent to e.g. periodically/continuously run a repair operation over a directory structure and ensure that there are a "healthy" number of shares for each of the files/subdirectories.
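A hedged sketch of what such a deep check could look like if directory nodes exposed the verifycaps of their children. This is not Tahoe-LAFS code; `Node`, `shares_found`, and `deep_check` are hypothetical, and the share counts are supplied by hand rather than queried from storage servers.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    verify_cap: str                 # opaque label standing in for a verifycap
    shares_found: int               # shares located for this file or directory
    children: List["Node"] = field(default_factory=list)


def deep_check(root: Node, healthy_minimum: int) -> List[str]:
    """Walk the whole structure and list nodes with too few shares."""
    unhealthy = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.shares_found < healthy_minimum:
            unhealthy.append(node.verify_cap)
        stack.extend(node.children)
    return sorted(unhealthy)


tree = Node("dir-root", 10, [
    Node("file-a", 3),
    Node("dir-sub", 10, [Node("file-b", 6), Node("file-c", 9)]),
])
needs_repair = deep_check(tree, healthy_minimum=7)
```

The walk needs only the ability to enumerate children and count shares, which is exactly the authority a deep verifycap would grant to an untrusted repair agent.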

This was described as a "deep verifycap": https://tahoe-lafs.org/trac/tahoe-lafs/ticket/308
 
--
Tony Arcieri

David Nicol

May 26, 2021, 11:42:59 AM
to cap-...@googlegroups.com
Absolutely what I was seeking, Thank you Tony Arcieri. Please invoice me directly for your next beer and/or cheeseburger.

On Wed, May 26, 2021 at 10:17 AM Tony Arcieri <bas...@gmail.com> wrote:

This was described as a "deep verifycap": https://tahoe-lafs.org/trac/tahoe-lafs/ticket/308

--

Bill Frantz

May 26, 2021, 1:10:01 PM
to cap-...@googlegroups.com
On 5/25/21 at 11:27 PM, kpr...@switchb.org (Kevin Reid) wrote:

>As long as I don't
>put any user data in file names (which is a debatable engineering
>constraint on its own!) I get this guarantee if the filesystem offers a
>read-directories-only mode.

Anyone who looked at my photo directory could get a reasonable
idea of where I had been since there is a directory for each day
with the location as part of the directory name. If I use a
photo, after editing it will be renamed to include the names of
the people etc.

I suppose I could trust a photo application to hide this
information in the metadata, but I like the well-tested
robustness, and relatively future proof aspects of the file system.

Cheers - Bill

-----------------------------------------------------------------------
Bill Frantz | Since the IBM Selectric, keyboards have gotten
408-348-7900 | steadily worse. Now we have touchscreen keyboards.
www.pwpconsult.com | Can we make something even worse?

David Nicol

May 26, 2021, 1:50:15 PM
to cap-...@googlegroups.com



file names are not needed for a traverse/verify view, just traverse/verify caps

write cap: replace a mutable to point to the replacement immutable root of some data

traverse cap: contains immutable pointers to data blocks including traverse-only directory data

a traverse cap is essentially a directory structure without either names or reading secrets

"verify" features imply data blocks are signed after getting encrypted, so a verify capability would be whatever it takes to verify the signatures on encrypted blocks -- which is redundant for a DHT because immutables are stored indexed by their hashes, and changing a mutable requires verifying authority.

read cap: traverse cap (plus verify?) plus reading secrets
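One possible reading of this taxonomy as data structures, offered as a sketch only: a traverse cap holds block hashes and child traverse caps, and a read cap wraps a traverse cap together with names and reading secrets. All type and function names below are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class TraverseCap:
    block_hashes: Tuple[bytes, ...]            # immutable pointers to data blocks
    children: Tuple["TraverseCap", ...] = ()   # traverse-only subdirectories


@dataclass(frozen=True)
class ReadCap:
    traverse: TraverseCap
    names: Dict[bytes, str]      # block hash -> file name
    secrets: Dict[bytes, bytes]  # block hash -> reading secret


def attenuate(cap: ReadCap) -> TraverseCap:
    """Drop names and secrets, keeping only the structure."""
    return cap.traverse


def reachable_blocks(cap: TraverseCap) -> List[bytes]:
    """Everything a traverse-only holder can enumerate."""
    found = list(cap.block_hashes)
    for child in cap.children:
        found.extend(reachable_blocks(child))
    return found


h1, h2, h3 = (hashlib.sha256(x).digest() for x in (b"one", b"two", b"three"))
structure = TraverseCap((h1,), (TraverseCap((h2, h3)),))
reader = ReadCap(
    structure,
    names={h1: "a.txt", h2: "b.txt", h3: "c.txt"},
    secrets={h1: b"k1", h2: b"k2", h3: b"k3"},
)
walker = attenuate(reader)
blocks = reachable_blocks(walker)
```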

The goal of all this thinking out loud is (of course) a LAFS that replicates in a public DHT.



--
You received this message because you are subscribed to the Google Groups "cap-talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cap-talk+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/cap-talk/r480Ps-10146i-328B63F8BA7A447AA760157DEADC24E1%40Williams-MacBook-Pro.local.

Raoul Duke

May 26, 2021, 2:36:34 PM
to cap-...@googlegroups.com
(would uuids be better than string names in some cases as a middle ground "identifier"?)

David Nicol

May 26, 2021, 6:13:13 PM
to cap-...@googlegroups.com


On Wed, May 26, 2021 at 1:36 PM Raoul Duke <rao...@gmail.com> suggested: 
(would uuids be better than string names in some cases as a middle ground "identifier"?) 



I think not, with a content-addressable DHT, because the hash keys already function as universally unique identifiers:

  • every immutable datum's hash is its own self-proving UUID; also
  • mutables are keyed by a hash of a public key plus more stuff to identify the particular mutable, which data must have a signature verifiable by the public key in order to verify and not reject the create/update messages.

That is:
  1. all immutables have a fixed UUID of their hash, which gives deduplication, and the hashes are long enough to make worrying about collisions silly.
  2. mutables have a fixed UUID of the hash of (public key, salt) where salt is an arbitrary string used to differentiate multiple mutables that can be changed by the holder of the private key.
  3. updates to mutables are only honored when the update message includes (public key, salt, data) and the message has a signature that can be verified with the provided public key.
Capability-wise, the UUID represents a read capability for the data, either mutable or immutable.
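The three rules above can be sketched as a toy in-memory store. SHA-256 is an assumed hash, `ToyDHT` is a hypothetical class, and the signature check is injected as a callable because the standard library has no public-key signatures. The HMAC-based `stand_in_verify` is NOT a real signature scheme (anyone holding the "public key" could forge it); it is there only to make the example runnable.

```python
import hashlib
import hmac


def immutable_key(data: bytes) -> bytes:
    """Rule 1: an immutable is stored under the hash of its own content."""
    return hashlib.sha256(data).digest()


def mutable_key(public_key: bytes, salt: bytes) -> bytes:
    """Rule 2: a mutable is stored under the hash of (public key, salt)."""
    # Length-prefix the key so distinct (public_key, salt) pairs cannot
    # concatenate to the same byte string.
    prefix = len(public_key).to_bytes(2, "big")
    return hashlib.sha256(prefix + public_key + salt).digest()


class ToyDHT:
    def __init__(self, verify):
        self._verify = verify  # callable(public_key, message, signature) -> bool
        self._store = {}

    def put_immutable(self, data: bytes) -> bytes:
        key = immutable_key(data)
        self._store[key] = data  # identical data lands on the same key
        return key

    def put_mutable(self, public_key, salt, data, signature) -> bytes:
        """Rule 3: honor an update only if its signature verifies."""
        key = mutable_key(public_key, salt)
        if not self._verify(public_key, key + data, signature):
            raise PermissionError("signature does not verify; update rejected")
        self._store[key] = data
        return key

    def get(self, key: bytes):
        return self._store.get(key)


def stand_in_sign(public_key: bytes, message: bytes) -> bytes:
    # Placeholder for signing with the private key. Not a real signature.
    return hmac.new(public_key, message, hashlib.sha256).digest()


def stand_in_verify(public_key: bytes, message: bytes, signature: bytes) -> bool:
    # Placeholder for public-key verification. Not a real signature check.
    return hmac.compare_digest(stand_in_sign(public_key, message), signature)


dht = ToyDHT(stand_in_verify)
k1 = dht.put_immutable(b"hello")
k2 = dht.put_immutable(b"hello")          # deduplicated: same key as k1

pk, salt = b"demo-public-key", b"slot-1"
slot = mutable_key(pk, salt)
dht.put_mutable(pk, salt, b"v1", stand_in_sign(pk, slot + b"v1"))
```

A real deployment would replace the stand-in pair with an actual signature scheme; the keying rules themselves do not change.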

My question is about designing a LAFS file system that will use a DHT for storage, without revealing anything other than public keys used for updating the mutables, but even then there isn't a way to inspect encrypted data in the mutables without a shared secret. Encrypted directory data doesn't tell anything about its structure, or even that it is directory data.

It isn't clear that a traverse/verify view would even be needed: Mainline DHT expires blocks after two hours, so whoever wants data to persist needs to refresh it periodically, effectively resulting in mark-and-sweep garbage collection.
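The refresh-or-expire behaviour can be modelled as mark-and-sweep over a store with a fixed time-to-live. The sketch below takes the two-hour figure from the message above; `ExpiringStore`, `refresh`, and the explicit `now` arguments are hypothetical and stand in for real network behaviour.

```python
TTL_SECONDS = 2 * 60 * 60  # expiry interval, as stated in the message


class ExpiringStore:
    def __init__(self):
        self._expires = {}  # block key -> time at which it lapses

    def put(self, key, now):
        self._expires[key] = now + TTL_SECONDS

    def sweep(self, now):
        """Sweep: drop everything whose lifetime has run out."""
        self._expires = {k: t for k, t in self._expires.items() if t > now}

    def keys(self):
        return set(self._expires)


def refresh(store, root, children, now):
    """Mark: re-announce every block reachable from the root."""
    stack, seen = [root], set()
    while stack:
        key = stack.pop()
        if key in seen:
            continue
        seen.add(key)
        store.put(key, now)
        stack.extend(children.get(key, ()))


store = ExpiringStore()
for key in ("root", "dir", "file", "orphan"):
    store.put(key, now=0)

links = {"root": ["dir"], "dir": ["file"]}   # "orphan" is unreachable
refresh(store, "root", links, now=3600)      # owner refreshes after one hour
store.sweep(now=7201)                        # just past the original expiry
survivors = store.keys()
```

Only blocks reachable from a root that someone keeps refreshing survive the sweep, which is the garbage-collection effect described above.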

Raoul Duke

May 26, 2021, 6:58:49 PM
to cap-...@googlegroups.com
i have never liked the idea of even theoretically remotely possible hash collisions :-)

/murphy'slaw

David Nicol

May 26, 2021, 10:24:10 PM
to cap-...@googlegroups.com
Do you carry a steel roof wherever you go because of meteors?

On Wed, May 26, 2021, 17:58 Raoul Duke <rao...@gmail.com> wrote:
i have never liked the idea of even theoretically remotely possible hash collisions :-)

/murphy'slaw


Jonathan S. Shapiro

May 27, 2021, 9:40:59 AM
to cap-...@googlegroups.com
Anyone who looked at my photo directory could get a reasonable
idea of where I had been since there is a directory for each day
with the location as part of the directory name.

Probably so. And it’s a valid privacy concern.

It does not seem to me that any permission system is likely to overcome heavy investment in poor application design…

To put it less cynically: there’s a trade off between allowing some access vs disabling human utility. That’s not a fault of the permission system. Unfortunately our intuitions about what is exposed by “some” access tend to be quite bad.

There are days when I wonder whether the correct policy isn’t “everything is readable”, on the theory that any other setting misleads the user about what’s actually protectable... 

Jonathan

Jonathan S. Shapiro

May 27, 2021, 10:14:19 AM
to cap-...@googlegroups.com
On Wed, May 26, 2021 at 11:36 AM Raoul Duke <rao...@gmail.com> wrote:
(would uuids be better than string names in some cases as a middle ground "identifier"?)

UUIDs have problems. The two common forms give you a Hobson’s choice between being very guessable (and more so on VMs) or very collidable.
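Assuming the two common forms meant here are time-based (version 1) and random (version 4) UUIDs, Python's standard `uuid` module makes the difference visible. The node value below is a made-up MAC address standing in for one cloned across two virtual machines.

```python
import uuid

SHARED_NODE = 0x001122334455  # hypothetical MAC address shared by two VMs

# Version 1 embeds the node (MAC) and a timestamp in the identifier itself.
a = uuid.uuid1(node=SHARED_NODE, clock_seq=0)
b = uuid.uuid1(node=SHARED_NODE, clock_seq=0)

# Version 4 is random apart from the fixed version and variant bits.
r = uuid.uuid4()

print("v1:", a, "node", hex(a.node), "time", a.time)
print("v1:", b, "node", hex(b.node), "time", b.time)
print("v4:", r)
```

With the node and clock sequence fixed, the only thing separating two version 1 values is the timestamp, which is why they are guessable and why machines with cloned MAC addresses risk generating the same value.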

The problem with the birthday paradox isn’t the likelihood. It’s that designers tend to assume that it never happens and then architect that assumption deeply into their systems without  So when it does happen… And also, they are data, so one needs to take care about replay attacks if they are used to govern access.
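For a sense of scale, the usual birthday approximation p ≈ 1 - exp(-n(n-1)/(2N)) can be evaluated directly. The function name is hypothetical, the 122-bit figure is the number of random bits in a version 4 UUID, and the result assumes the identifiers are drawn uniformly at random, which is exactly the assumption questioned in this thread.

```python
import math


def collision_probability(n_items: int, space: int) -> float:
    """Birthday approximation for n_items uniform draws from `space` values."""
    exponent = n_items * (n_items - 1) / (2.0 * space)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-exponent)


classic = collision_probability(23, 365)              # the textbook case
uuid4_billion = collision_probability(10**9, 2**122)  # a billion random UUIDs
sha256_blocks = collision_probability(10**18, 2**256)

print(f"23 people, 365 days:      {classic:.3f}")
print(f"1e9 UUIDv4 values:        {uuid4_billion:.3e}")
print(f"1e18 SHA-256 block keys:  {sha256_blocks:.3e}")
```

The numbers only hold while the randomness source is good; a weak generator shrinks the effective space without changing the code that relies on it.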

The available sources of randomness on most computers are astonishingly bad.

One of the interesting decisions I’ve observed in some Amazon code recently is that they accept client-generated UUIDs for object IDs in a particular JSON doc, but those UUIDs are “scoped” to the graph created by the client - any UUID that has meaning across documents is generated on the servers.

It then surprised me that they used random UUIDs on the client given they didn’t need to be opaque. Though given how easy it is to set up virtual machines with colliding MAC addresses that’s an understandable default. 

Jonathan 