OCI Bundle Digests Summary


Brandon Philips

Oct 14, 2015, 1:09:26 PM
to dev
Hello-

I am working on a document summarizing our needs and basic design requirements for our digest format based on previous conversations. I am working on it here: https://docs.google.com/document/d/1QA0W6q8BxYPTUw4c_XfYF1CrHOvE3oFfuyR1U8ujaKs/edit

Reproduced here below too.

Cheers,

Brandon


A bundle is designed to be moved between hosts. Although OCI doesn't define a transport method, we should have a cryptographic digest of the on-disk bundle that can be used to verify that a bundle is not corrupted and is in an expected configuration.


Optionally this digest could then be used in a cryptographic public-key signature system.


Now, the tricky bits are defining what parts of a file system are verified.


Potential Use Cases


  • Verification of a filesystem on-disk after moving bundle between hosts (everyone)

  • Enabling optional cryptographic public-key signature systems (everyone)

  • Unpacking/packing of file systems as a non-root user (Joe Beda)


Non-goals


  • Define the "one true digest"; it is inevitable that a container will have multiple digests calculated over it.

    • a filesystem or RAID system may calculate a CRC

    • a distribution method may calculate a digest with the archive format

  • Define a signing system; we will assume that the digest will be a short string of bytes that can be signed by different systems


Goals


  • Define a bundle digest that can confirm the on-disk integrity of a bundle after moving between hosts

  • Make this digest fast to compute and parallelizable (Merkle trees)

  • Use cryptographic best practices for hashes as of 2015

  • Design for upgradeability and versioning of the digest

    • e.g. prefix with sha512-<digest> or similar


Potential Goals


  • Enable packing/unpacking as non-root users


File system Metadata to Calculate


Problem: Should we support all possible filesystem properties such as extended attributes, POSIX ACLs, NFSv4 ACLs, modification timestamps, etc?


There is a case for trying to support everything from day zero, but no solid use cases emerged for doing so.


Having read through and reflected on the discussion so far, I believe that we should design a digest format that may support those in the future but does not attempt to encode all of those things for now. In the digest serialization format we can then provide reasonable defaults that will error if a file has xattrs, ACLs, or non-default timestamps.


  • xattrs must be empty on-disk for now

  • POSIX ACLs must be empty on-disk for now

  • modification/access time will be ignored on-disk if empty in manifest


File system Serialization


Problem: How should we serialize the file system data?


There are a few approaches to this problem:


Serialize file system metadata and contents into a byte stream

  • Used in appc and docker today with tar

  • Approach is well understood and has existing tooling

  • Changing one file or one piece of metadata requires significant recalculation

  • Format is out of our control and difficult to extend


Serialize file system metadata and content digest into a byte stream

  • Prototyped in systems like https://github.com/stevvooe/continuity

  • Similar concept used in bittorrent

  • Recalculation of metadata is trivial and easy for single changes

  • Extensible upgrade path under our control


I am leaning towards the continuity approach even though it means inventing new stuff. Having all of this in a separate file does make it rather easy to work with.


The basic concept is:


digest([]filedata{})


struct filedata {
    name     string
    filetype enum
    uid      int
    gid      int
    etc…
}
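
To make that a little more concrete, here is a rough Go sketch of one way the digest over that list could be computed (the field set, encoding, and hash choice are illustrative only, not a proposal for the final format):

package bundledigest

import (
    "crypto/sha512"
    "fmt"
    "sort"
)

// filedata carries per-file metadata plus a digest of the file's contents.
type filedata struct {
    name        string // path relative to the bundle root
    filetype    string // "regular", "dir", "symlink", …
    uid, gid    int
    mode        uint32
    contentHash string // e.g. "sha512-<hex>" of the contents; empty for dirs
}

// digestFiledata hashes a canonical, sorted, one-line-per-entry encoding of
// the metadata list, yielding a single digest for the bundle.
func digestFiledata(entries []filedata) string {
    sort.Slice(entries, func(i, j int) bool { return entries[i].name < entries[j].name })
    h := sha512.New()
    for _, e := range entries {
        // Fixed field order and NUL separators keep the encoding unambiguous.
        fmt.Fprintf(h, "%s\x00%s\x00%d\x00%d\x00%o\x00%s\n",
            e.name, e.filetype, e.uid, e.gid, e.mode, e.contentHash)
    }
    return fmt.Sprintf("sha512-%x", h.Sum(nil))
}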


Joe Beda

Oct 14, 2015, 1:18:08 PM
to Brandon Philips, dev
Thanks for getting this discussion started, Brandon. To be clear, my thinking for bringing up packing/unpacking for non-root users is to allow a wide tool set to process/update/verify/munge these things without having to have root on a machine. The more of this world we can move out of root, the better off we are going to be.

One thing that didn't show up as either a goal or non-goal is "bundle composition". This includes both Docker-style layering and more complex composition where a bundle could have multiple "parents". The obvious problem here is that this starts to drag in naming of bundles, and that is something that is also missing from your proposal.

I assume that eventually (optionally) we'd want to be able to break bundles down into smaller chunks to enable sharing and minimize the amount of transport data. While that doesn't have to fall out at this layer, we might want to have an idea of how that might be done building on this bundle concept.

Joe


W. Trevor King

Oct 14, 2015, 2:15:00 PM
to Brandon Philips, dev
On Wed, Oct 14, 2015 at 05:09:15PM +0000, Brandon Philips wrote:
> Design for upgradeability and versioning of the digest
> -
>
> e.g. prefix with sha512-<digest> or similar

IPFS uses multihashes for this [1]. They currently bump into some
ambiguity because the encoding (e.g. hex, base58, …) isn't *also*
clearly represented in encoded versions of the multihash. But you
could either keep the multihash in a binary byte stream, or add an
additional prefix character when encoding it (e.g. x11… for 11…
being hex encoded, or 8Qm… for Qm… being base58).
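
For the sha512-<digest> prefix style from the doc, a minimal Go sketch of formatting and parsing an algorithm-prefixed digest (purely illustrative; the registry of algorithm names would still need to be pinned down in the spec):

package digeststring

import (
    "fmt"
    "strings"
)

// Format renders a digest as "<algorithm>-<hex>", e.g. "sha512-ab12…", so new
// algorithms can be introduced later without breaking old parsers.
func Format(algorithm string, sum []byte) string {
    return fmt.Sprintf("%s-%x", algorithm, sum)
}

// Parse splits a prefixed digest and rejects algorithms the verifier does not
// recognize, which is where the upgrade/deprecation policy would live.
func Parse(s string, known map[string]bool) (algorithm, hexSum string, err error) {
    parts := strings.SplitN(s, "-", 2)
    if len(parts) != 2 || !known[parts[0]] {
        return "", "", fmt.Errorf("unsupported or malformed digest %q", s)
    }
    return parts[0], parts[1], nil
}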

And IPFS obviously has a handle on Merkle-trees for quick re-hashing
and efficient distribution over untrusted channels [2]. That handles
Joe's “we'd want to be able to break bundles down into smaller chunks
to enable sharing and minimize the amount of transport data” [3] with
IPFS using (sub)-file level chunks in its Merkle tree (for an example
of sub-file blocks, see the “File blocks” section of [4] and code in
[5]). Folks interested in having unsigned bundle components probably
don't want to use IPFS for verification, but may still be interested in
using it (with untrusted hashes) to efficiently distribute content
that they intend to verify using another method (e.g. continuity
digest [6]).

Cheers,
Trevor

[1]: https://github.com/jbenet/multihash
[2]: https://github.com/ipfs/specs/tree/master/merkledag
[3]: https://groups.google.com/a/opencontainers.org/d/msg/dev/xo4SQ92aWJ8/bigtA9lKCAAJ
Message-ID: <CAH0Nu3czh1Y0Q9N0N5s7R6O61cpoH=1p5cpfir2oPDv=zYs...@mail.gmail.com>
[4]: http://ipfs.io/ipfs/QmTkzDwWqPbnAh5YiV5VwcTLnGdwSNsNTn2aDxdXBFca7D/example#/ipfs/QmThrNbvLj7afQZhxH72m5Nn1qiVn3eMKWFYV49Zp2mv9B/graphmd/README.md
https://github.com/ipfs/examples/tree/master/examples/graphmd
[5]: https://github.com/ipfs/go-ipfs/tree/v0.3.7/importer
Currently has fixed-size [7] and Rabin [8] splitters to convert a
large binary object into blocks, and ‘trickle’ and ‘balanced’
algorithms to organize those blocks in a tree.
[6]: https://github.com/stevvooe/continuity
[7]: https://github.com/ipfs/go-ipfs/blob/d4938c9b0cd57c52c71ec83998a8288c3247826c/importer/chunk/splitting.go
[8]: https://github.com/ipfs/go-ipfs/blob/d4938c9b0cd57c52c71ec83998a8288c3247826c/importer/chunk/rabin.go

--
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy

W. Trevor King

Oct 14, 2015, 2:33:58 PM
to Brandon Philips, dev
On Wed, Oct 14, 2015 at 05:09:15PM +0000, Brandon Philips wrote:
> A bundle is designed to be moved between hosts. Although OCI doesn't
> define a transport method we should have a cryptographic digest of
> the on-disk bundle that can be used to verify that a bundle is not
> corrupted and in an expected configuration.
>
> Optionally this digest could then be used in a cryptographic public-key
> signature system.

I'm -1 to including this in the runtime spec, but agnostic to
including it in a higher-level, OCI managed layer [1].

> File system Serialization
>
> Problem: How should we serialize the file system data?

If we're not talking about transport methods here, I don't see a need
to talk about filesystem serialization independent of digests. Folks
can (de)serialize with a tool of their choice (tar, IPFS, …) and then
use the serialization-agnostic digest-checker to verify the on-disk
bundle.

If you want to support non-root users verifying serialized bundles
(because they don't have permission to unpack them on the host), you
can teach the digest-checker to walk the serialized format. If the
user can't update the digest-checker, they may be able to run the
stock digest-checker in a user-mapped namespace where they do have
permissions to create files with (mapped) UID 0, etc.
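
As a rough illustration of the namespace idea, a Linux-only Go sketch that launches a (hypothetical) digest-checker with the caller mapped to UID/GID 0 in a new user namespace:

package main

import (
    "os"
    "os/exec"
    "syscall"
)

func main() {
    // Run a (hypothetical) digest-checker in a new user namespace with the
    // caller mapped to UID/GID 0, so it can create root-owned files while
    // unpacking without real root on the host. Linux-only.
    cmd := exec.Command("digest-checker", "/path/to/bundle")
    cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
    cmd.SysProcAttr = &syscall.SysProcAttr{
        Cloneflags: syscall.CLONE_NEWUSER,
        UidMappings: []syscall.SysProcIDMap{
            {ContainerID: 0, HostID: os.Getuid(), Size: 1},
        },
        GidMappings: []syscall.SysProcIDMap{
            {ContainerID: 0, HostID: os.Getgid(), Size: 1},
        },
    }
    if err := cmd.Run(); err != nil {
        os.Exit(1)
    }
}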

Are there other reasons that we need to address filesystem
serialization in the same layer that handled digest-generation and
checking?

Cheers,
Trevor

[1]: https://groups.google.com/a/opencontainers.org/d/msg/dev/1T0z1IJWxw8/XKdv69e1FgAJ
Subject: Re: Hashing and verifying a bundle
Message-ID: <20150902184...@odin.tremily.us>

Jason Bouzane

Oct 14, 2015, 8:27:28 PM
to Brandon Philips, dev
On Wed, Oct 14, 2015 at 10:09 AM, Brandon Philips <brandon...@coreos.com> wrote:
Hmm, why would atime be included in the metadata? While it can be set, it's unlikely to stay the same for long except on readonly or noatime filesystems. Is there a use case for preserving atime?

 

File system Serialization


Problem: How should we serialize the file system data?


There are a few approaches to this problem:


Serialize file system metadata and contents into a byte stream

  • Used in appc and docker today with tar

  • Approach is well understood and has existing tooling

  • Changing one file or one piece of metadata requires significant recalculation

  • Format are out of our control and difficult to extend


Serialize file system metadata and content digest into a byte stream

  • Prototyped in systems like https://github.com/stevvooe/continuity

  • Similar concept used in bittorrent

  • Recalculation of metadata is trivial and easy for single changes

  • Extensible upgrade path under our control


I am leaning towards the continuity approach even though it means inventing new stuff. Having all of this in a separate file does make it rather easy


We use the continuity approach within Google rather extensively, and it has some additional benefits, too. It allows for partial verification without having to read the entire contents of the bundle. E.g. if I wish to verify a particular file, I can hash it, check that the digest matches what's in the metadata, recompute the metadata digest, and then verify the signature attached to it. If the signed digest is over the entire contents of the bundle instead of over the digests of the included files, then I need to read all the files on the filesystem to recompute the digest, which is far more expensive if all I care about is verifying a part of that filesystem.
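
As a rough sketch of that flow in Go (names and the signature scheme are made up for illustration; the lookup of the expected digest from the parsed metadata is elided):

package verify

import (
    "bytes"
    "crypto/ed25519"
    "crypto/sha512"
    "fmt"
    "io"
    "os"
)

// VerifyOneFile checks a single file without touching the rest of the bundle:
// hash just that file, compare against the digest recorded in the metadata,
// and check the signature over the raw metadata bytes (never a re-serialization).
func VerifyOneFile(path string, wantSum, metadata, sig []byte, pub ed25519.PublicKey) error {
    f, err := os.Open(path)
    if err != nil {
        return err
    }
    defer f.Close()
    h := sha512.New()
    if _, err := io.Copy(h, f); err != nil {
        return err
    }
    if !bytes.Equal(h.Sum(nil), wantSum) {
        return fmt.Errorf("%s: content digest mismatch", path)
    }
    if !ed25519.Verify(pub, metadata, sig) {
        return fmt.Errorf("metadata signature check failed")
    }
    return nil
}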

Partial verification becomes especially important if you want to create a single bundle that is multi-architecture or multi-platform, where some parts of the bundle might be omitted depending on which architecture or platform is using the bundle. I don't see multi-arch or multi-platform bundles listed in your goals or your non-goals section, but I also don't see any reason to preclude them in the future even if we don't need them now.
 

The basic concept is:


digest([]filedata{})


struct filedata {

name string

filetype enum

uid int

gid int

etc…

}


Serializing these data so that you can generate a digest and cryptographically sign it can be somewhat tricky, since the representation should have a single canonical form that is unambiguous. A proto message, while unambiguous, can have multiple equivalent serialized representations. Thus, if the idea is to use protos, one must be careful never to deserialize the data and then serialize it again before verifying the signature. Such reserialization may result in a passing signature check, but only if both proto implementations agree on the order and representation of all the fields. Proto is not unique in this. Tar, for example, also provides for multiple representations of the same data.

Brandon Philips

Oct 14, 2015, 10:37:35 PM
to W. Trevor King, dev
On Wed, Oct 14, 2015 at 2:14 PM W. Trevor King <wk...@tremily.us> wrote:
On Wed, Oct 14, 2015 at 05:09:15PM +0000, Brandon Philips wrote:
>    Design for upgradeability and versioning of the digest
>    -
>
>       e.g. prefix with sha512-<digest> or similar

IPFS uses multihashes for this [1].  They currently bump into some
ambiguity because the encoding (e.g. hex, base58, …) isn't *also*
clearly represented in encoded versions of the multihash.  But you
could either keep the multihash in a binary byte stream, or add an
additional prefix character when to encoding it (e.g. x11… for 11…
being hex encoded, or 8Qm… for Qm… being base58).

I don't like that multihashes are not self-describing nor human-understandable. You have to know you are dealing with some "magic" multihash namespace. Plus, the multihash has no way to deal with Merkle tree objects.

Brandon

Brandon Philips

Oct 14, 2015, 10:39:19 PM
to W. Trevor King, dev
On Wed, Oct 14, 2015 at 2:33 PM W. Trevor King <wk...@tremily.us> wrote:
On Wed, Oct 14, 2015 at 05:09:15PM +0000, Brandon Philips wrote:
> A bundle is designed to be moved between hosts. Although OCI doesn't
> define a transport method we should have a cryptographic digest of
> the on-disk bundle that can be used to verify that a bundle is not
> corrupted and in an expected configuration.
>
> Optionally this digest could then be used in a cryptographic public-key
> signature system.

I'm -1 to including this in the runtime spec, but agnostic to
including it in a higher-level, OCI managed layer [1].

I am just trying to paint the picture for what the use case is for the digest.

Brandon 

Brandon Philips

Oct 14, 2015, 10:40:44 PM
to W. Trevor King, dev
On Wed, Oct 14, 2015 at 2:33 PM W. Trevor King <wk...@tremily.us> wrote:
On Wed, Oct 14, 2015 at 05:09:15PM +0000, Brandon Philips wrote:
> File system Serialization
>
> Problem: How should we serialize the file system data?

If we're not talking about transport methods here, I don't see a need
to talk about filesystem serialization independent of digests.  Folks
can (de)serialize with a tool of their choice (tar, IPFS, …) and then
use the serialization-agnostic digest-checker to verify the on-disk
bundle.

What? How do you hash something without serializing it?
 
Are there other reasons that we need to address filesystem
serialization in the same layer that handled digest-generation and
checking?

You need something serialized into a string of bytes in order to generate a digest.

Brandon 

Solomon Hykes

Oct 14, 2015, 10:44:39 PM
to Brandon Philips, W. Trevor King, dev
On Wed, Oct 14, 2015 at 7:40 PM, Brandon Philips <brandon...@coreos.com> wrote:
On Wed, Oct 14, 2015 at 2:33 PM W. Trevor King <wk...@tremily.us> wrote:
On Wed, Oct 14, 2015 at 05:09:15PM +0000, Brandon Philips wrote:
> File system Serialization
>
> Problem: How should we serialize the file system data?

If we're not talking about transport methods here, I don't see a need
to talk about filesystem serialization independent of digests.  Folks
can (de)serialize with a tool of their choice (tar, IPFS, …) and then
use the serialization-agnostic digest-checker to verify the on-disk
bundle.

What? How do you hash something without serializing it?

I think Trevor is saying that different implementations (in his example, tar archives vs. IPFS encoding) might use different serialization methods, which will yield different byte strings from the same input bundle.

Brandon Philips

Oct 14, 2015, 10:44:56 PM
to Jason Bouzane, dev
On Wed, Oct 14, 2015 at 8:27 PM Jason Bouzane <jbou...@google.com> wrote:
On Wed, Oct 14, 2015 at 10:09 AM, Brandon Philips <brandon...@coreos.com> wrote:
  • xattrs must be empty on-disk for now

  • posix acls must be empty on-disk for now

  • modification/access time will be ignored on-disk if empty in manifest


Hmm, why would atime be included in the metadata? While it can be set, it's unlikely to stay the same for long except on readonly or noatime filesystems. Is there a use case for preserving atime?

Nope, I don't have a use case. Just pointing out filesystem properties to ignore and handle. 
 
Brandon

Brandon Philips

Oct 14, 2015, 10:50:54 PM
to Jason Bouzane, dev
On Wed, Oct 14, 2015 at 8:27 PM Jason Bouzane <jbou...@google.com> wrote:
On Wed, Oct 14, 2015 at 10:09 AM, Brandon Philips <brandon...@coreos.com> wrote:
I am leaning towards the continuity approach even though it means inventing new stuff. Having all of this in a separate file does make it rather easy

We use the continuity approach within Google rather extensively, and it has some additional benefits, too. It allows for partial verification without having to read the entire contents of the bundle. E.g. if I wish to verify a particular file, I can hash it, check that the digest matches what's in the metadata and then recompute the metadata digest, and then verify the signature attached to it. If the signed digest is over the entire contents of the bundle instead of over the digests of the included files, then I need to read all the files on the filesystem to recompute the digest which is far more expensive if all I care about is verifying a part of that filesystem.

Yes! Good explanation; I will fold that into the doc.

Brandon

Brandon Philips

Oct 14, 2015, 10:57:01 PM
to Jason Bouzane, dev
On Wed, Oct 14, 2015 at 8:27 PM Jason Bouzane <jbou...@google.com> wrote:
On Wed, Oct 14, 2015 at 10:09 AM, Brandon Philips <brandon...@coreos.com> wrote:
I am leaning towards the continuity approach even though it means inventing new stuff. Having all of this in a separate file does make it rather easy

Partial verification becomes especially important if you want to create a single bundle that is multiarchitecture or multiplatform where some parts of the bundle might be omitted depending on which architecture or platform is using the bundle. I don't see multi-arch or multi-platform bundles listed in your goals or your non-goals section, but I also don't see any reason to preclude them in the future even if we don't need them now.

I see what you are saying about omitting or adding dependent stuff, but a bundle and this digest only cover the on-disk representation. So I don't know how we could reasonably tackle this with how bundles are described today. The bundle's on-disk representation is "ready to run", and thus if you want to de-duplicate some files, that is done at a different transport/caching layer.

I think that doing this, while interesting, is largely orthogonal to our concerns here. As a reference there is some multi-platform discussion here: https://github.com/opencontainers/specs/issues/73

Cheers,

Brandon

Brandon Philips

Oct 14, 2015, 10:59:00 PM
to Jason Bouzane, dev
On Wed, Oct 14, 2015 at 8:27 PM Jason Bouzane <jbou...@google.com> wrote:
Really?! I didn't know this about protos. Is there nothing in proto3 that gives me reproducible protos?

Thank You,

Brandon

Jason Bouzane

Oct 14, 2015, 11:22:45 PM
to Brandon Philips, dev
The on-disk representation is ready to run, yes, but the metadata and
the signature data do limit what we can do at the transport, caching,
and naming layers, so I would argue the concerns aren't completely
orthogonal. My point here is that our choice of what we use to
generate the digest has implications for these other layers, and that
hashing a digest of the files provides more flexibility at these other
layers.

Brandon Philips

Oct 14, 2015, 11:36:43 PM
to Jason Bouzane, dev
So, you are saying that the "filesystem proto" providing a digest of each file is sufficient in your mind?

One non-goal that is implied with that is not providing some chunked tree scheme so we can assemble a file's contents from matching chunks that may be cached from the transport layer.

Brandon

Jason Bouzane

Oct 14, 2015, 11:39:10 PM
to Brandon Philips, dev
On Wed, Oct 14, 2015 at 7:58 PM, Brandon Philips
<brandon...@coreos.com> wrote:
> Really?! I didn't know this about protos. Is there nothing in proto3 that
> gives me reproducible protos?

Proto3 does not help you. There are at least two places in proto3 that
allow equivalent messages to differ in their serialized form. One is
field order. While the proto3 specification recommends that fields be
written in numerical order, this is not required, and it explicitly
requires parsers to deal with fields out of order. The second is that
packed repeated fields may be specified any number of times and they
are to be concatenated. While the specification recommends against
encoding more than one packed repeated field for a particular tag
number in a message, it does require that parsers deal with this
situation correctly.

Because of these two parts of the spec, two serialized proto messages
can be concatenated together to produce another correct proto message,
assuming that no non-repeated field is specified in both of the
messages concatenated together.

In any case, the upshot of this is that while a particular
implementation of the proto library may deterministically produce the
same serialized proto every time when given a particular proto
message, there's no guarantee that two different proto libraries will
serialize it in the same way, nor are there any guarantees that any
particular proto library serializer will be stable over time. While I
doubt that any official Google implementation would ever change the
serialization, third party implementations may do whatever they like.
For example, some serializers may choose to output the fields in hash
order instead of ascending order, and that could even make the
serialization non-deterministic between invocations of the program.
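
To make the ambiguity concrete, here are two hand-built wire encodings of the same two-field message; any conforming parser accepts both, but they hash differently (Go, sha256 chosen arbitrarily):

package main

import (
    "crypto/sha256"
    "fmt"
)

func main() {
    // Message { field1 (varint) = 1; field2 (varint) = 2 } encoded with the
    // fields in two different orders. Both are valid wire format and decode
    // to the same message, but the bytes (and therefore the digests) differ.
    a := []byte{0x08, 0x01, 0x10, 0x02} // field 1, then field 2
    b := []byte{0x10, 0x02, 0x08, 0x01} // field 2, then field 1
    sa, sb := sha256.Sum256(a), sha256.Sum256(b)
    fmt.Printf("%x\n%x\n", sa[:], sb[:])
}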

Brandon Philips

Oct 14, 2015, 11:41:34 PM
to Jason Bouzane, dev
Any recommendations on what to use then?

Brandon 

W. Trevor King

Oct 15, 2015, 12:21:11 AM
to Solomon Hykes, Brandon Philips, dev
On Wed, Oct 14, 2015 at 07:44:39PM -0700, Solomon Hykes wrote:
> On Wed, Oct 14, 2015 at 7:40 PM, Brandon Philips wrote:
> > On Wed, Oct 14, 2015 at 2:33 PM W. Trevor King wrote:
> >> On Wed, Oct 14, 2015 at 05:09:15PM +0000, Brandon Philips wrote:
> >> > File system Serialization
> >> >
> >> > Problem: How should we serialize the file system data?
> >>
> >> If we're not talking about transport methods here, I don't see a need
> >> to talk about filesystem serialization independent of digests. Folks
> >> can (de)serialize with a tool of their choice (tar, IPFS, …) and then
> >> use the serialization-agnostic digest-checker to verify the on-disk
> >> bundle.
> >
> > What? How do you hash something without serializing it?
>
> I think Trevor is saying that different implementations (in his
> example, tar archives vs. IPFS encoding) might use different
> serialization methods, which will yield different byte strings from
> the same input bundle.

No, I was saying that there's no need to serialize the whole
filesystem at once if you're just building a digest. For example:

$ sha1sum a b c/d c/e >digest
$ cat digest
553a4fd57546a306c4648ec4b39e95560c92c929 a
9a7e12f53275ed96bc04d2768b04fda46908a4bd b
dae8eb226ca659a88d4c84cb89187320e86449c1 c/d
71be3027b16b63fdfaa7042eb2e7c0aa44acbe2d c/e
$ sha1sum -c digest
a: OK
b: OK
c/d: OK
c/e: OK

So creating a digest for a whole bundle just needs a way to serialize
the metadata of one file (owner, group, permissions, mode, …) into a
byte stream. That's a lot easier than figuring out how to pack the
whole filesystem into a single, reproducible byte stream.
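
e.g., a Linux-only Go sketch of serializing one file's metadata into a deterministic byte stream (field set abbreviated; the point is just the fixed field order):

package filemeta

import (
    "bytes"
    "encoding/binary"
    "os"
    "syscall"
)

// Serialize writes one file's metadata in a fixed order so that the same file
// always produces the same bytes (and therefore the same hash).
func Serialize(path string) ([]byte, error) {
    fi, err := os.Lstat(path)
    if err != nil {
        return nil, err
    }
    st := fi.Sys().(*syscall.Stat_t) // Linux-only; other platforms differ
    var buf bytes.Buffer
    buf.WriteString(path)
    buf.WriteByte(0) // unambiguous terminator after the variable-length name
    for _, v := range []uint32{uint32(fi.Mode()), st.Uid, st.Gid} {
        _ = binary.Write(&buf, binary.BigEndian, v) // cannot fail on a Buffer
    }
    return buf.Bytes(), nil
}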

Cheers,
Trevor

W. Trevor King

Oct 15, 2015, 12:26:59 AM
to Brandon Philips, dev
On Thu, Oct 15, 2015 at 02:37:24AM +0000, Brandon Philips wrote:
> On Wed, Oct 14, 2015 at 2:14 PM W. Trevor King wrote:
> > On Wed, Oct 14, 2015 at 05:09:15PM +0000, Brandon Philips wrote:
> > > Design for upgradeability and versioning of the digest
> > > -
> > >
> > > e.g. prefix with sha512-<digest> or similar
> >
> > IPFS uses multihashes for this [1]. They currently bump into some
> > ambiguity because the encoding (e.g. hex, base58, …) isn't *also*
> > clearly represented in encoded versions of the multihash. But you
> > could either keep the multihash in a binary byte stream, or add an
> > additional prefix character when to encoding it (e.g. x11… for 11…
> > being hex encoded, or 8Qm… for Qm… being base58).
>
> I don't like that multihashes are not self describing nor human
> understandable. You have to know you are dealing with some "magic"
> multihash namespace.

You'd just declare that in your digest-tool's docs. The digest file
doesn't have to be human-readable or anything. If you prefer
human-readable prefixes, that's fine too. I'd be surprised if there
aren't existing packages for generating and parsing those as well.

> Plus, the multihash has no way to deal with merkle tree objects.

I'm not sure what you mean here. IPFS uses multihashes for its Merkle
objects. And having hash and size prefix bytes vs. a “sha512-” prefix
doesn't seem like a theoretical difference (it's just a practical “we
already know off-the-shelf libraries for this” difference).

Cheers,
Trevor

Jason Bouzane

Oct 15, 2015, 12:33:01 AM
to Brandon Philips, dev
On Wed, Oct 14, 2015 at 8:41 PM, Brandon Philips
Actually, I think protocol buffers are the right way to go. However,
we just have to be careful about how we do it. Essentially, the rule
that you have to follow is that any time a proto message (or the
digest thereof) is signed, the thing included with the signature is
the same serialized proto that was signed. That way, you can check the
signature on the serialized proto without having to reserialize it.

So, e.g., any time a signature appears, it should be more or less like this:

message SignedData {
  // Serialized message
  bytes signed_message = 1;
  bytes signature = 2;
  bytes certificate = 3;
  ...
}

We'd never do anything like the following:

message Metadata {}

message SignedMetadata {
  // Embedded message
  Metadata metadata = 1;
  bytes signature = 2;
  bytes certificate = 3;
  ...
}
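
In Go terms, verification against the first shape stays byte-exact; a minimal sketch (ed25519 and the type names are just for illustration):

package signedblob

import (
    "crypto/ed25519"
    "errors"
)

// SignedData mirrors the message above: the exact serialized bytes travel next
// to the signature, so verification never depends on re-serializing anything.
type SignedData struct {
    SignedMessage []byte
    Signature     []byte
}

// Open checks the signature over the original bytes and only then hands them
// back for deserialization by whatever proto library the caller prefers.
func Open(sd SignedData, pub ed25519.PublicKey) ([]byte, error) {
    if !ed25519.Verify(pub, sd.SignedMessage, sd.Signature) {
        return nil, errors.New("signature does not match signed_message")
    }
    return sd.SignedMessage, nil
}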

W. Trevor King

Oct 15, 2015, 1:31:38 AM
to Jason Bouzane, Brandon Philips, dev
On Wed, Oct 14, 2015 at 09:32:59PM -0700, 'Jason Bouzane' via dev wrote:
> Actually, I think protocol buffers are the right way to go. However,
> we just have to be careful about how we do it. Essentially, the rule
> that you have to follow is that any time a proto message (or the
> digest thereof) is signed, the thing included with the signature is
> the same serialized proto that was signed. That way, you can check
> the signature on the serialized proto without having to reserialize
> it.

There's some work on this sort of thing in IPFS [1,2,3,4], but it
still needs some fleshing out (and probably some refactoring to make a
sign/validate library that's independent of IPNS's record
implementation). The main problem with this approach (regardless of
whether you're using the IPFS code or not) is that while it works for
validating serialized messages, it doesn't work so well for validating
the filesystem after you've unpacked it to the disk. With IPFS, you
get around that by retaining access to the original serialized
messages (just re-fetch that hash if you don't have it in your local
store anymore, assuming that *someone* still has the message on your
IPFS network). So verify is:

1. Check the signed digest vs. the hash of the *original* message.
2. Unpack the original message to get its payload.
3. Compare that payload with what you have on-disk.

With a system that did not include message caching,
you'd need stricter rules about how to regenerate the serialized
message based on the filesystem representation:

1. Re-serialize the filesystem information using your explicit, not
implementation-dependent protocol.
2. Check the signed digest vs. the hash of the *re-generated* message.

In my earlier IPFS-verification example [5], I was doing the latter:

user$ ipfs object get /ipfs/QmdHkJ5wC9MZC2orMYhoXSDwK7kYFXSoASC5kWFrWHt7ko/rootfs/bin | jq --raw-output '.Links[]|select(.Name == "busybox").Hash'
QmfBYD5Yxin1uQUdStMq2h3YttrzV3aD2qUjbXg25jezi3
user$ ipfs add --quiet --only-hash rootfs/bin/busybox
QmfBYD5Yxin1uQUdStMq2h3YttrzV3aD2qUjbXg25jezi3

But that relies on a stable ‘ipfs add …’. The former approach (which
just relies on compatible deserialization) would be:

user$ ipfs get -o /tmp/busybox /ipfs/QmdHkJ5wC9MZC2orMYhoXSDwK7kYFXSoASC5kWFrWHt7ko/rootfs/bin/busybox
Saving file(s) to /tmp/busybox
1.86 MB 0
user$ sha256sum /tmp/busybox
32eddf76dc4140d5e7334d22475ea2634fde3230006a397cb93cd90b175e32da /tmp/busybox
user$ sha256sum rootfs/bin/busybox
32eddf76dc4140d5e7334d22475ea2634fde3230006a397cb93cd90b175e32da rootfs/bin/busybox

It would be nice if ‘ipfs get’ supported piping file content to
stdout, and IPFS doesn't currently do a great job of serializing much
metadata (uid, gid, permissions, mode, …). But the bones are good,
and adding those features to IPFS seems like a much easier lift than
building a new system from scratch to do the same things ;). And
using IPFS's existing filesystem (de)serializers for verification
doesn't mean you need to also use IPFS's distributed hash table
approach to move those signed, serialized messages around.

Of course, you could use the same “keep the original message” approach
with tar files too, but people get grumpier when you ask them to store
a full tarball for every bundle just for verification. It's an easier
sell if the messages they're storing are small and shared between
similar bundles, which is how they're used in IPFS.

Cheers,
Trevor

[1]: https://github.com/ipfs/specs/tree/master/records
[2]: https://github.com/ipfs/specs/tree/master/keychain
[3]: https://github.com/ipfs/go-ipfs/blob/v0.3.7/namesys/publisher.go#L82-L96
[4]: https://github.com/ipfs/go-ipfs/blob/v0.3.7/namesys/publisher.go#L112-L134
[5]: https://groups.google.com/a/opencontainers.org/d/msg/dev/OqnUp4jOacs/h_sYNGFiFQAJ
Subject: Re: distributable and decentralized values
Message-ID: <20150829110...@odin.tremily.us>

Brandon Philips

Oct 15, 2015, 1:37:04 AM
to W. Trevor King, Solomon Hykes, dev
On Thu, Oct 15, 2015 at 12:21 AM W. Trevor King <wk...@tremily.us> wrote:
On Wed, Oct 14, 2015 at 07:44:39PM -0700, Solomon Hykes wrote:
> On Wed, Oct 14, 2015 at 7:40 PM, Brandon Philips wrote:
> > On Wed, Oct 14, 2015 at 2:33 PM W. Trevor King wrote:
> >> On Wed, Oct 14, 2015 at 05:09:15PM +0000, Brandon Philips wrote:
> >> > File system Serialization
> >> >
> >> > Problem: How should we serialize the file system data?
> >>
> >> If we're not talking about transport methods here, I don't see a need
> >> to talk about filesystem serialization independent of digests.  Folks
> >> can (de)serialize with a tool of their choice (tar, IPFS, …) and then
> >> use the serialization-agnostic digest-checker to verify the on-disk
> >> bundle.
> >
> > What? How do you hash something without serializing it?
>
> I think Trevor is saying that different implementations (in his
> example, tar archives vs. IPFS encoding) might use different
> serialization methods, which will yield different byte strings from
> the same input bundle.

No, I was saying that there's no need to serialize the whole
filesystem at once if you're just building a digest.  For example:


Oh, I see. Yes, I agree. This was my intention from my original email with:

digest([]filedata{})

Take the digest of a list of "filedata" objects which encode the metadata and crypto hash of contents.

Brandon

W. Trevor King

Oct 15, 2015, 2:20:19 AM
to Brandon Philips, Solomon Hykes, dev
On Thu, Oct 15, 2015 at 05:36:53AM +0000, Brandon Philips wrote:
> On Thu, Oct 15, 2015 at 12:21 AM W. Trevor King wrote:
> > No, I was saying that there's no need to serialize the whole
> > filesystem at once if you're just building a digest. For example:
>
> Oh, I see. Yes, I agree. This was my intention from my original
> email with:
>
> digest([]filedata{})
>
> Take the digest of a list of "filedata" objects which encode the
> metadata and crypto hash of contents.

That's still a single digest for which you need to serialize/hash the
whole filesystem. I'd rather see a signed *array of digests*, with
notes next to each digest explaining what it's for. For example, see
the ‘sha1sum a b c/d c/e >digest’ example in [1], although you'd need
to extend that to also represent filesystem metadata. Something like:

sha512-{content-digest} a
sha512-{serialized-metadata-digest} a meta
sha512-{content-digest} b
sha512-{serialized-metadata-digest} b meta
sha512-{serialized-metadata-digest} c meta
sha512-{content-digest} c/d
sha512-{serialized-metadata-digest} c/d meta
sha512-{content-digest} c/e
sha512-{serialized-metadata-digest} c/e meta

(or JSON/protobuf/whatever to that effect) is what would get signed.
Then you have an easy way to validate *only* c/d (or just its
content, or just its metadata) without re-hashing the whole tree.
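
A rough Go sketch of pulling a single entry out of a digest file in that shape (the line format here is obviously still hand-wavy):

package digestfile

import (
    "bufio"
    "fmt"
    "io"
    "strings"
)

// Entry is one line of the digest file: "sha512-<hex> <path>" with an optional
// trailing "meta" marker for the serialized-metadata digest.
type Entry struct {
    Digest string
    Path   string
    Meta   bool
}

// Lookup scans for the one path (and kind) we care about, so verifying c/d
// never requires hashing a, b, or c/e.
func Lookup(r io.Reader, path string, meta bool) (Entry, error) {
    s := bufio.NewScanner(r)
    for s.Scan() {
        f := strings.Fields(s.Text())
        if len(f) < 2 {
            continue
        }
        e := Entry{Digest: f[0], Path: f[1], Meta: len(f) == 3 && f[2] == "meta"}
        if e.Path == path && e.Meta == meta {
            return e, nil
        }
    }
    if err := s.Err(); err != nil {
        return Entry{}, err
    }
    return Entry{}, fmt.Errorf("no entry for %q", path)
}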

Cheers,
Trevor

[1]: https://groups.google.com/a/opencontainers.org/d/msg/dev/xo4SQ92aWJ8/AhpPjgdvCAAJ
Message-ID: <20151015041...@odin.tremily.us>

Jason Bouzane

Oct 15, 2015, 5:43:56 PM
to Brandon Philips, dev
On Wed, Oct 14, 2015 at 8:36 PM, Brandon Philips
<brandon...@coreos.com> wrote:
> So, you are saying that the "filesystem proto" providing a digest of each
> file is sufficient in your mind?

Yes, but that's a qualified yes. It's sufficient for verification
purposes. See my longer answer below.

> One non-goal that is implied with that is not providing some chunked tree
> scheme so we can assemble a files contents from matching chunks that may be
> cached from the transport layer.

I don't think this is implied. E.g. the file digest that ends up in
the metadata could be the top hash of a Merkle tree. While we have no
particular need for the full tree on the machine for verification
purposes, the tree could be calculated and saved for use in other
layers such as the transport layer. However, I don't really think
pre-calculating the tree and storing it is especially useful. Assuming
a storage and transport system for bundles, the storage system is
capable of calculating a Merkle tree when the bundle is first added to
the storage system, and the transport layer could fetch the Merkle
tree from the storage system directly. So I don't see a need for
putting any kind of Merkle tree in the metadata itself.

Calculating the Merkle tree at the point of upload to the storage
system does mean that the Merkle tree is not cryptographically signed
and therefore one cannot verify the integrity of the stored files
against the original signature using it. However, I don't see that as
a huge drawback. Verifying the data within the storage system involves
re-reading all the data for the bundle, anyway. Having a full-file
digest to verify limits the parallelism that can be achieved, but it
doesn't change the amount of data that must be read to perform the
verification. Also, the storage system could sign the Merkle tree it
generates itself. That would likely be sufficient since the storage
system presumably trusts itself when verifying data stored inside it.
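
For reference, the kind of per-file top hash I mean is only a few lines of Go (fixed-size chunks here; a real scheme would have to pin down chunk size and hash choice):

package merkle

import "crypto/sha256"

// TopHash chunks a file's contents into fixed-size blocks, hashes each block,
// and folds pairs of hashes upward until a single root digest remains.
func TopHash(data []byte, chunkSize int) [sha256.Size]byte {
    if chunkSize <= 0 {
        chunkSize = 1 << 20 // arbitrary default block size for the sketch
    }
    var level [][]byte
    for off := 0; off < len(data); off += chunkSize {
        end := off + chunkSize
        if end > len(data) {
            end = len(data)
        }
        sum := sha256.Sum256(data[off:end])
        level = append(level, sum[:])
    }
    if len(level) == 0 {
        return sha256.Sum256(nil) // empty file: hash of nothing
    }
    for len(level) > 1 {
        var next [][]byte
        for i := 0; i < len(level); i += 2 {
            if i+1 == len(level) {
                next = append(next, level[i]) // odd node is carried up as-is
                continue
            }
            sum := sha256.Sum256(append(append([]byte{}, level[i]...), level[i+1]...))
            next = append(next, sum[:])
        }
        level = next
    }
    var root [sha256.Size]byte
    copy(root[:], level[0])
    return root
}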

W. Trevor King

Oct 15, 2015, 6:33:49 PM
to Jason Bouzane, Brandon Philips, dev
On Thu, Oct 15, 2015 at 02:43:54PM -0700, 'Jason Bouzane' via dev wrote:
> On Wed, Oct 14, 2015 at 8:36 PM, Brandon Philips
> > One non-goal that is implied with that is not providing some
> > chunked tree scheme so we can assemble a files contents from
> > matching chunks that may be cached from the transport layer.
>
> I don't think this is implied. E.g. the file digest that ends up in
> the metadata could be the top hash of a Merkle tree. While we have
> no particular need for the full tree on the machine for verification
> purposes, the tree could be calculated and saved for use in other
> layers such as the transport layer. However, I don't really think
> pre-calculating the tree and storing it is especially
> useful. Assuming a storage and transport system for bundles, the
> storage system is capable of calculating a Merkle tree when the
> bundle is first added to the storage system, and the transport layer
> could fetch the Merkle tree from the storage system directly. So I
> don't see a need for putting any kind of Merkle tree in the metadata
> itself.

There's a lot going on here, and I feel like I'm missing a fair bit of
it ;). Are you saying “you can have both a flat digest file for
verification (like [1]) and a separate Merkle tree for unverified
distribution.”? That makes sense to me, but I don't understand where
pre-calculating the tree comes into it. Can you unpack this paragraph
to be a bit more concrete?

> Calculating the Merkle tree at the point of upload to the storage
> system does mean that the Merkle tree is not cryptographically
> signed and therefore one cannot verify the integrity of the stored
> files against the original signature using it. However, I don't see
> that as a huge drawback. Verifying the data within the storage
> system involves re-reading all the data for the bundle, anyway.

I agree that verifying *all the data* within the storage system
involves re-reading all your Merkle objects. But it seems silly to
have a Merkle system without making those Merkle objects
content-addressable by hash. And once you have that, verifying any
object (signed or not) is just “does its hash match the hash I'm using
for its name?”. So you can have separate checks for “are all my
Merkle objects uncorrupted?” (by comparing the current hash with the
name) and “are all my Merkle objects associated with {set of signed
root hashes}?” (with mark-and-sweep or some such garbage collector
walking the storage without hashing anything). And assuming you have
path-based names in your Merkle objects associated with each hash,
walking the tree from the root to a particular object of interest is
fast, and doesn't require traversing the whole tree. Or am I missing
the point you were trying to make?

> Having a full-file digest to verify limits the parallelism that can
> be achieved, but it doesn't change the amount of data that must be
> read to perform the verification.

What is a “full-file digest”? Is that just “the Merkle objects
include the data being hashed as well as the hashes” (vs. a system
that just constructed its Merkle tree from the hashes, e.g. my
sha512sum extrapolation [1])? Presumably anyone doing the validation
will have access to the data that's being hashed (via the digest Merkle
tree or some out-of-band communication), so the “workload doesn't
depend on data inclusion in the Merkle tree” makes sense to me.
However, any system that is verifying data hashes is treating that
data as a Merkle object, so I don't see the benefit to splitting this
particular hair. Either you're verifying a Merkle tree with data in
it, or you're just passing around unverified blocks.

> Also, the storage system could sign the Merkle tree it generates
> itself. That would likely be sufficient since the storage system
> presumably trusts itself when verifying data stored inside it.

This sounds like a step in the garbage-collection check I was talking
about above (if you wanted a secure way to distribute the current
roots for garbage collectors to use). But once you're back to “all
the data is in a signed-able Merkle tree” I think you've come back
around to IPFS ;). And indeed, my sha512sum extrapolation [1] with
its single root object (the digest file) and hashes referencing all
the content is basically IPFS except without the consistent
Merkle-object packaging, pre-built tooling, or efficient tree
structure (since my sha512sum “tree” is only one layer deep).

Cheers,
Trevor

[1]: https://groups.google.com/a/opencontainers.org/d/msg/dev/xo4SQ92aWJ8/7pB714d1CAAJ
Message-ID: <20151015061...@odin.tremily.us>

Jason Bouzane

Oct 15, 2015, 7:02:16 PM
to W. Trevor King, Brandon Philips, dev
On Thu, Oct 15, 2015 at 3:31 PM, W. Trevor King <wk...@tremily.us> wrote:
> On Thu, Oct 15, 2015 at 02:43:54PM -0700, 'Jason Bouzane' via dev wrote:
>> On Wed, Oct 14, 2015 at 8:36 PM, Brandon Philips
>> > One non-goal that is implied with that is not providing some
>> > chunked tree scheme so we can assemble a files contents from
>> > matching chunks that may be cached from the transport layer.
>>
>> I don't think this is implied. E.g. the file digest that ends up in
>> the metadata could be the top hash of a Merkle tree. While we have
>> no particular need for the full tree on the machine for verification
>> purposes, the tree could be calculated and saved for use in other
>> layers such as the transport layer. However, I don't really think
>> pre-calculating the tree and storing it is especially
>> useful. Assuming a storage and transport system for bundles, the
>> storage system is capable of calculating a Merkle tree when the
>> bundle is first added to the storage system, and the transport layer
>> could fetch the Merkle tree from the storage system directly. So I
>> don't see a need for putting any kind of Merkle tree in the metadata
>> itself.
>
> There's a lot going on here, and I feel like I'm missing a fair bit of
> it ;). Are you saying “you can have both a flat digest file for
> verification (like [1]) and a separate Merkle tree for unverified
> distribution.”? That makes sense to me, but I don't understand where
> pre-calculating the tree comes into it. Can you unpack this paragraph
> to be a bit more concrete?

Sorry for the denseness of that. Let me try to be clearer. I'm
discussing three scenarios:

1) Signed metadata contains a Merkle top hash, but not the Merkle tree.

I.e. the proto that we sign contains a single digest that is the top
hash of the Merkle tree. The Merkle tree itself need not be stored as
long as we know how to reconstruct it. Thus, there's little point in
keeping the entire Merkle tree on the machine where we've unpacked the
files since we can (and in fact must) recompute the entire tree if we
wish to verify the cryptographic signature.

However, I was arguing that I'm not sure this is especially useful
(although I don't have any particular objection to it, either).

2) Signed metadata contains the entire Merkle tree.

This seems wasteful to me. For especially large bundles, this can make
the metadata unwieldy.

3) Signed metadata contains regular whole-file hashes.

This is what I was discussing below when I referred to a "full-file"
digest. I simply mean a straight hash of the file, not chunked in any
way nor using hash trees. E.g. a simple SHA-256 hash of the files.

The main drawback to this approach is that if the storage system does
compute a Merkle tree and chunk the data, then you have to re-read all
the data in order to verify the whole-file digest of the files, which
is necessary if you want to check the cryptographic integrity of the
original signature. That approach is necessarily slower than
recomputing the digests of the chunks since the chunks can all have
their digests calculated in parallel.

The benefit to this approach over (1) is simplicity. The spec doesn't
need to describe how to chunk the data to recompute the Merkle tree
for verification of the cryptographic signature since that's an
implementation detail of the storage system. Signature verification
just requires straight hashing of the file's contents from beginning
to end. If the storage system wishes to do more advanced optimization
(such as the IPFS), this isn't precluded. What is precluded is
efficient verification of the original signature of the data in the
storage system. That can be mitigated by having the storage system
apply its own signature to the Merkle tree and having the verification
of the data inside the storage system trust the storage system's
certificate.

>> Calculating the Merkle tree at the point of upload to the storage
>> system does mean that the Merkle tree is not cryptographically
>> signed and therefore one cannot verify the integrity of the stored
>> files against the original signature using it. However, I don't see
>> that as a huge drawback. Verifying the data within the storage
>> system involves re-reading all the data for the bundle, anyway.
>
> I agree that verifying *all the data* within the storage system
> involves re-reading all your Merkle objects. But it seems silly to
> have a Merkle system without making those Merkle objects
> content-addressable by hash. And once you have that, verifying any
> object (signed or not) is just “does its hash match the hash I'm using
> for its name?”. So you can have separate checks for “are all my
> Merkle objects uncorrupted?” (by comparing the current hash with the
> name) and “are all my Merkle objects associated with {set of signed
> root hashes}?” (with mark-and-sweep or some such garbage collector
> walking the storage without hashing anything). And assuming you have
> path-based names in your Merkle objects associated with each hash,
> walking the tree from the root to a particular object of interest is
> fast, and doesn't require traversing the whole tree. Or am I missing
> the point you were trying to make?

The point I was making is that if the signed metadata contains a
regular hash of the file (i.e. not the top hash of a Merkle tree),
then the verification of the data is less efficient than it would be
if the storage system were using a chunking scheme with a Merkle tree
that had a signed top hash. In the scenario I was discussing, the only
Merkle tree is the one created by the storage system at the point the
bundle is uploaded to it. The generated tree would not be part of the
signed metadata. The storage system could sign it itself using a
certificate assigned to the storage system, of course, but the
original signature on the metadata would not apply to the Merkle tree.

>> Having a full-file digest to verify limits the parallelism that can
>> be achieved, but it doesn't change the amount of data that must be
>> read to perform the verification.
>
> What is a “full-file digest”?

I just mean a hash over the entire file without chunking.

> Is that just “the Merkle objects
> include the data being hashed as well as the hashes” (vs. a system
> that just constructed its Merkle tree from the hashes, e.g. my
> sha512sum extrapolation [1]). Presumably anyone doing the validation
> will have access to the data thats being hashed (via the digest Merkle
> tree or some out-of-band communication), so the “workload doesn't
> depend on data inclusion in the Merkle tree” makes sense to me.
> However, any system that is verifying data hashes is treating that
> data as a Merkel object, so I don't see the benefit to splitting this
> particular hair. Either you're verifying a Merkle tree with data in
> it, or you're just passing around unverified blocks.

You aren't necessarily passing around unverified blocks. Assuming
scenario (3) above, the transport layer could be aware of the storage
layer, and it could verify a signature of the data that's applied by
the storage system. Also, the data can be checked against the metadata
when it's fully downloaded by comparing the signed digest of the file
against the file itself.

>> Also, the storage system could sign the Merkle tree it generates
>> itself. That would likely be sufficient since the storage system
>> presumably trusts itself when verifying data stored inside it.
>
> This sounds like a step in the garbage-collection check I was talking
> about above (if you wanted a secure way to distribute the current
> roots for garbage collectors to use). But once you're back to “all
> the data is in a signed-able Merkle tree” I think you've come back
> around to IPFS ;). And indeed, my sha512sum extrapolation [1] with
> its single root object (the digest file) and hashes referencing all
> the content is basically IPFS except without the consistent
> Merkle-object packaging, pre-built tooling, or efficient tree
> structure (since my sha512sum “tree” is only one layer deep).

The question I'm getting at is, do we want to prescribe Merkle trees
in the spec by making the metadata contain a Merkle top hash (or a
similar hash tree construct), or do we say that chunking and hash
trees are an implementation detail of the storage and transport layer
while keeping the verification logic simple and free of hash trees by
only including straight, unchunked digests of files?

W. Trevor King

Oct 15, 2015, 7:54:45 PM
to Jason Bouzane, Brandon Philips, dev
On Thu, Oct 15, 2015 at 04:02:15PM -0700, Jason Bouzane wrote:
> I'm discussing three scenarios:
>
> 1) Signed metadata contains a Merkle top hash, but not the Merkle tree.
>
> I.e. the proto that we sign contains a single digest that is the top
> hash of the Merkle tree. The Merkle tree itself need not be stored
> as long as we know how to reconstruct it. Thus, there's little point
> in keeping the entire Merkle tree on the machine where we've
> unpacked the files since we can (and in fact must) recompute the
> entire tree if we wish to verify the cryptographic signature.
>
> However, I was arguing that I'm not sure this is especially useful
> (although I don't have any particular objection to it, either).
>
> 2) Signed metadata contains the entire Merkle tree.
>
> This seems wasteful to me. For especially large bundles, this can make
> the metadata unwieldy.

What is (2)? Are you signing more than the root hash? Or just
distributing the involved Merkle objects with the bundle? The latter
seems easily optional, and the former seems to miss the point of
Merkle trees ;).

> 3) Signed metadata contains regular whole-file hashes.
>
> This is what I was discussing below when I referred to a "full-file"
> digest. I simply mean a straight hash of the file, not chunked in
> any way nor using hash trees. E.g. a simple SHA-256 hash of the
> files.

This is still a Merkle tree. It's one layer deep (hash file → hashed
leaf). The difference between (1) and (3) is just whether you're
embedding the root object or its hash in the file you're signing.

The main traps we want to avoid seem to be:

a. Collecting too much stuff in a single Merkle object.
E.g. digest([]filedata{}) or, even worse, digest([]filecontent)
[1]. That makes re-hashing that object expensive, and likely to
involve hashing things that you don't actually care about checking.

b. Breaking the accessibility chain so you can't walk from the Merkle
root toward the object you're interested in. For bundle-filesystems,
that likely means “path-based names associated with the hashes”.

c. Not being able to compare the (indirectly) signed hash with your
local filesytem. That means both:

i. You can't fetch the original message from content-addressable
storage using its hash, and

ii. You can't reliably regenerate the object byte-for-byte using
your on-disk filesystem.

> That approach is necessarily slower than recomputing the digests of
> the chunks since the chunks can all have their digests calculated in
> parallel.

If the storage system is using the same hashing algorithm, you can
just run your corruption auditor constantly in the background.
Depending on how fresh you need that corruption check to be, that
might let you get some of the verification out of the way before
anyone even requests the metadata validation.

> The benefit to this approach over (1) is simplicity. The spec
> doesn't need to describe how to chunk the data to recompute the
> Merkle tree for verification of the cryptographic signature since
> that's an implementation detail of the storage system. Signature
> verification just requires straight hashing of the file's contents
> from beginning to end.

“Unchunked file-content hashing” isn't fundamentally any different
from “chunked file hashing with a tree to organize the chunks”. The
only differences would be scaling efficiency (better for chunks, since
they give you more flexibility) and code complexity (simpler without
chunking, but Merkle chunkers are now off-the-shelf tech, so not a big
difference).

> If the storage system wishes to do more advanced optimization (such
> as the IPFS), this isn't precluded. What is precluded is efficient
> verification of the original signature of the data in the storage
> system. That can be mitigated by having the storage system apply its
> own signature to the Merkle tree and having the verification of the
> data inside the storage system trust the storage system's
> certificate.

I'm still missing something in here. Is this “Alice is currently
uploading metadata signed by Bob and its associated Merkle objects to
my storage server, how do I efficiently guard against her passing me
bogus data?”?

> The point I was making is that if the signed metadata contains a
> regular hash of the file (i.e. not the top hash of a Merkle tree),
> then the verification of the data is less efficient than it would be
> if the storage system were using a chunking scheme with a Merkle
> tree that had a signed top hash.

So that's “if the storage system wants to verify the metadata
signature but happens to chunk its Merkle tree internally using a
different scheme, it's going to end up doing twice the work”?

> The question I'm getting at is, do we want to prescribe Merkle trees
> in the spec by making the metadata contain a Merkle top hash (or a
> similar hash tree construct), or do we say that chunking and hash
> trees are an implementation detail of the storage and transport
> layer while keeping the verification logic simple and free of hash
> trees by only including straight, unchunked digests of files?

I think that's just a distinction of scale, not of type (“straight,
unchunked digests of files” is just a simple Merkle tree
specification, which will get more complicated as it grows to include
file metadata). And judging the best balance point between “we don't
want it to be complicated enough to distract from the runtime
functionality” and “we want it to be complicated enough to be
performant” seems tricky in the absence of working prototypes ;). I
think we can get the best of both worlds by saying “distribution and
verification is handled in a higher layer” in opencontainers/specs and
then spinning up separate projects that *just* focus on this aspect of
things.

Vincent Batts

Oct 16, 2015, 1:46:18 PM
to Jason Bouzane, Brandon Philips, dev
Do I read this as: order and other variations ought to be accounted
for in the structs themselves?

Vincent Batts

Oct 16, 2015, 2:44:48 PM
to Jason Bouzane, W. Trevor King, Brandon Philips, dev
On Thu, Oct 15, 2015 at 7:02 PM, 'Jason Bouzane' via dev
<d...@opencontainers.org> wrote:
> Sorry for the denseness of that. Let me try to be clearer. I'm
> discussing three scenarios:
>
> 1) Signed metadata contains a Merkle top hash, but not the Merkle tree.
>
> I.e. the proto that we sign contains a single digest that is the top
> hash of the Merkle tree. The Merkle tree itself need not be stored as
> long as we know how to reconstruct it. Thus, there's little point in
> keeping the entire Merkle tree on the machine where we've unpacked the
> files since we can (and in fact must) recompute the entire tree if we
> wish to verify the cryptographic signature.
>
> However, I was arguing that I'm not sure this is especially useful
> (although I don't have any particular objection to it, either).

This kind of hint seems non-intrusive enough.
Though it seems this would be most efficient for publishing
collections of bundles that could be picked and chosen. A bundle is a
concise unit, so the signature on its components should be a
grouping.
Honestly, this is not dissimilar from BitTorrent cataloging a
filesystem, but it would include additional filesystem information and
attributes to set and verify against.

[snip]

>>> Also, the storage system could sign the Merkle tree it generates
>>> itself. That would likely be sufficient since the storage system
>>> presumably trusts itself when verifying data stored inside it.
>>
>> This sounds like a step in the garbage-collection check I was talking
>> about above (if you wanted a secure way to distribute the current
>> roots for garbage collectors to use). But once you're back to “all
>> the data is in a signed-able Merkle tree” I think you've come back
>> around to IPFS ;). And indeed, my sha512sum extrapolation [1] with
>> its single root object (the digest file) and hashes referencing all
>> the content is basically IPFS except without the consistent
>> Merkle-object packaging, pre-built tooling, or efficient tree
>> structure (since my sha512sum “tree” is only one layer deep).
>
> The question I'm getting at is, do we want to prescribe Merkle trees
> in the spec by making the metadata contain a Merkle top hash (or a
> similar hash tree construct), or do we say that chunking and hash
> trees are an implementation detail of the storage and transport layer
> while keeping the verification logic simple and free of hash trees by
> only including straight, unchunked digests of files?

I'm of the mind to have checksums not be embedded in the format. There
are too many moving parts (i.e. Merkle, but using which hash? which
block size? future-proofed against stronger hashes?).

Both of these could/would require a higher-level service to provide
that functionality, or its inverse: i.e. given a straight, unchunked
[sha256] digest, provide the top hash or whole tree; OR given a top
hash, provide the sha256, sha512, etc. of the fileset.

The nuanced question is really: is the requirement to provide crypto
assurance on "the bundle as a unit", or on "each individual file of a
bundle"?

For example, if we determine that the choice is actually "the bundle
as a unit", then we could have a higher-level service provide the
catalog of the file-level index and digests, or a Merkle top hash or
tree. But if it is crypto assurance of "each individual file of a
bundle", then that seems to imply the digests (of whatever nature) are
prescribed in the metadata.

Does that sound incorrect?

vb

W. Trevor King

unread,
Oct 16, 2015, 5:46:04 PM10/16/15
to Vincent Batts, Jason Bouzane, Brandon Philips, dev
On Fri, Oct 16, 2015 at 02:44:46PM -0400, Vincent Batts wrote:
> I'm of the mind to have checksums not be embedded in the
> format. There are too many moving parts (i.e. Merkle, but using which
> hash? which block size? future-proofed against stronger hashes?).

There's no way to actually *avoid* embedding a hash of some sort in
the signed manifest. You could future-proof easily by adding
additional data to that manifest referencing the hashing system and
version used to generate it.
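
A minimal sketch of that kind of future-proofing in Go; the sha512-
prefix convention and the helper name are illustrative only, not
something the spec has settled on:

    package main

    import (
        "crypto/sha512"
        "fmt"
    )

    // digestString prefixes the hex digest with the algorithm that
    // produced it, so a verifier knows which hash to recompute and
    // stronger hashes can be introduced later without a format change.
    func digestString(data []byte) string {
        return fmt.Sprintf("sha512-%x", sha512.Sum512(data))
    }

    func main() {
        fmt.Println(digestString([]byte("serialized bundle manifest")))
    }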

> The nuanced question is really: is the requirement to provide crypto
> assurance on "the bundle as a unit", or on "each individual file of a
> bundle"?
>
> For example, if we determine that the choice is actually "the bundle
> as a unit", then we could have a higher-level service provide the
> catalog of the file-level index and digests, or a Merkle top hash or
> tree. But if it is crypto assurance of "each individual file of a
> bundle", then that seems to imply the digests (of whatever nature)
> are prescribed in the metadata.

Anything that verifies the bundle as a unit (as it stands on the
filesystem) will necessarily be able to verify each individual file in
the bundle. Are you suggesting requiring a particular opaque tool for
verification? For example:

$ tar -c . | curl -X POST --data-binary @- https://opencontainers.org/hash
399fffe3568b443151d649fd8d900746e8bff9b3

I doubt we'll win many friends by basing verification on passing all
data unencrypted into an opaque verifier ;).

Or are you suggesting that there's a good technical reason for
entering trap (b) (breaking the accessibility chain [4])?

Cheers,
Trevor

[1]: https://groups.google.com/a/opencontainers.org/d/msg/dev/xo4SQ92aWJ8/uYTtORKvCAAJ
Message-ID: <20151015235...@odin.tremily.us>

Stephen Day

unread,
Oct 19, 2015, 2:01:44 PM10/19/15
to W. Trevor King, Vincent Batts, Jason Bouzane, Brandon Philips, dev
Brandon:

You've done a fantastic job of enumerating the goals here. I mostly agree with the points. We should make it more clear that this shouldn't necessarily be deeply integrated with the transport system, although hashing may be supported by it.

It should also be made clear that we do like a flat file, and the Merkle discussions are around optimizing a single-file hash. Is there a paper or RFC that covers the definition of a Merkle hash to be used in continuity? As it stands now, we just have a simple sha256. It is standardized and very easy to implement, and it is unlikely to have differences between language implementations. From a bandwidth perspective, sha256 gets roughly 200 MB/s and sha512 around 325 MB/s of CPU bandwidth.

The reasoning for using Merkle trees seems to be an optimization of hash calculation. While modern flash disks can sustain around 400-800 MB/s, this bandwidth has to be proportioned over multiple files. The other consideration is that Merkle trees could reduce the total data read from disk during verification. Unfortunately, this will only be faster for the rejection case, which would be much less common than a full tree traversal.

My worry with Merkle trees is that we adopt a poorly specified algorithm for the sake of an optimization, rather than choosing a stable and simple one. I would favor extending the format to make block-level hashes a part of the format, rather than adopting something without a common, well-known implementation. We can make a small adjustment to the format to make this a possibility. Without real data showing that Merkle hashes provide a broad performance improvement across systems and CPU architectures, I'd be against adopting the more complex approach.

Stephen.



Vincent Batts

unread,
Oct 19, 2015, 3:11:31 PM10/19/15
to Stephen Day, W. Trevor King, Jason Bouzane, Brandon Philips, dev
Doesn't have to be too complex, really. Hash and block size are key.

I made the "default" on the Merkle implementation I wrote sha1,
because it's mostly "good enough" and the same as what BitTorrent
uses.
It is easy enough to provide different versions. Unit benchmarking
shows that sha512 is the next runner-up.

vbatts@valse ~/src/vb/merkle (master) $ go test -run=NONE -bench=.
PASS
BenchmarkHash8Bytes-4          500000      3267 ns/op     2.45 MB/s
BenchmarkHash1K-4              200000      6333 ns/op   161.69 MB/s
BenchmarkHash8K-4              100000     20113 ns/op   407.28 MB/s
BenchmarkSha256Hash8Bytes-4    300000      3530 ns/op     2.27 MB/s
BenchmarkSha256Hash1K-4        200000     10249 ns/op    99.91 MB/s
BenchmarkSha256Hash8K-4         30000     49683 ns/op   164.88 MB/s
BenchmarkSha512Hash8Bytes-4    500000      3956 ns/op     2.02 MB/s
BenchmarkSha512Hash1K-4        200000      7886 ns/op   129.84 MB/s
BenchmarkSha512Hash8K-4         50000     32271 ns/op   253.84 MB/s
ok      _/home/vbatts/src/vb/merkle     16.107s
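
For reference, a rough sketch of what such hash benchmarks look like
with Go's testing package; the sizes and hash choices are illustrative,
and this is not the merkle repo's actual benchmark code:

    package merklebench

    import (
        "crypto/sha256"
        "crypto/sha512"
        "hash"
        "testing"
    )

    // benchHash pushes size bytes through h per iteration; b.SetBytes
    // lets "go test -bench" report MB/s next to ns/op, as above.
    func benchHash(b *testing.B, h hash.Hash, size int64) {
        buf := make([]byte, size)
        b.SetBytes(size)
        for i := 0; i < b.N; i++ {
            h.Reset()
            h.Write(buf)
            h.Sum(nil)
        }
    }

    func BenchmarkSha256Hash8K(b *testing.B) { benchHash(b, sha256.New(), 8*1024) }
    func BenchmarkSha512Hash8K(b *testing.B) { benchHash(b, sha512.New(), 8*1024) }

Run with "go test -run=NONE -bench=." as shown above.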

vb

Stephen Day

unread,
Oct 19, 2015, 5:12:22 PM10/19/15
to Vincent Batts, W. Trevor King, Jason Bouzane, Brandon Philips, dev
Here are some benchmark results that just do file chunking:

PASS
BenchmarkHash8KB-8           30000     49649 ns/op 165.00 MB/s
BenchmarkHash1MB-8            1000   2313870 ns/op 453.17 MB/s
BenchmarkHash4MB-8             200   8752671 ns/op 479.20 MB/s
BenchmarkHash8MB-8             100  17781666 ns/op 471.76 MB/s
BenchmarkHashSHA5128Bytes-8 3000000       528 ns/op  15.12 MB/s
BenchmarkHashSHA5121K-8      500000      3609 ns/op 283.69 MB/s
BenchmarkHashSHA5128K-8       50000     24818 ns/op 330.08 MB/s
BenchmarkHashSHA5121MB-8       500   3124441 ns/op 335.60 MB/s

This is 4KB blocks, hashed in goroutines, with a single reader thread. The resulting hash is just a concatenation of the block hashes. Code is available upon request (my methods are slightly different, as can be seen from the ns/op discrepancy). This could even be modified to get the same results as a straight sha256/sha512 using stevvooe/resumable.
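
A minimal sketch of that shape of code (single reader, per-block
goroutines, block digests concatenated in order); this is a guess at
the approach, not the benchmark code referenced above:

    package main

    import (
        "crypto/sha256"
        "fmt"
        "io"
        "os"
        "sync"
    )

    const blockSize = 4096

    // chunkedHash reads r with a single reader, hashes each 4 KB block
    // in its own goroutine, and returns the per-block SHA-256 digests
    // concatenated in block order.
    func chunkedHash(r io.Reader) ([]byte, error) {
        var (
            wg   sync.WaitGroup
            mu   sync.Mutex
            sums = map[int][sha256.Size]byte{}
        )
        for i := 0; ; i++ {
            buf := make([]byte, blockSize)
            n, err := io.ReadFull(r, buf)
            if n > 0 {
                wg.Add(1)
                go func(i int, b []byte) {
                    defer wg.Done()
                    sum := sha256.Sum256(b)
                    mu.Lock()
                    sums[i] = sum
                    mu.Unlock()
                }(i, buf[:n])
            }
            if err == io.EOF || err == io.ErrUnexpectedEOF {
                break
            }
            if err != nil {
                return nil, err
            }
        }
        wg.Wait()
        // Reassemble the per-block digests in block order.
        out := make([]byte, 0, len(sums)*sha256.Size)
        for i := 0; i < len(sums); i++ {
            sum := sums[i]
            out = append(out, sum[:]...)
        }
        return out, nil
    }

    func main() {
        f, err := os.Open(os.Args[1])
        if err != nil {
            panic(err)
        }
        defer f.Close()
        sum, err := chunkedHash(f)
        if err != nil {
            panic(err)
        }
        fmt.Printf("%x\n", sum)
    }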

My point here is that adjusting hashing techniques is very sensitive to hardware and file system layout. Any bandwidth for hashing needs to be shared with other files. We need a real-world test case that shows the extra complexity provides benefit. We also need to consider that optimizations that work now may not be valid in the future.

My intuition here is that algorithm selection will be very dependent on the distribution of file sizes in a bundle root. For example, a bundle root that skews towards large files would benefit from a Merkle hash, whereas a bundle root with lots of small files would benefit from straight hashing.

What is really needed is algorithmic agility, per manifest resource entry.

Jason Bouzane

unread,
Oct 20, 2015, 1:00:13 PM10/20/15
to W. Trevor King, Brandon Philips, dev
On Thu, Oct 15, 2015 at 4:52 PM, W. Trevor King <wk...@tremily.us> wrote:
> On Thu, Oct 15, 2015 at 04:02:15PM -0700, Jason Bouzane wrote:
>> This seems wasteful to me. For especially large bundles, this can make
>> the metadata unwieldy.
>
> What is (2)? Are you signing more than the root hash? Or just
> distributing the involved Merkle objects with the bundle? The latter
> seems easily optional, and the former seems to miss the point of
> Merkle trees ;).

I was discussing the case where you embed the internal nodes of the
Merkle tree into the proto that is hashed and signed. I didn't spend
much time on it because I don't think it's a useful representation of
the data.

>
>> 3) Signed metadata contains regular whole-file hashes.
>>
>> This is what I was discussing below when I referred to a "full-file"
>> digest. I simply mean a straight hash of the file, not chunked in
>> any way nor using hash trees. E.g. a simple SHA-256 hash of the
>> files.
>
> This is still a Merkle tree. It's one layer deep (hash file → hashed
> leaf). The difference between (1) and (3) is just whether you're
> embedding the root object or its hash in the file you're signing.

It's not a Merkle tree except in the sense that you do end up hashing
something that contains hashes. But the hashes of the individual files
would not be the only thing hashed to get the metadata hash that was
then signed. The metadata hash (the single signed hash) would be over
the entire proto contents that contain all file data, including
filenames, file sizes, modes, mtimes, etc.

>
> The main traps we want to avoid seem to be:
>
> a. Collecting too much stuff in a single Merkle object.
> E.g. digest([]filedata{}) or, even worse, digest([]filecontent)
> [1]. That makes re-hashing that object expensive, and likely to
> involve hashing things that you don't actually care about checking.

Rehashing file metadata wouldn't be expensive.

> b. Breaking the accessibility chain so you can't walk from the Merkle
> root toward the object you're interested. For bundle-filesystems,
> that likely means “path-based names associated with the hashes”.

Agreed, in the sense that being able to verify individual files
without having to re-verify the entire tree is useful.

> c. Not being able to compare the (indirectly) signed hash with your
> local filesytem. That means both:
>
> i. You can't fetch the original message from content-addressable
> storage using its hash, and

Being able to fetch based on its hash is necessary but not sufficient.
We also need human-readable names and the ability to fetch objects
using those. It's useful to be able to say, "here's the name of the
version of this bundle that is supposed to be on disk. Please retrieve
the signed metadata for this version of the bundle and verify it."

> ii. You can't reliably regenerate the object byte-for-byte using
> your on-disk filesystem.

That's not really a trap. It's quite a bit of trouble to define a
serialization format that has one true representation that can be
regenerated the same way every time, byte for byte. Instead, I've
argued that it's cleaner to store the signed serialized data alongside
the signature data and then deserialize it whenever it is needed.

> “Unchunked file-content hashing” isn't fundamentally any different
> from “chunked file hashing with a tree to organize the chunks”. The
> only differences would be scaling efficiency (better for chunks, since
> they give you more flexibility) and code complexity (simpler without
> chunking, but Merkle chunkers are now off-the-shelf tech, so not a big
> difference).

It's not that simple. Merkle hashes require quite a bit of extra
information, especially with more advanced uses. You must at the very
least define the block size and the tree depth. You can get into more
advanced encodings where the depth of the tree varies with the size
of the data, or variable-length blocks that are defined using the
Rabin-Karp algorithm to determine block boundaries to improve the
likelihood of finding identical blocks.
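
For reference, a rough sketch of content-defined chunking with a
Rabin-Karp-style rolling hash; the window, base, modulus, and mask are
illustrative parameters, not values from any spec:

    package main

    import "fmt"

    const (
        window  = 48            // bytes covered by the rolling hash
        base    = 257           // polynomial base
        modulus = 1 << 31       // keeps the arithmetic simple (assumes 64-bit int)
        mask    = (1 << 13) - 1 // boundary pattern: roughly 8 KiB average chunks
    )

    // chunkBoundaries returns end offsets of content-defined chunks: a
    // boundary is declared whenever the rolling hash over the last
    // `window` bytes matches the mask, so identical runs of content
    // tend to chunk identically even after insertions elsewhere.
    func chunkBoundaries(data []byte) []int {
        if len(data) <= window {
            return []int{len(data)}
        }
        powW := 1 // base^(window-1) mod modulus, used to drop the outgoing byte
        for i := 0; i < window-1; i++ {
            powW = powW * base % modulus
        }
        h := 0
        for i := 0; i < window; i++ {
            h = (h*base + int(data[i])) % modulus
        }
        var boundaries []int
        last := 0
        for i := window; ; i++ {
            if h&mask == mask && i-last >= window {
                boundaries = append(boundaries, i)
                last = i
            }
            if i == len(data) {
                break
            }
            // Slide the window: drop data[i-window], add data[i].
            h = (h - int(data[i-window])*powW%modulus + modulus) % modulus
            h = (h*base + int(data[i])) % modulus
        }
        if last < len(data) {
            boundaries = append(boundaries, len(data))
        }
        return boundaries
    }

    func main() {
        data := make([]byte, 1<<20)
        for i := range data {
            data[i] = byte(i * 31)
        }
        fmt.Println(chunkBoundaries(data))
    }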

>> If the storage system wishes to do more advanced optimization (such
>> as the IPFS), this isn't precluded. What is precluded is efficient
>> verification of the original signature of the data in the storage
>> system. That can be mitigated by having the storage system apply its
>> own signature to the Merkle tree and having the verification of the
>> data inside the storage system trust the storage system's
>> certificate.
>
> I'm still missing something in here. Is this “Alice is currently
> uploading metadata signed by Bob and its associated Merkle objects to
> my storage server, how do I efficiently guard against her passing me
> bogus data?”?

That would be one situation where not having a Merkle tree signed with
the original hash would be trouble, yes. But I was assuming that Alice
would have to provide the signed metadata to the storage system at the
time of the upload, which would allow the storage system to detect any
corruption or malicious changes. If the data are uploaded in-order,
then verification at upload time is cheap.

>> The point I was making is that if the signed metadata contains a
>> regular hash of the file (i.e. not the top hash of a Merkle tree),
>> then the verification of the data is less efficient than it would be
>> if the storage system were using a chunking scheme with a Merkle
>> tree that had a signed top hash.
>
> So that's “if the storage system wants to verify the metadata
> signature but happens to chunk its Merkle tree internally using a
> different scheme, it's going to end up doing twice the work”?

Yes.

W. Trevor King

unread,
Oct 20, 2015, 4:04:52 PM10/20/15
to Jason Bouzane, Brandon Philips, dev
On Tue, Oct 20, 2015 at 10:00:11AM -0700, Jason Bouzane wrote:
> On Thu, Oct 15, 2015 at 4:52 PM, W. Trevor King wrote:
> > On Thu, Oct 15, 2015 at 04:02:15PM -0700, Jason Bouzane wrote:
> >> 3) Signed metadata contains regular whole-file hashes.
> >>
> >> This is what I was discussing below when I referred to a
> >> "full-file" digest. I simply mean a straight hash of the file,
> >> not chunked in any way nor using hash trees. E.g. a simple
> >> SHA-256 hash of the files.
> >
> > This is still a Merkle tree. It's one layer deep (hash file →
> > hashed leaf). The difference between (1) and (3) is just whether
> > you're embedding the root object or its hash in the file you're
> > signing.
>
> It's not a Merkle tree except in the sense that you do end up
> hashing something that contains hashes. But the hashes of the
> individual files would not be the only thing hashed to get the
> metadata hash that was then signed. The metadata hash (the single
> signed hash) would be over the entire proto contents that contain
> all file data, including filenames, file sizes, modes, mtimes, etc.

That's still a Merkle tree, no? Signing tools like GnuPG will be
hashing the contents that they sign. If you want to pack the whole
filesystem up in a single tarball or protobuf, you'll have a two-node
tree:

* signature (hash of serialized bundle, signing key reference, algorithm, …)
  * serialized bundle (tarball / protobuf / …)

If you instead want to hash the files, you get a three-layer tree:

* signature (hash of manifest, signing key reference, algorithm, …)
  * manifest (hashes of each file's content, hash of serialized metadata)
    * serialized metadata (protobuf / … for all the files)
    * file 1 content
    * file 2 content


If you want to hash metadata separately for each file:

* signature (hash of manifest, signing key reference, algorithm, …)
  * manifest (hashes of each file's content and metadata)
    * serialized file 1 metadata (protobuf / …)
    * file 1 content
    * serialized directory 1 metadata (protobuf / …)
    * serialized file 2 metadata (protobuf / …)
    * file 2 content


If you want to reflect the directory structure in your hash tree:

* signature (hash of manifest, signing key reference, algorithm, …)
  * manifest (hash of the root directory)
    * directory 1 (serialized metadata + child hashes)
      * file 1 (serialized metadata + content hash)
        * file 1 content
      * directory 2 (serialized metadata + child hashes)
        * file 2 (serialized metadata + content hash)
          * file 2 content


If you want to block your files:

* signature (hash of manifest, signing key reference, algorithm, …)
  * manifest (hash of the root directory)
    * directory 1 (serialized metadata + child hashes)
      * file 1 (serialized metadata + child hashes)
        * block-set 1
          * file 1, block 1
          * file 1, block 2
          * file 1, block 3
        * block-set 2
          * file 1, block 4
          * file 1, block 5
          * file 1, block 6


etc., etc. Each approach fits into a different place in the
efficiency vs. complication balance, but they're all Merkle trees.
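
For concreteness, a minimal Go sketch of the three-layer variant above;
the manifest line format and the metadata placeholder are made up for
illustration only:

    package main

    import (
        "crypto/sha256"
        "fmt"
        "os"
        "sort"
    )

    // manifestDigest hashes each file's content, lists those digests in
    // a manifest alongside a digest of the serialized metadata, and
    // returns the manifest digest -- the value the signature layer
    // would actually sign.
    func manifestDigest(paths []string, serializedMetadata []byte) ([sha256.Size]byte, error) {
        sort.Strings(paths) // a stable order so the manifest is reproducible
        manifest := fmt.Sprintf("metadata sha256:%x\n", sha256.Sum256(serializedMetadata))
        for _, p := range paths {
            content, err := os.ReadFile(p)
            if err != nil {
                return [sha256.Size]byte{}, err
            }
            manifest += fmt.Sprintf("%s sha256:%x\n", p, sha256.Sum256(content))
        }
        return sha256.Sum256([]byte(manifest)), nil
    }

    func main() {
        digest, err := manifestDigest(os.Args[1:], []byte("serialized metadata placeholder"))
        if err != nil {
            panic(err)
        }
        fmt.Printf("manifest digest: sha256:%x\n", digest)
    }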

> > The main traps we want to avoid seem to be:
> >
> > a. Collecting too much stuff in a single Merkle object.
> > E.g. digest([]filedata{}) or, even worse, digest([]filecontent)
> > [1]. That makes re-hashing that object expensive, and likely
> > to involve hashing things that you don't actually care about
> > checking.
>
> Rehashing file metadata wouldn't be expensive.

Walking a whole bundle collecting metadata is still a wasteful number
of filesystem hits if you're just interested in verifying a subset of
the filesystem. Git packs loose objects into packfiles to avoid
excessive filesystem hits, and using a Merkle approach that avoids
collecting content from multiple files in the same hash seems like it
would also be a win for distribution.

> > c. Not being able to compare the (indirectly) signed hash with
> > your local filesytem. That means both:
> >
> > i. You can't fetch the original message from
> > content-addressable storage using its hash, and
>
> Being able to fetch based on its hash is necessary but not
> sufficient.

It's not even necessary if you can regenerate the object (avoiding my
c.ii).

> We also need human-readable names and the ability to fetch objects
> using those. It's useful to be able to say, "here's the name of the
> version of this bundle that is supposed to be on disk. Please
> retrieve the signed metadata for this version of the bundle and
> verify it."

Associating a name with a hashed bundle is a higher-level concern. I'm just
talking about what's needed for hashing / verification here. Once you
have that, you just need to insert the name in the root object that
gets signed. So signature("{name}" + "\0" + "{serialized-bundle}") or
signature(manifest) where manifest has both the name and a set of
child hashes.

And *retrieving* those signatures is an even higher level (I'd just
cache those locally when I first fetched the bundle [1]).

> > ii. You can't reliably regenerate the object byte-for-byte
> > using your on-disk filesystem.
>
> That's not really a trap. It's quite a bit of trouble to define a
> serialization format that has one true representation that can be
> regenerated the same way every time, byte for byte. Instead, I've
> argued that it's cleaner to store the signed serialized data
> alongside the signature data and then deserialize it whenever it is
> needed.

I agree (that's how IPFS does it, and not having it is my trap c.i).
I'm just saying that if you avoid trap c.ii, you can fall into c.i and
still be ok (and vice versa). In an ideal world you'd avoid both c.i
and c.ii, but as you point out, avoiding c.ii is hard.

> > “Unchunked file-content hashing” isn't fundamentally any different
> > from “chunked file hashing with a tree to organize the chunks”.
> > The only differences would be scaling efficiency (better for
> > chunks, since they give you more flexibility) and code complexity
> > (simpler without chunking, but Merkle chunkers are now
> > off-the-shelf tech, so not a big difference).
>
> It's not that simple. Merkle hashes require quite a bit of extra
> information, especially with more advanced uses. You must at the
> very least define the block size and the tree depth. You can get
> into more advanced encodings where the depth of the three varies
> with the size of the data, or variable length blocks that are
> defined using the Rabin-Karp algorithm to determine block boundaries
> to improve the likelihood of finding identical blocks.

From a user perspective, it is absolutely that simple (especially if
you're keeping the original objects around instead of regenerating
them on the fly). You stop caring after the tree has been hashed,
assuming the hashing is sufficiently strong and performant, and that
you can transmit the resulting object(s) with enough efficiency. All
the complications you list are implementation details that only need
to be addressed by the folks writing the tools to build the original
objects (and to a lesser extent by folks writing the tools to decode
the objects). For example, IPFS provides a unixfs framework where
producers are free to innovate on the blocking and fanout aspects, and
anyone with an IPFS client can verify the resulting Merkle tree and
unpack it to their disk.

Cheers,
Trevor

[1]: https://groups.google.com/a/opencontainers.org/d/msg/dev/OqnUp4jOacs/XVqfQ0hvFgAJ
Subject: Re: distributable and decentralized values
Message-ID: <20150901211...@odin.tremily.us>

Jason Bouzane

unread,
Oct 20, 2015, 7:30:04 PM10/20/15
to W. Trevor King, Brandon Philips, dev
On Tue, Oct 20, 2015 at 1:02 PM, W. Trevor King <wk...@tremily.us> wrote:
> etc., etc. Each approach fits into a different place in the
> efficiency vs. complication balance, but they're all Merkle trees.

To me, a Merkle tree is a tree of hashes rather than a hash of a
serialized protobuf that contains file metadata, one field of which
happens to be a hash. However, as long as we are talking about the
same thing, I'm not interested in the nomenclature or taxonomy.

> On Tue, Oct 20, 2015 at 10:00:11AM -0700, Jason Bouzane wrote:
>>
>> Rehashing file metadata wouldn't be expensive.
>
> Walking a whole bundle collecting metadata is still a wasteful number
> of filesystem hits if you're just interested in verifying a subset of
> the filesystem. Git packs loose objects into packfiles to avoid
> excessive filesystem hits, and using a Merkle approach that avoids
> collecting content from multiple files in the same hash seems like it
> would also be a win for distribution.

I'm not sure how this is relevant. If you have the serialized
metadata, you can verify any subset of the filesystem you want by
verifying the hash of the serialized metadata, checking the signature
on the hash, and then deserializing it and verifying the metadata for
the subset of files you are interested in. You never need to check the
filesystem for objects you aren't interested in verifying.
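
A minimal sketch of that subset check, assuming JSON-serialized
metadata and eliding the actual signature verification; the entry shape
is illustrative, not anything from continuity or the spec:

    package verify

    import (
        "crypto/sha256"
        "encoding/json"
        "fmt"
        "os"
    )

    // FileEntry is an illustrative per-file record in the serialized
    // metadata, not a format the spec has settled on.
    type FileEntry struct {
        Path   string `json:"path"`
        Mode   uint32 `json:"mode"`
        SHA256 string `json:"sha256"`
    }

    // verifySubset re-checks only the files the caller cares about:
    // confirm the serialized metadata matches the signed digest,
    // deserialize it, then re-hash just the requested paths.
    func verifySubset(serialized []byte, signedDigest [sha256.Size]byte, want []string) error {
        if sha256.Sum256(serialized) != signedDigest {
            return fmt.Errorf("serialized metadata does not match signed digest")
        }
        var entries []FileEntry
        if err := json.Unmarshal(serialized, &entries); err != nil {
            return err
        }
        byPath := map[string]FileEntry{}
        for _, e := range entries {
            byPath[e.Path] = e
        }
        for _, p := range want {
            e, ok := byPath[p]
            if !ok {
                return fmt.Errorf("%s: not listed in metadata", p)
            }
            content, err := os.ReadFile(p)
            if err != nil {
                return err
            }
            if got := fmt.Sprintf("%x", sha256.Sum256(content)); got != e.SHA256 {
                return fmt.Errorf("%s: content digest mismatch", p)
            }
        }
        return nil
    }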

> It's not even necessary if you can regenerate the object (avoiding my
> c.ii).

Avoiding c.ii is difficult, and I don't think it's worth the effort in
this spec. You seem to agree with this below.

> From a user perspective, it is absolutely that simple (especially if
> you're keeping the original objects around instead of regenerating
> them on the fly). You stop caring after the tree has been hashed,
> assuming the hashing is sufficiently strong and performant, and that
> you can transmit the resulting object(s) with enough efficiency. All
> the complications you list are implementation details that only need
> to be addressed by the folks writing the tools to build the original
> objects (and to a lesser extent by folks writing the tools to decode
> the objects). For example, IPFS provides a unixfs framework where
> producers are free to innovate on the blocking and fanout aspects, and
> anyone with an IPFS client can verify the resulting Merkle tree and
> unpack it to their disk.

My point is that if these are left as implementation details of the
storage system (the storage system calculates any Merkle trees in
whatever way it wants at the time something is added to the storage
system), then the OCI spec need not deal with these. That allows for
more flexibility in the design of the storage system (insofar as the
storage system would wish to reuse the same Merkle tree as is defined
in the spec) and a simpler spec that doesn't need to define these
issues.

As I've argued in this thread, *not* using a Merkle hash to describe
an individual file does not significantly limit the storage system. If
you do make the descriptor of a single file a Merkle top hash, that's
only useful if the storage system can use it, and different
implementations of storage systems might reasonably make vastly
different choices about how to store data within them, including
different block sizes, different hash algorithms, static vs. dynamic
block sizes, different tree depths, and so on. If the storage system
wishes to differ on any of these points, it must calculate its own
Merkle tree, anyway. And that's assuming the storage system is even
interested in chunking and Merkle trees. Simple storage systems may
ignore the issue completely and simply store the files whole, on disk,
or perhaps in a cloud storage system.

So while I don't think that there's any particular problem with using
a Merkle top hash for the file hash, it seems like a lot of complexity
to add to the OCI container spec for very little utility, and it
doesn't enable any use cases that couldn't be enabled by leaving such
chunking schemes to the storage system itself.

W. Trevor King

unread,
Oct 21, 2015, 12:42:24 AM10/21/15
to Jason Bouzane, Brandon Philips, dev
On Tue, Oct 20, 2015 at 04:30:02PM -0700, Jason Bouzane wrote:
> On Tue, Oct 20, 2015 at 1:02 PM, W. Trevor King wrote:
> > On Tue, Oct 20, 2015 at 10:00:11AM -0700, Jason Bouzane wrote:
> >> Rehashing file metadata wouldn't be expensive.
> >
> > Walking a whole bundle collecting metadata is still a wasteful
> > number of filesystem hits if you're just interested in verifying a
> > subset of the filesystem. Git packs loose objects into packfiles
> > to avoid excessive filesystem hits, and using a Merkle approach
> > that avoids collecting content from multiple files in the same
> > hash seems like it would also be a win for distribution.
>
> I'm not sure how this is relevant. If you have the serialized
> metadata, you can verify any subset of the filesystem you want by
> verifying the hash of the serialized metadata, checking the
> signature on the hash, and then deserializing it and verifying the
> metadata for the subset of files you are interested in. You never
> need to check the filesystem for objects you aren't interested in
> verifying.

Ah, you're right. That relies on having the original serialized
metadata around to reverse the hash, but we both agree you'd want
that. The only remaining difference between per-file and all-files
metadata hashing is distribution efficiency.

> > It's not even necessary if you can regenerate the object (avoiding
> > my c.ii).
>
> Avoiding c.ii is difficult, and I don't think it's worth the effort
> in this spec. You seem to agree with this below.

Yeah, I was just listing it as an alternative if you did want to go
through the trouble (there were wishes for reproducible protobuf
earlier in the thread [1], and this is why that would be useful).

> So while I don't think that there's any particular problem with
> using a Merkle top hash for the file hash, it seems like a lot of
> complexity to add to the OCI container spec for very little utility,
> and it doesn't enable any use cases that couldn't be enabled by
> leaving such chunking schemes to the storage system itself.

Anyone doing the validation will need local copies of the signed
objects. If you have a central storage system that uses an internal
Merkle representation, but all user interaction is via signed tarballs
(for example; this is the two-node graph in [2]), then the user has
to keep *that whole tarball* on their system (or refetch it) whenever
they want to verify a container launched from that tarred bundle. On
the other hand, if you do end-to-end Merkle with sufficient
granularity, the objects you need to keep for local verification can
be much smaller and more reusable (5k bundles based on Debian 7.0 with
small tweaks? No problem, they're almost all the same objects).
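
A small sketch of why that reuse falls out of content addressing; the
object-directory layout and the sha256- prefix are illustrative:

    package main

    import (
        "crypto/sha256"
        "fmt"
        "os"
        "path/filepath"
    )

    // store writes data into a content-addressed object directory keyed
    // by its digest; a second bundle containing the same bytes resolves
    // to an object that is already present, so only the small, changed
    // objects cost anything to keep around.
    func store(objectDir string, data []byte) (string, error) {
        digest := fmt.Sprintf("sha256-%x", sha256.Sum256(data))
        path := filepath.Join(objectDir, digest)
        if _, err := os.Stat(path); err == nil {
            return digest, nil // already stored: deduplicated for free
        }
        if err := os.WriteFile(path, data, 0o644); err != nil {
            return "", err
        }
        return digest, nil
    }

    func main() {
        dir, err := os.MkdirTemp("", "objects")
        if err != nil {
            panic(err)
        }
        d1, _ := store(dir, []byte("a file shared by many Debian-based bundles"))
        d2, _ := store(dir, []byte("a file shared by many Debian-based bundles"))
        fmt.Println(d1 == d2) // true: the second store wrote nothing new
    }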

Cheers,
Trevor

[1]: https://groups.google.com/a/opencontainers.org/d/msg/dev/xo4SQ92aWJ8/pR2KbYtqCAAJ
Message-ID: <CAD2oYtOH7dvvz0TrmyTU=4GEaAksQnMHXTv...@mail.gmail.com>
[2]: https://groups.google.com/a/opencontainers.org/d/msg/dev/xo4SQ92aWJ8/svxMfm0rCgAJ
Message-ID: <20151020200...@odin.tremily.us>