A bundle is designed to be moved between hosts. Although OCI doesn't define a transport method, we should have a cryptographic digest of the on-disk bundle that can be used to verify that a bundle is not corrupted and is in an expected configuration.
Optionally this digest could then be used in a cryptographic public-key signature system.
The tricky part is defining which parts of a file system are verified.
Potential Use Cases
Verification of a filesystem on-disk after moving a bundle between hosts (everyone)
Enabling optional cryptographic public-key signature systems (everyone)
Unpacking/packing of file systems as a non-root user (Joe Beda)
Non-goals
Define the "one true digest"; it is inevitable that a container will have multiple digests calculated over it.
a filesystem or raid system may calculate a crc
a distribution method may calculate a digest with the archive format
Define a signing system; we will assume that the digest will be a short string of bytes that can be signed by different systems
Goals
Define a bundle digest that can confirm the on-disk integrity of a bundle after moving between hosts
Make this digest fast to compute and parallelizable (merkle trees)
Use cryptographic best practices for hashes as of 2015
Design for upgradeability and versioning of the digest
e.g. prefix with sha512-<digest> or similar
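A minimal sketch of what an algorithm-prefixed digest string could look like. Only the `sha512-<digest>` shape comes from the goal above; the helper names, the hex encoding, and the set of supported algorithms are assumptions made for illustration.

```python
import hashlib

# Assumed registry of supported algorithms; adding an entry is the upgrade path.
SUPPORTED = {"sha256": hashlib.sha256, "sha512": hashlib.sha512}


def format_digest(algorithm: str, data: bytes) -> str:
    """Return '<algorithm>-<hex digest>' for the given bytes."""
    hasher = SUPPORTED[algorithm]
    return f"{algorithm}-{hasher(data).hexdigest()}"


def verify_digest(expected: str, data: bytes) -> bool:
    """Recompute the digest using the algorithm named in the prefix."""
    algorithm, sep, _hex_value = expected.partition("-")
    if not sep or algorithm not in SUPPORTED:
        raise ValueError(f"unsupported digest: {expected!r}")
    return format_digest(algorithm, data) == expected
```

Because the verifier reads the algorithm from the prefix, old digests keep verifying after a newer algorithm is added to the registry.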
Potential Goals
Enable packing/unpacking as non-root users
File system Metadata to Calculate
Problem: Should we support all possible filesystem properties such as extended attributes, POSIX ACLs, NFSv4 ACLs, modification timestamps, etc?
There is a case for trying to support everything from day zero, but no solid use cases emerged for doing so.
Having read through and reflected on the discussion so far, I believe that we should design a digest format that may support those in the future but does not attempt to encode all of those things for now. In the digest serialization format we can then provide reasonable defaults that will error if a file has xattrs, ACLs, or non-default timestamps.
xattrs must be empty on-disk for now
posix acls must be empty on-disk for now
modification/access time will be ignored on-disk if empty in manifest
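The three defaults above could be checked along these lines. This is a sketch only: the `Observed` record and `check_entry` function are hypothetical names, and the observed metadata is passed in rather than read from disk so the rules themselves stay visible.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Observed:
    """Metadata found on disk for one file (hypothetical structure)."""
    xattrs: dict = field(default_factory=dict)
    posix_acl: list = field(default_factory=list)
    mtime: Optional[int] = None


def check_entry(path: str, observed: Observed,
                manifest_mtime: Optional[int] = None) -> List[str]:
    """Return a list of errors; an empty list means the file passes."""
    errors = []
    if observed.xattrs:
        errors.append(f"{path}: xattrs must be empty for now")
    if observed.posix_acl:
        errors.append(f"{path}: POSIX ACLs must be empty for now")
    # The on-disk time is only compared when the manifest records one.
    if manifest_mtime is not None and observed.mtime != manifest_mtime:
        errors.append(f"{path}: mtime does not match manifest")
    return errors
```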
File system Serialization
Problem: How should we serialize the file system data?
There are a few approaches to this problem:
Serialize file system metadata and contents into a byte stream
Used in appc and docker today with tar
Approach is well understood and has existing tooling
Changing one file or one piece of metadata requires significant recalculation
Formats are out of our control and difficult to extend
Serialize file system metadata and content digest into a byte stream
Prototyped in systems like https://github.com/stevvooe/continuity
Similar concept used in bittorrent
Recalculation of metadata is trivial and easy for single changes
Extensible upgrade path under our control
I am leaning towards the continuity approach even though it means inventing new stuff. Having all of this in a separate file does make it rather easy
The basic concept is:
digest([]filedata{})
struct filedata {
    name     string
    filetype enum
    uid      int
    gid      int
    etc…
}
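To make the `digest([]filedata{})` concept concrete, here is a rough sketch that builds one metadata record per path, stores a content digest for regular files, and hashes the sorted list. The JSON encoding, the `mode` and `target` fields, and the function names are assumptions for illustration, not a proposed format.

```python
import hashlib
import json
import os
import stat


def file_type(mode: int) -> str:
    """Map a stat mode to a small set of type names (assumed enum values)."""
    if stat.S_ISDIR(mode):
        return "dir"
    if stat.S_ISLNK(mode):
        return "symlink"
    if stat.S_ISREG(mode):
        return "file"
    return "other"


def describe(root: str, rel_path: str) -> dict:
    """Build the filedata record for one path, without following symlinks."""
    full = os.path.join(root, rel_path)
    st = os.lstat(full)
    entry = {"name": rel_path, "filetype": file_type(st.st_mode),
             "uid": st.st_uid, "gid": st.st_gid,
             "mode": stat.S_IMODE(st.st_mode)}
    if stat.S_ISREG(st.st_mode):
        with open(full, "rb") as f:
            entry["digest"] = "sha512-" + hashlib.sha512(f.read()).hexdigest()
    elif stat.S_ISLNK(st.st_mode):
        entry["target"] = os.readlink(full)
    return entry


def bundle_digest(root: str) -> str:
    """Hash the sorted list of filedata records, not the file contents."""
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            entries.append(describe(root, rel))
    entries.sort(key=lambda e: e["name"])
    manifest = json.dumps(entries, sort_keys=True, separators=(",", ":"))
    return "sha512-" + hashlib.sha512(manifest.encode()).hexdigest()
```

Changing one file only requires rehashing that file and the (small) manifest, which is the recalculation advantage listed above.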
To unsubscribe from this group and stop receiving emails from it, send an email to dev+uns...@opencontainers.org.
On Wed, Oct 14, 2015 at 05:09:15PM +0000, Brandon Philips wrote:
> Design for upgradeability and versioning of the digest
>
> e.g. prefix with sha512-<digest> or similar
IPFS uses multihashes for this [1]. They currently bump into some
ambiguity because the encoding (e.g. hex, base58, …) isn't *also*
clearly represented in encoded versions of the multihash. But you
could either keep the multihash in a binary byte stream, or add an
additional prefix character when encoding it (e.g. x11… for 11…
being hex encoded, or 8Qm… for Qm… being base58).
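A small sketch of the prefix-character idea, assuming the multihash layout of a one-byte function code (0x12 for sha2-256) followed by a one-byte digest length and the digest. The `x` tag for hex follows the suggestion above; the function names are made up for illustration.

```python
import hashlib

SHA2_256_CODE = 0x12  # multihash function code for sha2-256


def multihash_sha256(data: bytes) -> bytes:
    """Binary multihash: <function code><digest length><digest>."""
    digest = hashlib.sha256(data).digest()
    return bytes([SHA2_256_CODE, len(digest)]) + digest


def encode_hex_tagged(mh: bytes) -> str:
    """Prefix the hex form with 'x' so the encoding is unambiguous."""
    return "x" + mh.hex()


def decode_tagged(text: str) -> bytes:
    """Reverse encode_hex_tagged; other encodings would get other tags."""
    if text[:1] != "x":
        raise ValueError("unknown encoding tag")
    return bytes.fromhex(text[1:])
```

With this layout a hex-tagged sha2-256 multihash always starts with `x1220`, so a reader can tell both the encoding and the hash function from the first few characters.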
On Wed, Oct 14, 2015 at 05:09:15PM +0000, Brandon Philips wrote:
> A bundle is designed to be moved between hosts. Although OCI doesn't
> define a transport method we should have a cryptographic digest of
> the on-disk bundle that can be used to verify that a bundle is not
> corrupted and in an expected configuration.
>
> Optionally this digest could then be used in a cryptographic public-key
> signature system.
I'm -1 to including this in the runtime spec, but agnostic to
including it in a higher-level, OCI managed layer [1].
On Wed, Oct 14, 2015 at 05:09:15PM +0000, Brandon Philips wrote:
> File system Serialization
>
> Problem: How should we serialize the file system data?
If we're not talking about transport methods here, I don't see a need
to talk about filesystem serialization independent of digests. Folks
can (de)serialize with a tool of their choice (tar, IPFS, …) and then
use the serialization-agnostic digest-checker to verify the on-disk
bundle.
Are there other reasons that we need to address filesystem
serialization in the same layer that handles digest-generation and
checking?
On Wed, Oct 14, 2015 at 2:33 PM W. Trevor King <wk...@tremily.us> wrote:
> On Wed, Oct 14, 2015 at 05:09:15PM +0000, Brandon Philips wrote:
> > File system Serialization
> >
> > Problem: How should we serialize the file system data?
>
> If we're not talking about transport methods here, I don't see a need
> to talk about filesystem serialization independent of digests. Folks
> can (de)serialize with a tool of their choice (tar, IPFS, …) and then
> use the serialization-agnostic digest-checker to verify the on-disk
> bundle.
What? How do you hash something without serializing it?
On Wed, Oct 14, 2015 at 10:09 AM, Brandon Philips <brandon...@coreos.com> wrote:
> xattrs must be empty on-disk for now
> posix acls must be empty on-disk for now
> modification/access time will be ignored on-disk if empty in manifest
Hmm, why would atime be included in the metadata? While it can be set, it's unlikely to stay the same for long except on read-only or noatime filesystems. Is there a use case for preserving atime?
On Wed, Oct 14, 2015 at 10:09 AM, Brandon Philips <brandon...@coreos.com> wrote:
> I am leaning towards the continuity approach even though it means inventing new stuff. Having all of this in a separate file does make it rather easy
We use the continuity approach within Google rather extensively, and it has some additional benefits, too. It allows for partial verification without having to read the entire contents of the bundle. For example, if I wish to verify a particular file, I can hash it, check that the digest matches what's in the metadata, recompute the metadata digest, and then verify the signature attached to it. If the signed digest is over the entire contents of the bundle instead of over the digests of the included files, then I need to read all the files on the filesystem to recompute the digest, which is far more expensive if all I care about is verifying a part of that filesystem.
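The partial-verification flow described here might look roughly like the sketch below. The manifest is assumed to be a simple name-to-digest mapping encoded as JSON, and comparing against a trusted manifest digest stands in for the signature check; none of this reflects an actual internal format.

```python
import hashlib
import json


def sha512_hex(data: bytes) -> str:
    return hashlib.sha512(data).hexdigest()


def manifest_digest(manifest: dict) -> str:
    """Digest over the metadata (file name -> content digest) only."""
    encoded = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return sha512_hex(encoded.encode())


def verify_one_file(name: str, content: bytes, manifest: dict,
                    trusted_manifest_digest: str) -> bool:
    """Verify a single file without reading any other file's contents."""
    # Step 1: the manifest itself must match the trusted (signed) digest.
    if manifest_digest(manifest) != trusted_manifest_digest:
        return False
    # Step 2: the file's content must match its entry in the manifest.
    return manifest.get(name) == sha512_hex(content)
```

Only the one file and the manifest are read, regardless of how large the rest of the bundle is.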
On Wed, Oct 14, 2015 at 10:09 AM, Brandon Philips <brandon...@coreos.com> wrote:
> I am leaning towards the continuity approach even though it means inventing new stuff. Having all of this in a separate file does make it rather easy
Partial verification becomes especially important if you want to create a single bundle that is multiarchitecture or multiplatform where some parts of the bundle might be omitted depending on which architecture or platform is using the bundle. I don't see multi-arch or multi-platform bundles listed in your goals or your non-goals section, but I also don't see any reason to preclude them in the future even if we don't need them now.
On Wed, Oct 14, 2015 at 07:44:39PM -0700, Solomon Hykes wrote:
> On Wed, Oct 14, 2015 at 7:40 PM, Brandon Philips wrote:
> > On Wed, Oct 14, 2015 at 2:33 PM W. Trevor King wrote:
> >> On Wed, Oct 14, 2015 at 05:09:15PM +0000, Brandon Philips wrote:
> >> > File system Serialization
> >> >
> >> > Problem: How should we serialize the file system data?
> >>
> >> If we're not talking about transport methods here, I don't see a need
> >> to talk about filesystem serialization independent of digests. Folks
> >> can (de)serialize with a tool of their choice (tar, IPFS, …) and then
> >> use the serialization-agnostic digest-checker to verify the on-disk
> >> bundle.
> >
> > What? How do you hash something without serializing it?
>
> I think Trevor is saying that different implementations (in his
> example, tar archives vs. IPFS encoding) might use different
> serialization methods, which will yield different byte strings from
> the same input bundle.
No, I was saying that there's no need to serialize the whole
filesystem at once if you're just building a digest. For example: