On 2020-05-24, Till Wegmüller <toast...@gmail.com> wrote:
> On 24.05.20 04:55, Aleksa Sarai wrote:
> > On 2020-05-23, Till Wegmüller <toast...@gmail.com> wrote:
> >> Hello Everyone.
> >>
> >> I took the liberty to advance the topic of the OCIv2 discussion a
> >> bit and write down my own thoughts about the image spec into a
> >> Markdown document to help build a picture of possibilities. I
> >> look forward to discussing these thoughts and adding the experience
> >> of the rest of the community to either this or the final proposal.
> >>
> >> Without further ado, happy reading.
> >>
> >>
> >> https://gist.github.com/Toasterson/1dff780fe3a6339041f9b7604be3f068
> >
> > As discussed on the last call I was on, we should first agree on
> > requirements before we start discussing concrete proposals. The
> > reason is quite simple -- we need to make sure what things are a
> > priority and what usecases folks have. And sorry for not getting
> > around to this last week, I will set up a HackMD and post it on the
> > list on Monday.
>
> Yes, that will help a lot with a collaborative drafting process, and
> don't worry about rushing it in. I had this on my mind and wanted to
> write it down and share it.
Fair enough, I didn't mean to sound grouchy.
That was mostly just a knee-jerk reaction based on something that
happened during the distribution-spec discussions, where a few prototype
proposals (which were a massive departure from the distribution-spec we
have now) flew around and then we spent more time arguing about the
merits of the (almost identical) proposals rather than actually having a
solid and practicable draft specification to go and implement.
I'd like to avoid doing that if at all possible, which is why I made
sure to point out that my own proposal (and all the other proposals I've
seen) has gaps which we should address collectively. One of the many
things we need to discuss is what filesystem metadata should actually be
stored inside the container image.
> > To take your proposal as an example:
> >
> > * It doesn't fulfil the canonical representation criterion,
> > meaning that different implementations will generate different
> > images. Now, this isn't as bad as with tar layers (the file data
> > blobs will be the same) but it does have an impact on image
> > reproducibility.
>
> Yes, this is not in what I have written yet. I see it as impossible to
> find a complete image spec everybody can use, but we can align on a
> lowest common denominator of things every implementation fills out
> the same way. That will mostly be the metadata we have currently, I
> think, unless there is history which people want removed.
I was more focusing on the idea that you can (by design) have multiple
representations of the same root filesystem by (for instance)
rearranging the order of entries in the manifest.
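
To make the reordering problem concrete, here is a minimal sketch. The
entry fields ("path", "blob") and the choice of "sort by path, then
serialise with sorted keys" as the canonical form are purely my
assumptions for illustration, not something taken from either proposal.

```python
import hashlib
import json


def manifest_digest(entries, canonical=True):
    """Return the SHA-256 digest of a JSON-encoded manifest."""
    if canonical:
        # Assumed canonical form: entries ordered by path.
        entries = sorted(entries, key=lambda e: e["path"])
    # Sorted keys and fixed separators remove encoder-level ambiguity.
    blob = json.dumps(entries, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(blob.encode("utf-8")).hexdigest()


# Hypothetical manifest entries; the field names are made up.
a = [
    {"path": "/bin/sh", "blob": "aaa"},
    {"path": "/etc/passwd", "blob": "bbb"},
]
b = list(reversed(a))  # same rootfs, entries listed in a different order

# Without a canonical ordering the two manifests hash differently.
print(manifest_digest(a, canonical=False) == manifest_digest(b, canonical=False))
# With a canonical ordering they hash identically.
print(manifest_digest(a) == manifest_digest(b))
```

The point is only that the spec has to pin down one ordering (and one
encoding), otherwise two tools can produce different digests for the
same root filesystem.
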
I wasn't suggesting that we ship XFS (or whatever) images -- that would
be a bad idea for a variety of reasons. The idea is that we would
develop a kernel driver *for whatever format we end up using*. This
wouldn't be required in order to use OCIv2, but it would be an option
for users that want additional assurances about the code they're
running. And yes, you would only ever want to use the kernel driver for
images which are signed by a trusted vendor.
> Also thinking about the kernel driver, this sounds like what SmartOS
> did with their images, which are essentially ZFS Filesystem blobs
> saved as file and bundled with JSON metadata. The on-disk format of
> ZFS is as far as I have gathered not that complicated so it could give
> you an inspiration on how to make a filesystem format.
I use ZFS myself, and it's an awesome filesystem.
However it would be incredibly unwise to embed ZFS send payloads into
OCI (not to mention it wouldn't actually solve any of the issues we had
with tar archives -- unless you rely on servers running with ZFS
deduplication [which is often not enabled, for good reason] and even
then I would argue it still doesn't solve the primary issue of transfer
duplication). And on systems which don't support ZFS natively you'd have
to run a FUSE driver or other implementation of the ZFS format.
There's also the license issue, but I don't want to rehash that entire
debate. Suffice to say, we can't assume that all OCI users are running
ZFS.
> > Both of our formats also suffer from the issue that while they do
> > allow for reduced data transfer, they increase round-trips by
> > having many more blobs. I think it would be useful to consider
> > having the format be such that you could optimise such transfer
> > problems through HTTP multipart range requests.
>
> My hope was on HTTP2 or a transport encapsulation. As that can be a
> whole spec in itself I don't want to propose anything in that
> direction as of yet.
There is already a spec in OCI for that -- the distribution-spec. This
is actually something we will need to collaborate with them on, so that
we don't cause issues with whatever OCIv2 proposal we have.
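
As a rough illustration of the range-request idea quoted above: if many
small objects were packed into one larger blob, a client could ask for
several byte ranges in a single request instead of making one round-trip
per object. The sketch below only builds the Range header value; the
helper name and the example offsets are hypothetical, and whether a
given registry honours multi-range requests is a separate question.

```python
def build_range_header(ranges):
    """Coalesce inclusive (start, end) byte ranges into a Range header value."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            # Overlapping or directly adjacent: extend the previous range.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return "bytes=" + ",".join(f"{s}-{e}" for s, e in merged)


# Three wanted chunks; the first two are adjacent and get merged.
print(build_range_header([(4096, 8191), (0, 1023), (1024, 2047)]))
# prints: bytes=0-2047,4096-8191
```

A server that supports this answers with a 206 response whose body is
multipart/byteranges, one part per requested range.
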
> > And the stargz proposal suffers from a few similar problems as
> > well. Not to mention that all proposals have effectively copied
> > a mistake from the original image-spec -- character and block
> > devices shouldn't be included in images because major/minor numbers
> > can change between machines (and even reboots of the same machine).
> > That is something we need to either fix (in the way systemd has
> > done by leveraging /proc/devices) or avoid by not allowing them in
> > images at all.
>
> Ah, on illumos our /devices is a filesystem not present during the
> imaging process and only during runtime of a Zone. And /dev is a set
> of forcefully overridden symlinks. I did not consider that we will
> need an exception or handling for that in other OSes.
Practically speaking, this is also true for Linux containers (/dev is
configured by the container runtime and has a separate configuration to
the image). However you can always mknod a device anywhere on the
filesystem and if you create a tar archive, you'll get a device inode
which will be unpacked on the destination system.
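
For what it's worth, spotting such entries in a tar layer is cheap. This
is a minimal sketch using Python's tarfile module; the function name and
the sample paths are made up, and the in-memory archive stands in for a
real layer blob.

```python
import io
import tarfile


def find_device_nodes(fileobj):
    """Return (name, kind, major, minor) for each device entry in a tar stream."""
    found = []
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        for member in tar:
            if member.ischr() or member.isblk():
                kind = "char" if member.ischr() else "block"
                found.append((member.name, kind, member.devmajor, member.devminor))
    return found


# Build a small in-memory archive with one character device entry.
# No privileges are needed: only a tar header is written, mknod is never called.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    dev = tarfile.TarInfo("srv/data/null")
    dev.type = tarfile.CHRTYPE
    dev.devmajor, dev.devminor = 1, 3
    tar.addfile(dev)

    data = b"example\n"
    regular = tarfile.TarInfo("etc/hostname")
    regular.size = len(data)
    tar.addfile(regular, io.BytesIO(data))
buf.seek(0)

print(find_device_nodes(buf))
# prints: [('srv/data/null', 'char', 1, 3)]
```

An image tool could use a check like this to reject or strip device
entries at build time, which is one way of "not allowing them".
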
> > I note that the proposal you have looks like the mtree format.
> > Don't get me wrong, the mtree format is quite useful (in fact,
> > umoci uses it internally as part of its diff generation code). And
> > I do appreciate the wish to simplify the format, though I would
> > argue that JSON is the least over-complicated or legacy-laden part
> > of the image-spec. And my experiences with mtree have shown that
> > it's much better equipped as a "supplementary" manifest format.
>
> No, it's not mtree, but they probably share the same ancestry. It's
> the format used by the Image Packaging System. I personally have not
> had much in-depth experience with JSON. From what I hear it should not
> be complicated, but I don't want to just blindly step into that. I am
> happy with any formatting we end up with.
I might have buried the lede with my comment -- I was responding to your
comment in the proposal that we should avoid complicating the format
(and you then mention JSON in the same paragraph). My point was just
that the encoding is probably the least over-complicated thing I can
imagine in the spec -- but it seems we agree that the encoding really
shouldn't be a blocking issue.