Why does rkt store container images differently from Docker?

Julia Evans

Oct 31, 2016, 10:12:36 PM
to rkt-dev
Hello!

I've been trying to understand why the tree store / image store are designed the way they are in rkt.

My current understanding of how *Docker*'s image storage on disk works right now (from https://docs.docker.com/engine/userguide/storagedriver/imagesandcontainers/) is:

- every layer gets one directory, which is basically that layer's filesystem
- at runtime, they all get stacked (via whatever CoW filesystem you're using)
- every time you do IO, you go through the stack of directories (or however these overlay drivers work)

My current understanding of rkt's image storage on disk (from the "image lifecycle" section of https://github.com/coreos/rkt/blob/master/Documentation/devel/architecture.md) is:

- every layer gets an image in the image store
- when you run an ACI file, all the images that that ACI depends on get unpacked and copied into a single directory (in the tree store)


My guess at why rkt decided to architect its storage differently from Docker:

1. having very deep overlay filesystems can be expensive
2. so if you have a lot of layers, copying them all into a directory (the "tree store") results in better performance

So rkt is trading more space used on disk (every image in the image store gets copied at least once) for better runtime performance (there are no deep chains of overlays)
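
To make the trade-off above concrete, here is a toy sketch in Python. It assumes each layer can be modeled as a plain dict from path to contents; the names `lookup_stacked` and `flatten` are made up for illustration, and real overlay drivers and rkt's tree store work quite differently in detail.

```python
# Toy model only: a "layer" is a dict mapping path -> file contents.
# Neither overlayfs nor rkt is implemented this way.

def lookup_stacked(layers, path):
    """Search from the top layer down; return (contents, layers_checked)."""
    checked = 0
    for layer in reversed(layers):
        checked += 1
        if path in layer:
            return layer[path], checked
    raise FileNotFoundError(path)


def flatten(layers):
    """Copy every layer into one tree; later layers override earlier ones."""
    tree = {}
    for layer in layers:
        tree.update(layer)
    return tree


layers = [
    {"/bin/sh": "shell v1", "/etc/os-release": "base"},  # bottom layer
    {"/usr/bin/python": "python"},
    {"/app/main.py": "print('hi')", "/bin/sh": "shell v2"},  # top layer
]

# Stacked: a file that only exists in the bottom layer costs one check per layer.
contents, checked = lookup_stacked(layers, "/etc/os-release")

# Flattened: extra copy on disk, but every lookup is a single step.
tree = flatten(layers)
```

In this model the flattened `tree` duplicates every file once, which is the disk-space cost, while lookups no longer depend on how many layers there were.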

Is this right? Have I misunderstood what rkt does (or what Docker does)? Are there other reasons for the difference?

Jonathan Boulle

Nov 1, 2016, 1:31:25 PM
to Julia Evans, rkt-dev
Hello Julia,

Good question! Your understanding is basically correct, although the initial motivation was less to do with concerns about overlayfs performance and more due to the relative complexity of rendering ACIs.

One of the key differences between the ACI and Docker formats is that while Docker's layers are essentially a linked list, ACI dependencies instead form a directed acyclic graph, with a separate whitelist system. This means that to create a root filesystem from a Docker image and its parent layers, you can simply layer them on top of each other while respecting the AUFS-style whiteout files; whereas the process of rendering an ACI as a root filesystem is rather more complicated [1], as you need to traverse a full graph [2], and can have cases like the same image appearing multiple times in the graph but with a different whitelist affecting which parts of it should be used [3]. To compensate for this additional complexity, we "pre-render" the root filesystem that an ACI represents into the treestore [4], and then use overlayfs on top of this at runtime. 
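
A minimal sketch of why the graph case is harder, written in Python. It is loosely modeled on the ACI manifest's `dependencies` and `pathWhitelist` fields, but the dict layout, the `files` and `whitelist` keys, and the `render` function are simplifications invented here; the real renderer handles many more cases.

```python
# Hypothetical, simplified image records. "files" stands in for the image's
# own rootfs contents and "whitelist" for the set of paths allowed to remain
# after rendering (None means keep everything).
images = {
    "base": {
        "files": {"/bin/sh": "sh", "/bin/ls": "ls", "/etc/passwd": "root"},
    },
    "lib-a": {
        "dependencies": ["base"],
        "files": {"/lib/a.so": "a"},
        "whitelist": {"/bin/sh", "/lib/a.so"},
    },
    "lib-b": {
        "dependencies": ["base"],
        "files": {"/lib/b.so": "b"},
        "whitelist": {"/bin/ls", "/lib/b.so"},
    },
    "app": {
        "dependencies": ["lib-a", "lib-b"],
        "files": {"/app/run": "run"},
    },
}


def render(image_id, images):
    """Render dependencies in order, then the image's own files, then filter."""
    image = images[image_id]
    tree = {}
    for dep in image.get("dependencies", []):
        tree.update(render(dep, images))  # walk the dependency graph
    tree.update(image.get("files", {}))
    whitelist = image.get("whitelist")
    if whitelist is not None:
        tree = {path: data for path, data in tree.items() if path in whitelist}
    return tree


rootfs = render("app", images)
```

Here `base` is reached twice, once through `lib-a` and once through `lib-b`, and each path keeps a different part of it; `/etc/passwd` survives neither whitelist. A linked list of layers never has to deal with that situation.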

Therein lies another distinction: Docker uses the CoW semantic for two purposes - i) to render the layers of the image rootfs, and ii) to allow the rootfs to be shared by multiple containers at runtime. In the rkt world, we use the treestore for i) and only use CoW (in our case overlayfs) for ii). Hence as you correctly identified we avoid any complicated chain of CoW during runtime. At the same time, because the rendering is still relatively expensive, we try to do it at the time an image is fetched into rkt's store rather than when a pod is run [5].
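
The split between i) and ii) can be sketched as follows, again in Python and again as a toy model. `TreeStore`, `Pod`, and `expensive_render` are hypothetical names, not rkt's API; the dict-based "upper layer" only imitates what overlayfs does at the filesystem level.

```python
class TreeStore:
    """Hypothetical cache of rendered root filesystems, keyed by image ID."""

    def __init__(self, render_fn):
        self._render_fn = render_fn
        self._trees = {}
        self.render_count = 0

    def fetch(self, image_id):
        # Render when the image is fetched, and only once per image.
        if image_id not in self._trees:
            self._trees[image_id] = self._render_fn(image_id)
            self.render_count += 1
        return self._trees[image_id]


class Pod:
    """One shared lower tree plus a private upper layer for writes."""

    def __init__(self, lower):
        self.lower = lower  # shared between pods, never modified here
        self.upper = {}     # per-pod copy-on-write layer

    def read(self, path):
        if path in self.upper:
            return self.upper[path]
        return self.lower[path]

    def write(self, path, data):
        self.upper[path] = data


def expensive_render(image_id):
    # Stand-in for the costly dependency-graph rendering step.
    return {"/etc/hostname": "default", "/app": image_id}


store = TreeStore(expensive_render)
store.fetch("example-image")                # rendering happens here
pod_a = Pod(store.fetch("example-image"))   # running a pod reuses the tree
pod_b = Pod(store.fetch("example-image"))
pod_a.write("/etc/hostname", "pod-a")
```

Each pod sees a single lower tree and one upper layer, so the depth of the copy-on-write stack at runtime stays at two no matter how many dependencies the image had.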

Does that help?

cheers,
Jonathan

[3] Actually it can get even more complicated since ACI dependencies can be optionally specified in a non-deterministic way, to facilitate things like updating a "base" image; but in practice we haven't seen much of this use case so it's probably simpler to ignore it for now ;-)


--
You received this message because you are subscribed to the Google Groups "rkt-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rkt-dev+unsubscribe@googlegroups.com.
To post to this group, send email to rkt...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/rkt-dev/01dec07a-1093-47d2-8529-5edc4efc4682%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Julia Evans

Nov 1, 2016, 2:56:54 PM
to Jonathan Boulle, rkt-dev
This helps a lot, thanks so much for the explanation! I didn't realize that ACI dependencies could be any acyclic graph, and I didn't know about the whitelist system.

Just to make totally sure -- in the case that you're running a single ACI image with no dependencies, there's no way to get around having two copies of it on disk, right? Is there any plan to simplify the runtime environment for people who aren't using dependencies or whitelists, or is the idea that that's a pretty rare case and there's not enough demand for it to have special treatment?

Julia

Jonathan Boulle

Nov 2, 2016, 2:22:53 PM
to Julia Evans, rkt-dev
As of today that's correct - the image is stored both in unrendered form (in the image store) and in rendered form (in the treestore) [1]. However, there are two things coming in the near future that I expect will impact this: i) we are in the midst of a pretty significant storage rework [2], and ii) we are approaching the 1.0 release of the OCI image spec [3], at which point we will work towards making OCI the primary format that rkt supports and eventually deprecating ACI.

Let me know if anything is unclear - there's a lot of moving parts :-)

- Jonathan

[1]: Now that I think about it, it should actually be possible to remove the unrendered form without affecting the treestore version at all, cf. https://github.com/coreos/rkt/issues/2890 - but as that issue details, the UX is not ideal
[2]: Many more details in https://github.com/coreos/rkt/pull/2972 and https://github.com/coreos/rkt/pull/3071 - you'll note no activity for a while but we expect to pick this up again very soon

Brian Lalor

Nov 2, 2016, 3:36:43 PM
to Jonathan Boulle, Julia Evans, rkt-dev
Does OCI still support the ACI-style DAG of images, or is it a strict single-inheritance hierarchy, as Docker is today?

-- 
Brian Lalor
bla...@bravo5.org

Brandon Philips

Nov 2, 2016, 5:23:38 PM
to Brian Lalor, Jonathan Boulle, Julia Evans, rkt-dev
OCI Image Spec v1.0.0 will be a strict ordered list. See the "layers" section: https://github.com/opencontainers/image-spec/blob/master/manifest.md
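
For comparison with the graph case, applying a strict ordered list of layers can be sketched like this in Python. The `.wh.` prefix is the whiteout marker used by OCI layers; everything else (dict-based layers, the `apply_layers` name, the sample paths) is an assumption for illustration, and opaque whiteouts (`.wh..wh..opq`) are not handled.

```python
import posixpath

WHITEOUT_PREFIX = ".wh."


def apply_layers(layers):
    """Apply layers bottom to top; a '.wh.<name>' entry deletes '<name>'."""
    rootfs = {}
    for layer in layers:
        # First pass: whiteouts remove paths that came from lower layers.
        for path in layer:
            directory, name = posixpath.split(path)
            if name.startswith(WHITEOUT_PREFIX):
                target = posixpath.join(directory, name[len(WHITEOUT_PREFIX):])
                for existing in list(rootfs):
                    if existing == target or existing.startswith(target + "/"):
                        del rootfs[existing]
        # Second pass: add or replace this layer's regular files.
        for path, contents in layer.items():
            if not posixpath.basename(path).startswith(WHITEOUT_PREFIX):
                rootfs[path] = contents
    return rootfs


layers = [
    {"/bin/sh": "sh", "/etc/motd": "hello", "/tmp/cache/a": "x"},
    {"/etc/.wh.motd": "", "/app/run": "v1"},   # removes /etc/motd
    {"/tmp/.wh.cache": "", "/app/run": "v2"},  # removes /tmp/cache and below
]

rootfs = apply_layers(layers)
```

There is no graph traversal and no per-dependency whitelist here: a single loop over the list is enough, which is the simplicity a strict ordered list buys.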

Brian Lalor

Nov 2, 2016, 5:42:00 PM
to Brandon Philips, Jonathan Boulle, Julia Evans, rkt-dev
Boo. That was a great strength of the ACI spec. 

-- 
Brian Lalor
bla...@bravo5.org

Angus Lees

Nov 2, 2016, 6:35:09 PM
to Jonathan Boulle, rkt-dev
[changing subject line]

Other than the format changing, will the shift to OCI format have any other visible effects on rkt?

In particular, I've just quickly skimmed the OCI docs and I see they use arch=$GOARCH, i.e. arch=arm and not appc's arch=armv7l, etc. Does this mean `ValidOSArch` will be changing to match the OCI labels, with related effects down through rkt-on-arm?

 - Gus

Brandon Philips

Nov 2, 2016, 7:43:43 PM
to Brian Lalor, Jonathan Boulle, Julia Evans, rkt-dev
If you have practical use cases, I would encourage you to email d...@opencontainers.org (https://groups.google.com/a/opencontainers.org/forum/#!forum/dev) describing them and saying that you would like to see this in future releases.

Jonathan Boulle

Nov 7, 2016, 11:27:19 AM
to Angus Lees, rkt-dev
To be clear, we do not plan to break any ACI compatibility within at least the 1.x series of rkt (so there shouldn't be any visible changes in that respect). For arch specifically I'm not 100% sure but imagine it makes sense to validate the format-specific whitelists.
