Image squashing using docker engine api

Ranjib Dey

May 25, 2016, 5:38:09 PM

to docker-dev

Hello list,

I am trying to implement image squashing using the Docker Engine Go bindings. I am doing this to get rid of any sensitive data that may reside in the intermediate layers as part of the build process. I had earlier tried the docker-squash utility, which seems to be broken since the introduction of content-addressable layers, hence I am rolling my own using the Go API client. My workflow is:

- Save the image using the SaveImage API call

- Import the image using ImportImage, passing additional metadata (such as CMD and WORKDIR) via the "changes" parameter

After the successful import I can see, using inspect, that the new image exists and has the expected metadata, but the resulting image still fails to run, with the Docker daemon saying "Container command '/bin/sh' not found or does not exist".
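One check that may help narrow down an error like this is to look at what the tarball handed to the import step actually contains. An archive written by `docker save` holds image metadata (for example `manifest.json`) and per-layer tarballs at its top level, whereas `docker import` treats its input as a root filesystem, which is what `docker export` of a container produces. The Go sketch below lists the top-level entries of a tar stream and applies a rough heuristic. The names `topLevelEntries`, `looksLikeSavedImage`, and `sampleArchive` are hypothetical helpers invented for illustration, and the in-memory archives stand in for a real tarball.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
	"sort"
	"strings"
)

// topLevelEntries returns the sorted, distinct first path components
// found in a tar stream.
func topLevelEntries(r io.Reader) ([]string, error) {
	seen := map[string]bool{}
	tr := tar.NewReader(r)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			break
		}
		if err != nil {
			return nil, err
		}
		name := strings.TrimPrefix(hdr.Name, "./")
		name = strings.TrimPrefix(name, "/")
		if name == "" || name == "." {
			continue
		}
		seen[strings.SplitN(name, "/", 2)[0]] = true
	}
	out := make([]string, 0, len(seen))
	for k := range seen {
		out = append(out, k)
	}
	sort.Strings(out)
	return out, nil
}

// looksLikeSavedImage is a heuristic (an assumption, not an official check):
// `docker save` archives carry manifest.json and/or a legacy "repositories"
// file at the top level, while a root filesystem normally does not.
func looksLikeSavedImage(entries []string) bool {
	for _, e := range entries {
		if e == "manifest.json" || e == "repositories" {
			return true
		}
	}
	return false
}

// sampleArchive builds a small in-memory tar with empty files, for demo only.
func sampleArchive(names ...string) io.Reader {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	for _, n := range names {
		if err := tw.WriteHeader(&tar.Header{Name: n, Mode: 0644}); err != nil {
			panic(err)
		}
	}
	if err := tw.Close(); err != nil {
		panic(err)
	}
	return &buf
}

func main() {
	saved, _ := topLevelEntries(sampleArchive("manifest.json", "abc123/layer.tar"))
	rootfs, _ := topLevelEntries(sampleArchive("bin/sh", "etc/passwd"))
	fmt.Println(saved, "saved-image layout:", looksLikeSavedImage(saved))
	fmt.Println(rootfs, "saved-image layout:", looksLikeSavedImage(rootfs))
}
```

If the tarball being imported shows the saved-image layout, the imported image's filesystem would contain layer archives rather than `bin/`, `etc/`, and so on, which is consistent with `/bin/sh` not being found.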

At this point I am not sure what's missing, or whether what I am doing is just silly. I have gone through a handful of issue threads, and the PR comments that implement the "changes" parameter in import, and I think this should work.

Any pointers will be very helpful, including calling it out if this is a bad approach.

Link for the implementation prototype: https://gist.github.com/ranjib/d9f9798393a25971282de9e4093e49f2

Relevant github issues:

https://github.com/jwilder/docker-squash/issues/45

https://github.com/jwilder/docker-squash/issues/44

https://github.com/jwilder/docker-squash/pull/15

https://github.com/docker/docker/pull/9123

thanks in advance

ranjib

Rich Moyse

May 27, 2016, 9:24:51 AM

to docker-dev

ranjib:

There are many reasons to dissuade you from creating a squash operator to achieve your stated goal of eliminating secrets from your image. Here are a couple:

- Unless "squash" can be described as "not X", where "X" identifies the set of files you wish to retain, it is a very problematic mechanism for eliminating unnecessary components. Also, "not X" must be automatically generated from "X" via a well-tested algorithm. Unfortunately, a number of "squash" implementations rely on manually written Linux rm commands or on specifying the layer to remove. These implementations depend on a developer's knowledge of the build-time tooling and the intricacies of Docker's image/layer management. Therefore, when the implementation of any build tooling changes, or Docker adapts its image management internals, you must identify how these changes potentially affect the file names and layer locations of the build-time artifacts in order to properly rewrite, when necessary, the "squash" operator's file/layer removal list. There are also the obvious shortcomings of mistakenly removing certain desired files while potentially preserving undesired ones, if you can't strictly define "X" and "not X".

- Since "squash", in the form you seek, isn't available via a public interface maintained by Docker, implementing this operator will require coupling to Docker's implementation, resulting in brittle code that's susceptible to changes in Docker's image management internals. For example, Docker recently changed how it internally manages images in release 1.10. During this transition to content-addressable images/layers, even some of its own public interfaces, like the Docker CLI, were decidedly impacted.

Assuming "X" is known, instead of building the final image via exclusion (squash), assemble it via inclusion. One could use the following public operators to copy out/in the "X" needed to achieve your goal: docker cp, docker export, and docker import, then use docker commit to specify the desired metadata settings. Composing these public operators results in code that is resilient to Docker tinkering with its internal image format. Since I also encountered issues similar to yours, I created a Bash script called dkrcp that encapsulates the previously mentioned public operators, providing a means to create or evolve an image by copying files from containers, the host file system, other images, and/or a streamed tar. Although this scheme doesn't directly mesh with the Automated Build environment offered by Docker Hub, this concern seems irrelevant in your situation, as you're probably not building images containing secrets on public servers. That said, Dockerfiles and the Builder could be used to create all the resources needed to assemble the "Final" image, and perhaps, through clever use of web triggers, the scripting necessary to build the Final one could be automated.
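As a rough illustration of composing those public operators, here is a minimal Go sketch that drives the docker CLI: create a container from the source image, pipe `docker export` into `docker import` with `--change` options for the metadata, then remove the temporary container. It is a simplification, not dkrcp: it copies the container's whole filesystem rather than a selected "X", so anything sensitive still present in the final filesystem is carried along. The function names `flattenPlan` and `run`, and the image and container names, are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// flattenPlan returns the docker CLI invocations, in order:
// create, export (stdout), import (stdin), rm.
func flattenPlan(srcImage, tmpContainer, dstImage string, changes []string) [][]string {
	importArgs := []string{"docker", "import"}
	for _, c := range changes {
		importArgs = append(importArgs, "--change", c)
	}
	importArgs = append(importArgs, "-", dstImage) // "-" reads the tarball from stdin
	return [][]string{
		{"docker", "create", "--name", tmpContainer, srcImage},
		{"docker", "export", tmpContainer},
		importArgs,
		{"docker", "rm", tmpContainer},
	}
}

// run executes the plan, connecting export's stdout to import's stdin.
func run(plan [][]string) error {
	create, export, imp, rm := plan[0], plan[1], plan[2], plan[3]
	if err := exec.Command(create[0], create[1:]...).Run(); err != nil {
		return fmt.Errorf("create: %w", err)
	}
	// Best-effort cleanup of the temporary container.
	defer exec.Command(rm[0], rm[1:]...).Run()

	pr, pw, err := os.Pipe()
	if err != nil {
		return err
	}
	exportCmd := exec.Command(export[0], export[1:]...)
	exportCmd.Stdout = pw
	importCmd := exec.Command(imp[0], imp[1:]...)
	importCmd.Stdin = pr
	importCmd.Stdout = os.Stdout

	if err := importCmd.Start(); err != nil {
		pr.Close()
		pw.Close()
		return fmt.Errorf("import: %w", err)
	}
	pr.Close() // the child holds its own copy of the read end
	exportErr := exportCmd.Run()
	pw.Close() // signal end of stream to docker import
	importErr := importCmd.Wait()
	if exportErr != nil {
		return fmt.Errorf("export: %w", exportErr)
	}
	return importErr
}

func main() {
	plan := flattenPlan("myapp:build", "myapp-flatten-tmp", "myapp:flat",
		[]string{`CMD ["/bin/sh"]`, "WORKDIR /app"})
	for _, argv := range plan {
		fmt.Printf("%q\n", argv) // dry run: show what would be executed
	}
	// Pass --run to actually invoke docker (requires a running daemon).
	if len(os.Args) > 1 && os.Args[1] == "--run" {
		if err := run(plan); err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
	}
}
```

Keeping the plan as plain argument lists makes it easy to print or log before anything is executed, and the same steps could be expressed through the Go API client instead of the CLI.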