Docker
======
People are running around in circles, screaming "DevOps". What's more, over half of them also
yell "Docker" at odd turns.
Scene of the day: Baron "Xaprb" Schwartz wrote an article about the DevOps identity crisis [38]_,
which conveys just how disordered it all feels. Or, from the other end, at his LinuxCon 2014 talk
the founder and now CTO of Docker, Inc., Solomon Hykes, put it this way [39]_:
    I know 2 things about Docker: it uses Linux containers, and the Internet won't
    shut up about it.
And at the end of the day, all you want is for all these people to shut up and stop touting
their magic medicine at you. But since Sylvain started a dedicated topic for collecting feedback
about CherryPy's official Docker image [40]_, and I was the one who unintentionally started that
discussion (we had talked about the Drone.io test environment, which is a Docker container) and
wanted to reply anyway, I started investigating what this thing, which makes not only the cool
kids itch, is really about -- besides eliminating *ops* as a class and curing indigestion along
the way.
I have to admit that as soon as I got to the original sources of information, listened to a
couple of Solomon's interviews and talks, and saw how involved he was in managing Docker's
reputation by replying to various critical articles, I became better disposed toward the
project. Solomon himself looks like a positive and nice kind of guy who knows what he is doing.
Not quite as nice, actually, as he would have been had he not rewritten DotCloud's Python code
in Google's next attempt at language engineering, but reasonably nice anyway. You may often see
him saying something like this [41]_:
    I know that Docker is pretty hyped right now -- I personally think it's a mixed blessing
    for exactly this reason: unexperienced people are bound to talk about Docker, and say stupid
    things about what Docker can do and how to use it. I wish I could magically prevent this.
    So I ought to limit myself in saying stupid things ;-)
Nature
------
Basically, Docker is like Vagrant plus OpenVZ on a vanilla kernel. Both have been available for
quite some time and have established use cases: reproducible development environments and
lightweight, operating-system-level virtualisation [42]_, respectively. Although Docker combines
the two qualities, and it's tempting to think that developers can build production-ready
container images and deploy them through Docker's magic powers, and that this is what Docker is
designed for, it would be a mistake to say so.
As Solomon says, Docker is merely a building block that arrived at the right time and provides
high-level APIs for a set of great low-level features that the modern Linux kernel has. There
are several things I want to summarise from his latest interview, given to FLOSS Weekly in
episode 330 [43]_:
* it is not a silver bullet, it is a building block
* it is used both ways: as a VM and as a "binary" (process-per-container); neither is the "true" one
* it is neither superior to, nor mutually exclusive with, configuration management
* it is not a complete production solution; production is hard and is left to the reader
* it doesn't primarily target homogeneous applications, for which the platform's own tools may be sufficient
I would agree that, except for the first one, one might find a "however" for any of them --
especially considering the pace of development, and the promising trio of new experimental
tools that appeared alongside Docker for cluster orchestration, in the Unix way of loosely
coupled components and separated responsibilities: Machine, Swarm and Compose [44]_.
And I really like to think of Docker as a building block. Whatever the hype, and even though it
skews the available information, certain design decisions were made that have certain
implications. Taking them into account, you decide whether Docker fits your particular task.
For instance, there was a good comparison of LXC versus Docker by Flockport [45]_, which
emphasises the differences between the two now that Docker ships with its own *libcontainer*:
* performance penalty for a layered filesystem (e.g. AUFS)
* separate persistence management for immutable containers (see the sketch after this list)
* the process-per-container design has issues in the real world [46]_:

  * most existing software is written expecting an *init* system
  * monitoring, cron jobs and system logging need a dedicated daemon and/or an SSH daemon
  * a system built of subsystems built of components is much harder to manage
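
To illustrate the persistence point: a container's layered filesystem is meant to stay
disposable, so mutable state is kept in a volume that bypasses it. A minimal sketch (the host
path and the image name are only illustrative):

.. sourcecode:: bash

    # keep MySQL's data outside the container's union filesystem, so the
    # container stays disposable while the data persists on the host
    docker run -d --name db -v /srv/mysql-data:/var/lib/mysql mysql
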
As I said above, both process-per-container and application-per-container have merit. But
because Docker was originally conceived around the former idea, most images on Docker Hub are
process-per-container, and people end up fighting the tool they have chosen, using Bash scripts,
Monit or Supervisor to manage multiple processes.
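
For illustration, here is a minimal sketch of such fighting (the package list and the
configuration path are assumptions of mine): an image that installs Supervisor to stand in for
*init* and babysit several daemons inside one container.

.. sourcecode:: text

    FROM ubuntu:trusty
    RUN apt-get update && apt-get --no-install-recommends -qy install \
        supervisor openssh-server cron
    # supervisord.conf is expected to declare sshd, cron and the application
    # itself as [program:...] sections so that one process manages them all
    COPY supervisord.conf /etc/supervisor/conf.d/
    # run supervisord in the foreground as the container's single top process
    CMD ["/usr/bin/supervisord", "-n"]
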
Complete configuration management products like Ansible, or even simpler Fabric-based
approaches, in conjunction with the abundance of APIs from every single IaaS and PaaS provider,
made infrastructure as code available, at any desired level of quality, long before Docker
appeared. Big companies were implementing their own infrastructure-as-code solutions on top of
their chosen service providers. Smaller companies mostly weren't in the game. Docker, because
of its momentum, can standardise what service providers supply and popularise infrastructure
as code across the industry, which I see as quite advantageous.
Another thing that shouldn't fall out of your sight is that Docker is a new technology. It
carries a known and significant risk of new security issues [47]_. And even in its own realm it
is not nearly as perfect. Indeed, Docker makes for a clearer separation of responsibility
between *devs* and *ops*. But there is cross-boundary configuration, like maximum open file
handles or system networking settings, which needs to be set on the host and to have
corresponding configuration in the container, e.g. the MySQL table cache. There is
environment-specific configuration, like load testing in QA and rate-limiting in production.
There are a lot more such things to consider in a real system.
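
For example, raising the open file limit for a containerised MySQL touches both sides of the
boundary, roughly as follows (the values are illustrative, and the ``--ulimit`` flag assumes a
reasonably recent Docker):

.. sourcecode:: bash

    # host side: the kernel-wide ceiling has to allow it first
    sysctl -w fs.file-max=200000

    # container side: grant the per-process limit at run time; MySQL's own
    # configuration (e.g. table_open_cache) must then agree with it
    docker run -d --ulimit nofile=65535:65535 mysql
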
There is a lot of discussion of microservice architecture [48]_ today. One of its aspects is
that what used to be intra-application complexity is now shifted into the infrastructure. And
the further you follow this route, the deeper you immerse yourself in distributed computing,
and distributed computing is still very hard. Docker's guidelines clearly follow this route.
Lately I've seen a comprehensive and all-around excellent presentation on distributed computing
by Jonas Bonér [49]_. Although it's named "The Road to Akka Cluster and Beyond", the part about
Akka, a distributed application toolkit for the JVM, only starts about three quarters of the
way through the slides. The rest is an overview of the theoretical aspects and practical
challenges of distributed computing.
The last thing to say in this section is about Docker's competitors. There are obviously other
fish in the sea. Besides the already mentioned LXC [50]_, there are also newer container tools
built on the experience gained with existing tools, recognising their shortcomings and
addressing them. To name a few: Rocket [51]_ and Vagga [52]_.
Application
-----------
With that being said, I think a thorough discussion of infrastructure and distributed
application design within the scope of the CherryPy project is not merely off-topic; it's
hugely out of context. The topic itself is current and interesting, but I'm sure everyone who
deals, or is willing to deal, with an application at scale knows how to run
``pip install cherrypy``.
What I already told Sylvain, what is on-topic, and what CherryPy QA can benefit from, is a
Docker QA image with the complete CherryPy test suite, containing all environments and
dependencies, so every contributor can take it and effortlessly run the test suite against
their changes. Here's a *dockerfile* for the purpose.
.. sourcecode:: text

    FROM ubuntu:trusty
    MAINTAINER CherryPy QA Team
    ENV DEBIAN_FRONTEND=noninteractive
    RUN apt-get update
    RUN apt-get --no-install-recommends -qy install python python-dev python3 python3-dev \
        python-pip python-virtualenv
    RUN apt-get --no-install-recommends -qy install software-properties-common build-essential \
        libffi-dev mercurial ca-certificates memcached
    RUN add-apt-repository ppa:fkrull/deadsnakes
    RUN apt-get update
    RUN apt-get --no-install-recommends -qy install python2.6 python2.6-dev python3.3 python3.3-dev
    RUN pip install detox
    WORKDIR /root
    RUN hg clone https://bitbucket.org/cherrypy/cherrypy
    WORKDIR /root/cherrypy
    RUN echo '#!/bin/bash\n \
        service memcached start; hg pull --branch default --update; tox "$@"' > serial.sh
    RUN echo '#!/bin/bash\n \
        service memcached start; hg pull --branch default --update; detox "$@"' > parallel.sh
    RUN chmod u+x serial.sh parallel.sh
    ENTRYPOINT ["./serial.sh"]
The ``hg clone`` line should be changed to point at the source of the changes, which is why I
think we should distribute it just as a *dockerfile*. Because of the development dependencies
the image isn't small -- around 620 MB. I also made a parallel Detox run possible via
``parallel.sh``, but I think running one environment at a time is usually better, as it stresses
the tests with more concurrency. What's really nice about the image is that it can be run just
like a binary.
.. sourcecode:: bash

    docker run -it cherrypy/qa -e "py{26-co-ssl,27-nt,33-qa,34-nt-ssl}" \
        -- -v cherrypy.test.test_http cherrypy.test.test_conn.TestLimitedRequestQueue
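
That assumes the image was first built from the *dockerfile*, presumably along these lines:

.. sourcecode:: bash

    # build the QA image from the directory containing the dockerfile
    docker build -t cherrypy/qa .
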
Side-project
~~~~~~~~~~~~
As I said, I think that specific answers to questions of infrastructure, orchestration or
application design are off-topic within the project; CherryPy was never meant to answer them.
But it's really nice to have all this information available nearby, like the series of posts
Sylvain wrote in his blog [53]_.
Sylvain, I can give you a starting point if you wish. I have a CherryPy project,
*cherrypy-webapp-skeleton* [54]_, which is a complete, traditional CherryPy deployment on
Debian. I had a *fabfile* [55]_ there which I used to test the tutorial against a fresh Debian
virtual machine. What I did was basically translate it into a *dockerfile* [56]_. It's a
virtual-machine-style, application-per-container thing, and it's good for testing. It is also
one possible way of deployment, but it obviously doesn't provide means of scaling. So if
you're interested in splitting it into separate containers, and in providing configuration and
orchestration for those containers for such an example, I would surely accept your changes.
You're also free to point out anything you see wrong there -- it's all open to discussion.
In case you're interested, I'll give an overview of the project and at the same time address
some of your questions:

* components as-is: CherryPy, Nginx, MySQL, Monit
* containers to-be: 2 x CherryPy, Nginx, MySQL (see the sketch after this list)
* it runs through an *init* script which calls a *cherryd*-like daemon
* it is installed into a *virtualenv*, though that probably doesn't make sense in a container
* we ought to adhere to the distro's standard directory layout (``/var/www``, ``/var/log``,
  etc.), but to what extent is an open question
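
As for the containers to-be, here is a rough sketch of how they could be wired together with
plain ``docker run`` and container links (all container and image names are hypothetical):

.. sourcecode:: bash

    # one container for the database, with its state kept on the host
    docker run -d --name mysql -v /srv/mysql:/var/lib/mysql mysql
    # two CherryPy application containers, each linked to the database
    docker run -d --name app1 --link mysql:mysql cherrypy-webapp-skeleton
    docker run -d --name app2 --link mysql:mysql cherrypy-webapp-skeleton
    # Nginx in front, linked to both application containers
    docker run -d --name nginx --link app1:app1 --link app2:app2 \
        -p 80:80 -p 443:443 nginx
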
If we present it as an example, we must also address, in some way, the *init* system and the
other issues [46]_ that I wrote about above.
____