Ugly systemd units with fleet


Darren Shepherd

Jul 9, 2014, 11:18:20 AM
to coreo...@googlegroups.com
Ideally I'd like to be able to spin up a clean CoreOS cluster running fleet and then just start scheduling units and magic happens. What I'm finding is that I often need a bit of scripting before my unit kicks off my docker container. Even just running a simple docker container needs some scripting because you do something like "docker start -a || docker attach || docker run" (yeah... launching docker from systemd is neither fun, easy, or reliable, but that's a whole different thread).

So I'm finding that I'm creating really, really ugly unit files. It would be much easier to schedule a unit that has ExecStart=/.../bin/myservice.sh. But of course that script doesn't exist unless I put it on the server.
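For illustration, the kind of unit this forces on you might look something like the following sketch (container and image names are made up):

```ini
[Service]
# All the container lifecycle juggling ends up inline in ExecStart:
# resume if stopped, attach if running, run if it doesn't exist yet.
ExecStart=/bin/sh -c '/usr/bin/docker start -a myservice || \
  /usr/bin/docker attach myservice || \
  /usr/bin/docker run --name myservice myimage'
```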

If you look at something like Marathon, when you schedule your process you can associate data with it, like a tgz file. It would be nice if fleet could do the same. So before fleet runs the unit it would extract the tgz file to a known place. I'd like fleet to store the data, so I don't have to host it at some URL. This means the data would have to be quite small since it would go into etcd, but I can live with that. It will be like user-data/cloud-config.

Darren

Blake Mizerany

Jul 9, 2014, 11:36:01 AM
to coreo...@googlegroups.com
Darren,

I agree 150%. There is no timeline around something like this, at least that I know of, and I can't promise there will be one for some time. I am willing to go through the basic mental exercises needed to do something like this, though.

Can you describe the semantics/flow around what this might look like in some more detail?

I'm curious to hear your thoughts.



On Wednesday, July 9, 2014, Darren Shepherd <darren.s...@gmail.com> wrote:
Ideally I'd like to be able to spin up a clean CoreOS cluster running fleet and then just start scheduling units and magic happens.  What I'm finding is that I often need a bit of scripting before my unit kicks off my docker container.   Even just running a simple docker container needs some scripting because you do something like "docker start -a || docker attach || docker run" (yeah... launching docker from systemd is neither fun, easy, or reliable, but that's a whole different thread).

Seán C. McCord

Jul 9, 2014, 2:17:18 PM
to coreo...@googlegroups.com
A thought I have frequently considered here is to specify a wrapper container/image which would have a standard format for environment loads and entrypoint.  From there, it would launch whatever desired unit with that wrapped information.  The first problem is coming up with a structure that is general enough to serve any generic fleet service (many of those, for me, do not even necessarily launch docker containers).  I don't feel I have enough exposure yet to properly describe that scope.

--
Seán C. McCord
ule...@gmail.com
CyCore Systems

Darren Shepherd

Jul 9, 2014, 2:28:04 PM
to coreo...@googlegroups.com
I was thinking something like a subset of cloud-config.  I want to avoid something like giving a URL to a tgz/zip file.  The reason for this is that I can quickly see people using this as a means of doing something like downloading the python runtime and their app and just running it from the host.  You still want to encourage people to use containers.  So the intention here is really just to provide a simple way in which people can provide bootstrapping scripts.  Here's what I'm thinking.  Honestly I've put all of 15 minutes of thought into this, so feel free to disagree.

You can do
  fleetctl submit/load/start --config unit-config myservice.unit

The --config file is then a valid cloud-config file, but only a subset of cloud-config directives are allowed.  To start, the allowed directives will probably just be write_files.  Since this is a valid cloud-config file, you could also just put "#!/bin/bash" in the file and it will just run as a script.
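A sketch of what such a --config file might look like under this proposal (the paths, script, and image name are made up; write_files itself is real cloud-config syntax):

```yaml
#cloud-config
write_files:
  - path: /opt/bin/myservice.sh
    permissions: "0755"
    owner: root
    content: |
      #!/bin/bash
      # Hypothetical bootstrap script the unit's ExecStart would call
      exec /usr/bin/docker run --name myservice myimage
```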

Now the problem I see with this is that if people are writing files to the host as units are run, how do they clean up afterwards?  So what I was thinking, and this is a bit hand-wavy because I don't fully know all that systemd can do, is that you use a systemd fragment to augment the user's units such that you add a unit-private mount /my-containers-private-crap.  I know systemd can give units their own private /tmp; I don't know if it can generically do any tmpfs mount.  The fragment also adds something like ExecStartPre=coreos-cloudinit --allow write_files,something_else /run/.../the-units-cloud-config.  Then whatever the user's config file does would essentially be ephemeral to the invocation of the unit.  Unless they write stuff to /opt or wherever.  At least they have a directory that is ephemeral.  I guess you could do it by convention too and just have a special directory in /run that always gets deleted.
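Sketched as a drop-in fragment, the idea might look roughly like this (the file path is illustrative, and the proposed --allow directive filter does not exist in coreos-cloudinit today):

```ini
# /run/fleet/myservice.service.d/10-bootstrap.conf (hypothetical path)
[Service]
# Give the unit its own private /tmp for ephemeral files
PrivateTmp=yes
# Apply the user's restricted cloud-config before the unit starts;
# --allow is the proposed (not yet existing) directive whitelist
ExecStartPre=/usr/bin/coreos-cloudinit --allow write_files --from-file /run/fleet/myservice.cloud-config
```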

Darren

Jonathan Boulle

Jul 9, 2014, 7:13:46 PM
to coreo...@googlegroups.com

On Wed, Jul 9, 2014 at 8:18 AM, Darren Shepherd <darren.s...@gmail.com> wrote:
launching docker from systemd is neither fun, easy, or reliable, but that's a whole different thread).

I'm not sure I'm quite convinced that this is a separate issue. If docker integration was seamless (and automatically garbage collected), doesn't it provide exactly the kind of thing you're describing? A self-contained, self-describing environment that's fully usable with a single entry point; what else would you really need to bootstrap?


I was thinking something like a subset of cloud-config.  
 
Could you explain what other kind of things you would want to achieve with this beyond your write_files examples? (Since again I would assume those could be handled within the container, either during the container build or by whatever the entrypoint is).

Darren Shepherd

Jul 9, 2014, 8:15:27 PM
to coreo...@googlegroups.com
write_files is the main use case. 


On Wed, Jul 9, 2014 at 4:13 PM, Jonathan Boulle <jonatha...@coreos.com> wrote:
I'm not sure I'm quite convinced that this is a separate issue. If docker integration was seamless (and automatically garbage collected), doesn't it provide exactly the kind of thing you're describing? A self-contained, self-describing environment that's fully usable with a single entry point; what else would you really need to bootstrap?

Well, docker/systemd integration is not seamless and I see no solution that will make it so in a reasonable amount of time, so I don't know what to say.  It's kind of like saying, "If it did everything you needed, wouldn't that be sufficient?"  I'm not trying to be obnoxious, it's just that relating a docker container to a systemd service today doesn't fit well at all.  In a separate thread I can enumerate all my issues.  You can see a small bit of my frustration at https://github.com/dotcloud/docker/issues/6791
 
 
Could you explain what other kind of things you would want to achieve with this beyond your write_files examples? (Since again I would assume those could be handled within the container, either during the container build or by whatever the entrypoint is).

write_files is the main use case.  I don't think I follow what you are saying about "container build".  The point of my comments is that often I'd like to call a simple shell script for ExecStart, not just "docker run", and in that shell script I will eventually call docker run.  Maybe I'm alone here on this one.  For example, say I do "docker run --name myservice someimage:42".   Now later I want to deploy the same service built off of someimage:43.  What I'd like to do in the startup is do "docker ps", see if the container exists and is using an old image, and if it is, delete it and create a new container from the new image using the same data volumes.
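The upgrade check described here could be sketched as a small helper like this (the function and all names are hypothetical; `docker inspect --format '{{.Config.Image}}'` is the real call that reports which image a container was created from):

```shell
#!/bin/bash
# Hypothetical helper: recreate a named container only when it was
# created from a different image than the one we now want to run.
# All names (myservice, someimage:43) are illustrative.
ensure_container() {
  local name="$1" image="$2"
  local current
  # Which image was the existing container created from, if any?
  current=$(docker inspect --format '{{.Config.Image}}' "$name" 2>/dev/null)
  if [ -z "$current" ]; then
    # No container yet: create it fresh
    docker run -d --name "$name" "$image"
  elif [ "$current" != "$image" ]; then
    # Built from an old image: replace it (in a real setup the data
    # volumes would be carried over with --volumes-from)
    docker rm -f "$name"
    docker run -d --name "$name" "$image"
  else
    # Same image: just make sure it's running
    docker start "$name"
  fi
}
```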

Darren

Jonathan Boulle

Jul 9, 2014, 11:01:06 PM
to coreo...@googlegroups.com
On Wed, Jul 9, 2014 at 5:15 PM, Darren Shepherd <darren.s...@gmail.com> wrote:
Well, docker/systemd integration is not seamless and I see no solution that will make it so in a reasonable amount of time, so I don't know what to say.  It's kind of like saying, "If it did everything you needed, wouldn't that be sufficient?"  I'm not trying to be obnoxious, it's just that relating a docker container to a systemd service today doesn't fit well at all.  In a separate thread I can enumerate all my issues.  You can see a small bit of my frustration at https://github.com/dotcloud/docker/issues/6791

Oh, I totally get your frustration. (That issue is a good summary of the key problems so it'll be interesting to see what comes of it). 

The point I was more trying to make is that it sounds like you're trying to script around things which IMO should really be solved in some dedicated component: either the init system itself, the container environment/runtime, or maybe some other higher-level service (fleet, gantry, kubernetes, whatever). Otherwise what starts off as a simple bash script ends up rapidly evolving towards such a system itself.

write_files is the main use case.  I don't think I follow what you are saying about "container build".
 
I just meant that if you are writing files into your container then why not push that back into the build process of the container? An example would be helpful, maybe I'm still not understanding your specific use case.

Maybe I'm alone here on this one.  For example.  Say I do "docker run --name myservice someimage:42".   Now later I want to deploy the same service built off of someimage:43.  What I'd like to do in the start up is do "docker ps" and see if the container exists and is using an old image, if it is delete it and create new container from the new image using the same data volumes. 

I'm a bit confused by this example because it doesn't map to the init system: if myservice is controlled by a single systemd unit, then the previous container's existence should be tied to the life of the unit - in which case it doesn't make sense for the ExecStart of that unit to check for previous containers. But if myservice-42 and myservice-43 are separate units, then what they should really check is whether the old unit is running. 

But maybe you're just describing a workaround to the current systemd/docker awkwardness, in order to achieve something close to in-place upgrades? In which case, to come back to my earlier point, it seems like you're implementing a lightweight init system in a startup script. Once the systemd unit isn't tracking the life of the service, what happens if it spontaneously dies? How do you cleanly/consistently expose what is running on the system? Etc., etc.

Hopefully some of this makes sense and I'm not totally missing where you're coming from.

Also, somewhat tangentially, you could always explore the systemd-nspawn route for actual process execution: despite the "debugging" disclaimers it is very powerful and reliable, integrates nicely as a systemd unit (as one would hope), and it's not totally crazy (or at least infeasible) to use it with docker images (see: toolbox).

Darren Shepherd

Jul 10, 2014, 3:21:10 AM
to coreo...@googlegroups.com
Your analysis is spot on.  I'm basically just addressing awkwardness between systemd and docker.  I retract any suggestions for fleet I've made thus far.  Let's step back a bit and address the real core issue.  I'm sure you're aware of the hiccups between systemd and docker, but for those reading this, I'll just enumerate a few issues I can think of at the moment.  Let's walk through a basic scenario.  Say you're new to systemd, so you refer to the docs at https://coreos.com/docs/launching-containers/launching/getting-started-with-systemd/.  That will tell you to create a new unit file with essentially the below line
ExecStart=/usr/bin/docker run busybox /bin/sh -c "while true; do echo Hello World; sleep 1; done"
You "systemctl enable/start" and now you've got Hello World running at 12 CPS in a docker container.  That's great; now let's stop the unit with "systemctl stop hello.service" and... oops... it doesn't stop.  Why?  That's because of SIGTERM handling.  If you're running bash as your PID 1, there's a good chance you're not handling SIGTERM properly.  Okay, no big deal, not the first process to not stop on SIGTERM; let's just do "systemctl kill --signal=KILL hello.service".  Your unit is now stopped, but wait... "docker ps" shows the container is still running.  Why?  Well, herein lies one huge issue: systemd is supervising the docker client, not the container itself.  The state of the service unit does not necessarily reflect the actual container state.  At this point you're basically screwed.  But let's ignore the gaping hole in the bottom of the boat and go back to fixing the sails.

You've now figured out that when you start your unit there's a chance your container is still hanging around.  If you start your unit a second time, you'll get two docker containers running.  So you get clever and give your container a name.  This allows you to do "docker attach ... || docker start ... || docker run ..." to handle the cases where your container is running, stopped, or does not exist.  Awesome, problem solved.  Now you realize that you need to add "-e MYVAR=VALUE" to your container.  If you edit your unit file, it should now dawn on you that you can't actually do "docker attach/start" because your named container has the wrong values.  Instead you now need to do something like "if container exists; then docker rm -f container; fi; docker run ...", but how do you shove that into a unit file?  I could go on and on; for example, if your "docker run" needs to pull an image, you end up with a unit that is "active/running" but is in fact doing nothing for however long it takes to download your image, which could be a long time if the index/registry decides to be slow today.
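In practice many people settle on a blunt rm-then-run pattern, roughly like this sketch (unit contents illustrative); the "-" prefix keeps a missing old container from failing the start, and TimeoutStartSec=0 keeps a slow image pull from timing out the unit:

```ini
[Service]
TimeoutStartSec=0
# "-" means: ignore failure if there is no old container to remove
ExecStartPre=-/usr/bin/docker rm -f hello
ExecStart=/usr/bin/docker run --name hello busybox \
  /bin/sh -c "while true; do echo Hello World; sleep 1; done"
ExecStop=/usr/bin/docker stop hello
```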

As you can see, we've basically taken systemd, with its "services are so easy to run, no double forking, no pid files, etc.", right back to the SysV days of "oh crap, I didn't think of that corner case; now I can't start my init script because my service is really running and I need to kill my service first so that I can restart my init script".  So, yeah, systemd and docker are not a nice fit.

Now I love CoreOS.  I love the vision of CoreOS (and the graphics, seriously, really nice pictures).  Central to CoreOS's vision is containers and Docker.  Docker and CoreOS are like PB&J.  Well, they are on paper.  But let's be honest, Docker isn't really a first class citizen in CoreOS.  The first class citizen, the darling in CoreOS's eye, is systemd.  And docker is secondary.  All things in CoreOS revolve around systemd.  The cloud-config can write and manage units; fleet is essentially a systemd cluster scheduler.  There's practically nothing that I see in CoreOS that is actually tailored to Docker except that it's included in the base system.  Now I'm not saying this to criticize CoreOS, I'm merely pointing out that CoreOS is built around systemd and the use of docker in CoreOS almost solely depends on systemd's ability to launch docker.  But, as I pointed out before, systemd can't monitor the actual state of your docker containers.  If systemd can't monitor Docker, and fleet relies on systemd, then, if I do my math correctly, fleet can't reliably schedule docker containers.  This all seems fundamentally broken.  Again, I love CoreOS, I want this to work, I speak merely out of frustration as a user who desperately wants to use CoreOS.

So what do we do?  I see three paths.

1) Make Docker first class
2) Build a component to bridge systemd and docker
3) Make systemd and docker play nice with each other

First, make docker first class.  What I mean by this is that instead of fleet calling systemd, it actually calls docker APIs.  So make fleet capable of running both unit files and docker containers natively.  But what does this mean for cloud-config?  Add a directive to cloud-config for docker?  If you do that, then who will supervise the Docker container?  You could have fleet running locally to supervise it.  But then you have systemd, fleet, and docker itself all doing their own supervision.  This seems messy...

Second, build a component to bridge systemd and docker.   You could write a docker wrapper such that instead of ExecStart=docker run, you do ExecStart=docker-wrapper run.  Then magically this docker wrapper handles all the crazy corner cases.  This would be a hack and a last resort.

Finally this brings us to making systemd and docker play nice.  I've been watching this from afar and frankly I'm not too happy with what I see.  systemd and docker have a huge overlap in technical capabilities, but both are on a quest for world domination.  systemd is the incumbent that's been containerizing things way before docker was even born.  Docker is the hot new up-and-comer on the block that can't afford to compromise its vision so early in the game.  If you put systemd and docker in the same room and ask them to find a solution, they'll argue on principles and visions and nothing really comes of it.  Meanwhile users are suffering.  So I'd like to step in as the ambassador of practicality and personally point the finger directly at CoreOS.  CoreOS is in the position to find a happy compromise.  CoreOS has a vested interest in both systemd and docker being a success.  I don't expect or really want systemd and docker to happily merge into one thing and hold hands and skip down the yellow brick road.  They will most likely stay largely independent and overlapping in nature.  We just need to find simple solutions that address the immediate issues, and running docker from a service unit is a huge issue.

So I'd like CoreOS to not pick sides and instead fight on behalf of the user.  If I had to guess I would say that CoreOS is slightly picking sides with systemd right now and/or waiting for Red Hat to fix this.  I'm really not an expert in all the tech involved with systemd and docker but from what I see the following things could make the integration nicer.

1) Make containers run in the same cgroup or as a child of the cgroup of the systemd unit.  This should make systemd able to actually monitor the container (in combination with the pid)
2) Make the docker client aware of sd_notify and after launching the container call the equivalent of systemd-notify --pid=... to notify systemd of the pid
3) Make the --rm flag be implemented by the docker daemon.  Currently, AFAIK, --rm is done in the docker client making the --rm not really work all the time.  So if you kill -9 the client, the container will stay.

If we have those three things I think it would be possible to just have a simple ExecStart=docker run ... in your unit and all will be swell.
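With those pieces in place, the unit could shrink to something like this sketch (Type=notify and NotifyAccess are real systemd options; docker actually sending the sd_notify message is the hypothetical part):

```ini
[Service]
# Hypothetical: docker would call sd_notify with the container's pid
# once the container is actually up, so systemd tracks the real state
Type=notify
NotifyAccess=all
ExecStart=/usr/bin/docker run --rm --name hello busybox \
  /bin/sh -c "while true; do echo Hello World; sleep 1; done"
```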

Darren


Alex Polvi

Jul 11, 2014, 10:26:36 PM
to coreos-dev
Darren,

Thank you for this discussion and your support.

To make systemd play nice with docker, the actual fork/exec of the process needs to happen as a child of systemd. Docker wants to own the fork, for technical reasons I cannot understand. This mess is a victim of that design decision. We have known it since the day docker shipped on CoreOS.

A little background on how much we care about this: During the Red Hat
conference earlier this year philips and I pulled together a meeting
between shykes, crosbymichael, alexl, dwalsh, gregkh, and a few folks
from the project atomic team. We talked about this issue directly,
trying to solve this problem for our mutual users. The outcome (which
the docker team agreed to and may still happen) was that docker would
be modified to have a "standalone" mode where systemd could fully
supervise the process. This work goes deep into the core of docker.
While our very own philips is a top contributor to docker, we felt
that modifying the core of docker is something that is best left to
the core docker team members. To this end, crosbymichael began
building this feature, but my understanding is that it was
de-prioritized before the release of 1.0 and dockercon.

So, this leaves us with two options for a clean integration:
1) CoreOS sponsors the development effort to patch, very deeply, docker to play nice with systemd by not owning the fork/exec. This is definitely not out of the question. This would make systemd happy, and in turn fleet happy, and thus our users. Perhaps the outcome of this thread will push us to step up and do this work.
2) Modify fleet to support talking directly to the docker APIs, skipping systemd altogether. This is not at all out of the question, and we are open to doing this if it is the right thing for the user.

Let's play with the idea of fleet talking directly to the docker API for a minute, skipping systemd altogether. To do that we'd have to do a few things:

First, we'd need to put a cleaner API on fleet which is not systemd
specific. This work is underway already, so no problem there.

Next, we'd need to express the docker api in the fleet api. This is where it gets a little tricky. Let's think about how the end user would handle it for a minute. Would it look like this?

$ fleetctl docker run --rm foo/bar
abcdef1234
$ fleetctl --conflicts=abcdef1234 docker run --rm foo/bar
dfecbd0979
$ fleetctl docker stop abcdef1234
abcdef1234

etc?

Do you think this API would solve your problems? Is this what users
really want? Is there something better? Maybe something like this is
better done in libswarm?

Our current thinking is that users want to describe a desired state of a set of running containers, and the distributed system will make that state so, making assumptions for the user. Very much like kubernetes has shown us, and that's why you have seen us active in that project recently. When you think about the distributed application like this, the user can forget about the start/stop verbs and all the complexity associated with them. Let the system handle it. If the system handles it, we can abstract all the --rm --name stuff away.
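To make the desired-state idea concrete, a manifest might look loosely like this (an entirely hypothetical schema; none of these keys exist in fleet):

```yaml
# Hypothetical desired-state manifest: the cluster converges on this,
# so the user never issues start/stop verbs or --rm/--name flags.
services:
  - name: foo
    image: foo/bar
    instances: 2
    conflicts: [foo]   # spread replicas across machines
```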

So I guess to recap:
- To fix systemd/docker, we need docker run's fork/exec to be a child of the invoking process
- Make the end user cluster management API not about start/stop verbs but instead about describing desired state, work that has not been incorporated into fleet yet... yielding an incomplete solution.

Please keep this thread going and let us know what you would ideally
want so we can help make it happen. Thanks again for your support,

-Alex

Darren Shepherd

Jul 14, 2014, 11:01:21 AM
to coreo...@googlegroups.com
Alex,

Thanks for the response.  This brings up a lot of points I'd like to discuss, I'll reply at length to this email in the next day or so.

Thanks,
Darren

Darren Shepherd

Jul 16, 2014, 12:12:15 AM
to coreo...@googlegroups.com

Alex,


Thanks for responding and I'm glad we can have this discussion. I'm having very similar discussions with the Docker folks in parallel, so don't think that I'm just picking on CoreOS. I'm encouraged at the responses I've been getting and I think we can all come to an agreement. Your email touches on so many points that you'll have to bear with me if this email goes a bit long. I'll address this in two main parts: first, systemd/docker integration, and second, fleet.


Red Hat is in a really crummy situation from what I see. I really empathize with them. As the “owners” of systemd, SELinux, and OpenShift they had all the technology they needed to accomplish their goals. I'm sure they were quite content with the path they were on, but then this cute little hack called Docker came onto the scene and captured the hearts of the users. They have no choice but to be involved and use Docker, but honestly Docker is far more immature than what they had, and additionally its design does not play well with their current architecture. With the current investment they have, they'd be silly and stupid not to try to leverage as much as they can from systemd, SELinux, etc. The path that Red Hat has chosen and their fight for systemd seems somewhat natural given their circumstances.


Whenever some new technology comes around there's always a natural reaction to look at the current technology you have, move some pieces around, and declare it just as good as the new shiny technology. Why is this? Because there's no such thing as new technology these days. It was all invented many, many years ago. Today we don't innovate with technology, but with paradigms. And this is why the shiny new technology is better: it was assembled and tweaked under a new paradigm. As you shuffle around the old technology to match the new, you dissect and normalize the paradigm and often lose the essence of it. So while systemd and Docker are both assembled from the same building blocks, there is just enough of a paradigm shift with Docker that it has really resonated with users. You really need to respect that. It's potentially dangerous to mold Docker to fit systemd because, in the process, you could lose the essence of what made it great to begin with.


I feel this is one of the difficulties of moving Docker to a standalone architecture. It's not just the technical effort, it's more about how this change could compromise the vision of Docker. From the perspective of Docker you're trying to fit a legacy paradigm while Docker is trying to move forward with their new view of the world. While technically feasible, the subtleties in the design of Docker are the tricky part, but those subtleties are the magic of Docker. You can't really fault Docker for their desire to own the fork/exec. It is the natural approach to have a daemon that spawns and manages its children. systemd does the same thing, it just happens to be PID 1.


I'm currently working with the Docker folks to fully flesh out the design of standalone mode. I think at this point it’s essential that we fully explore this option with real intent of implementing it. I think there could be some advantages to this model, but if, and only if, it stays consistent with the vision of Docker. If I feel it's an unnatural distraction in the progression of the Docker architecture, I will be opposed to it.


While both sides may have their ideal solution, my gut feeling is that there will be a very simple compromise that will get 80% of what both sides need. Then later you'll find out the other 20% wasn't really all that important. I'm also pursuing with Docker what I think would be a short term solution also.


Should fleet talk directly to docker? Absolutely. Imagine if the current landscape were such that we didn't have systemd but instead Upstart and SysV. If you were going to build a container orchestration tool, do you think you would have opted for tight integration into Upstart? Probably not. It just happens that systemd includes so much container-oriented functionality that as architects and programmers we feel there must be a way to happily merge these two. Now having said that, I don't think fleet should stop managing systemd units either, but more about that later.


To conclude on the systemd/docker integration path: I'm glad to hear you guys are open to different approaches, and I'm actively working to resolve this, so I think we can join forces.  But... if I were to trust my spidey sense, I'd say that systemd and docker will largely stay independent. Time will tell.



Now to fleet.


If you were to ask me what fleet is, I'd say fleet is a systemd scheduler.  I'm sure it has a grander vision than that, but from an outside perspective that is what it appears to be.  I struggle with the usefulness of that.  I think the easiest way to describe why is to propose a slightly different vision for fleet.  Instead of saying fleet is a scheduler or an orchestration tool, I would propose that its role is to be the CoreOS cluster manager.  It's the brains that glues everything together.  If you create a node, all you should need to do is join it to fleet.  Then, based on the configuration in fleet, the node should be configured and tasks deployed to it.  This means fleet should pull together and offer up to the users more first class concepts than just systemd units.  It should merge or wrap functionality from cloud-config, confd, systemd management, docker management, concepts from Kelsey’s ipxe profiles, etc.


CoreOS should be the ideal platform for running containers.  It comes with all the plumbing needed so you can largely focus on your workload.  But... that doesn't mean CoreOS should be heavily involved in scheduling/orchestration right now.  Eventually, maybe; right now, no.  I say this for two reasons.  First, for smaller clusters the scheduling is quite simple, so fleet should regardless work for smaller setups, while for larger clusters people are going to gravitate towards a higher level scheduler or orchestration system like Mesos or Kubernetes.  Second, if you get heavily into scheduling/orchestration, it usually comes with some opinionated view on how to do things.  For example, Kubernetes introduces pods and a different networking model.


Taking these points into consideration, I think that instead of focusing on scheduling/orchestration, fleet should be a very unopinionated tool that can bootstrap all the other frameworks.  Because of the features built into fleet, you could easily run a command and have Kubernetes, Mesos, or Deis running fully containerized and managed.  Not like you have today, where you’re largely dropping files in /opt and then writing unit files.  In the end, CoreOS will be the best OS to run all these systems, or if you have basic needs you can just use bare CoreOS.


The ability to bootstrap these other systems requires a bunch of functionality Docker and CoreOS don't have today.  Most people think of containers as a thing to run an application.  But there's a different mode that I'd like to call system containers.  The basic idea is that you create a container that shares namespaces with the host or other containers in order to add services to them.  Example use cases would be monitoring tools, security frameworks, software defined storage and networking components, or, most important of all, agents for higher level orchestration tools.  The scheduling of these containers is quite different too.  Most of the time you just want to define one template and then say: instantiate this template once per node.
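With today's pieces, scheduling such a system container might be sketched like this (unit contents illustrative; X-Fleet's Global=true is fleet's way of saying "run one instance on every machine"):

```ini
[Service]
# Share the host's network namespace and expose the docker socket so
# the container acts as a node-level agent, not an isolated app
ExecStart=/usr/bin/docker run --rm --name agent \
  --net=host \
  -v /var/run/docker.sock:/var/run/docker.sock \
  example/agent

[X-Fleet]
Global=true
```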


So to sum this all up, my opinion is that


1. Fleet should be the cluster manager that gives you a single point to configure and control your nodes.

2. Fleet should expose more first class concepts than just units files.  Pull in concepts from cloud config, confd, ipxe profiles, etc

3. Fleet should provide basic scheduling needs for small simple clusters

4. Fleet should be able to easily bootstrap higher-level orchestration tools.


I think if you focus on making fleet more of a tool like kubernetes, you're doing it a disservice.  You don’t need to compete or provide an alternate perspective; instead, encourage the wide ecosystem that is growing and ensure that CoreOS is the best platform to run all of it.


Darren

Kyle Mathews

Jul 16, 2014, 6:48:44 PM
to coreo...@googlegroups.com
Thanks for your thoughts and efforts around this Darren — really upped my thinking.

FWIW, I agree with Darren that Fleet would be best suited as a cluster bootstrapping tool. Kubernetes fits my needs like a glove, so I wouldn't use Fleet much anyway.

Kyle Mathews

Anand Patil

Jul 17, 2014, 10:44:38 PM
to coreo...@googlegroups.com
Hi everyone,

I just wanted to put in my two cents in support of fleet as a distributed systemd. I agree that fleet should ultimately be more than that as well, and that someone needs to do something about systemd-docker integration. However, I think some major upsides of systemd sometimes get overlooked. As you guys say here:

Systemd has an extremely rich syntax that can describe the attributes of a particular service. Your services can express hard or soft dependencies, the order of launch relative to those dependencies, and identify conflicting services. Docker containers are much easier to manage when you can specify whether they automatically restart per container and customize the timing for restarting.

That's incredibly useful: I get Wants, Requires, Before, After, post-stop cleanup, restarting rules, timers and so on. That's miles ahead of what anyone else is offering. For example, IIRC containers in a Kubernetes pod start at the same time and restart if they go down, which in systemd terms means they have Restart=always and very limited configurability outside of ExecStart. I don't mean to bash pods, I think they're a great addition to the container toolchain, but they just don't compete with systemd on fine-grained coordination capabilities. Why should I have to roll my own logic to coordinate things within pods when I could just hand the job to systemd?
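As a sketch of what that buys you (the unit, container, and image names here are made up):

```ini
# app.service
[Unit]
Description=App container
# Hard dependency: app fails to start if db.service can't
Requires=db.service
# Ordering: wait until the database unit has started
After=db.service
# Soft dependency: pull in the log shipper if it's available
Wants=logship.service

[Service]
ExecStart=/usr/bin/docker run --rm --name app example/app
# Post-stop cleanup runs even after a crash; '-' ignores failure
ExecStopPost=-/usr/bin/docker rm -f app
# Restart on failure with a back-off, rather than unconditionally
Restart=on-failure
RestartSec=5
```

None of that needs any custom coordination code; systemd handles the ordering and restart policy for you.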

Another advantage of systemd is that it gives me an escape hatch when containers aren't suitable. I'm completely on board with the value of Docker; I think dependency isolation is wonderful, love the versioning and love the easy mount, namespace and cgroups configuration. I don't think we all need to run every single thing inside a container, though. If you need dependency isolation but not the other features, a statically linked Go executable might make sense as a lighter-weight alternative. (I think it would be great if fleet could transport such executables to CoreOS machines for me). Also, there are occasional cases where you need to mount something to the host, which currently can't be done from within Docker.

Anand

Darren Shepherd

unread,
Jul 21, 2014, 12:15:07 PM7/21/14
to coreo...@googlegroups.com
Just in case it is not clear, I still very much want fleet to be able to schedule systemd units, but it doesn't need to be a one-trick pony.  We've started off by giving the user the "advanced" option, which is a Swiss Army knife you can do anything with, as long as you know the magical knobs to turn.  Such a generic interface does not suggest to the user how and what one should do with CoreOS.

Fleet is extremely young and the use cases it supports are very limited.  As it matures, the expressiveness of systemd may end up being more of a hindrance if not properly considered now.  The directives that were mentioned (Wants, Requires, Before, After) are all things that can impact scheduling.  Going forward, how should fleet do scheduling?  Should the scheduling of units really be a combination of the X-Fleet directives plus the Service directives?  If I have unit A, and unit B requires unit A, then it would seem that A and B should be scheduled on the same host.  That seems logical today.

Docker, in its current form, is oriented towards one server.  The first question most people ask once they have containers deployed is "How do I get my containers to talk to each other?"  The problem is greatly exacerbated when the containers do not live on the same host.  As the Docker ecosystem matures we absolutely need a more straightforward, de facto answer regarding container interconnectivity.  Once this issue is addressed, the exact locality of containers should be less of a concern than it is today.  Containers should be able to communicate amongst themselves regardless of the host on which they reside.

Now let's reconsider the example of "Requires."  A user wants to indicate that container A must be started with container B.  Given that container interconnectivity is a solved problem, containers A and B do not need to be on the same host, and most likely shouldn't be.  Using the systemd directive Requires means that A and B must be on the same host.  Since systemd is fundamentally oriented towards managing a single server, its directives all carry this assumption.  So it seems that fleet should have, in its X-Fleet section, its own Requires that works across servers.  Following this logic you then end up having fleet duplicate all the systemd directives to provide a cross-server experience.
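For what it's worth, fleet already gestures in this direction: if I recall the current syntax correctly, the X-Fleet section has an X-ConditionMachineOf directive that pins a unit to whichever machine is running another unit. A sketch (unit and image names are made up):

```ini
# b.service: schedule on the same machine as a.service
[Service]
ExecStart=/usr/bin/docker run --rm example/b

[X-Fleet]
X-ConditionMachineOf=a.service
```

But that only expresses colocation; it doesn't capture the start/stop semantics that Requires carries within a single host, which is exactly the duplication problem.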

When I go down this road of thinking, I end up back at the point that I think fleet should present to the user a higher level of abstraction that is oriented toward scheduling containers.  That interface will carry with it its own language to describe scheduling requirements.

Darren

Anand Patil

unread,
Jul 21, 2014, 6:17:44 PM7/21/14
to coreo...@googlegroups.com
Darren,

Good points. You make a convincing case that mixing systemd's local scheduling with fleet's cluster-wide scheduling is delicate. It's also true that, when docker grows some kind of coordination, locality will not be an issue for some components.

My main goal was to point out something orthogonal: systemd is really nice for host-level factorization of services, and that's a big advantage that fleet currently provides that shouldn't get thrown out with the bathwater. It sounds like we might be in agreement on this, but I think it's worth emphasizing anyway, so I'll be more specific.

Pervasive containerization favors factoring services into small orthogonal components. Sometimes these components don't need to care about locality, but sometimes they do, which is the motivation for, e.g., Kubernetes' pods. For example, if I need to write a containerized database's contact info into etcd for some kind of service discovery, it's better to factor the announcer out of the database container and into a sidekick. That way the database can be agnostic to service discovery, the service-discovery mechanism and the db can be upgraded independently, and the sidekick can clean up properly even if the database container undergoes an unclean crash.
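A minimal sketch of such a sidekick (the unit name, etcd key, and address are all hypothetical):

```ini
# db-announce.service
[Unit]
# Live and die with the database unit
BindsTo=db.service
After=db.service

[Service]
# Re-announce with a TTL so the key expires on its own
# even after an unclean crash of this sidekick
ExecStart=/bin/sh -c "while true; do etcdctl set /services/db '10.0.0.5:5432' --ttl 60; sleep 45; done"
# Remove the key promptly on orderly shutdown
ExecStop=/usr/bin/etcdctl rm /services/db
```

BindsTo gives you the "clean up even after an unclean crash" property: if db.service dies, the announcer is stopped, and the TTL covers the case where the announcer itself dies.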

Here are some other examples of host-level service factorization:
- I want to health check a web server. It's natural to factor the health checker out of the server's container and run it in a timer sidekick that starts after the server comes up and stops when it goes down.
- For monitoring, I need to capture postmortem information about a containerized service's main process. It's tricky to do this reliably inside the container because it might be killed with SIGKILL or something; it's more natural to put it in ExecStopPost.
- A monitoring service needs to open an ssh tunnel to a data store. I can open the tunnel in a service that runs before the monitor starts.
- A web server needs some kind of network attached storage. I can mount the NAS to the host in a service or mount unit that runs before the web server starts, then mount the host volume into the server's container.
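Taking the first of those as an example, the health-check sidekick might look like this (all names and the endpoint are made up; web.service would pull the timer in with Wants=web-health.timer):

```ini
# web-health.timer
[Unit]
# Stop checking when the web server stops
PartOf=web.service
After=web.service

[Timer]
# First check 30s after activation, then every 30s
OnActiveSec=30
OnUnitActiveSec=30

# web-health.service
[Service]
Type=oneshot
# -f makes curl exit non-zero on HTTP errors, failing the unit
ExecStart=/usr/bin/curl -fsS http://localhost:8080/health
```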

There are other ways to do all these things, but systemd is a really nice way in my book.* It's declarative, modular, extremely flexible and at least as reliable as anything I'm going to write myself. It wipes out a whole slew of potential problems at host level.

* Admittedly unit files are ugly and abstruse, but that's a minor problem compared to everything else. If desired they can begin life as friendly JSON or in-memory objects:

{
  Unit: {
    Description: "Helloworld service"
  },
  Service: {
    ExecStart: "/usr/bin/echo 'hello world'"
  }
}

and get 'compiled out' to ini format just before deployment.
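That compilation step is trivial; a sketch in Python (the helper name is made up, and this is not part of fleet or systemd):

```python
# Sketch: "compile" an in-memory unit description into systemd's ini format.
def to_unit_file(unit: dict) -> str:
    lines = []
    for section, options in unit.items():
        lines.append(f"[{section}]")          # section header, e.g. [Unit]
        for key, value in options.items():
            lines.append(f"{key}={value}")    # one Key=Value pair per line
        lines.append("")                      # blank line between sections
    return "\n".join(lines)

hello = {
    "Unit": {"Description": "Helloworld service"},
    "Service": {"ExecStart": "/usr/bin/echo 'hello world'"},
}
print(to_unit_file(hello))
```

(This ignores repeated keys and multi-line values, which real unit files allow, but it shows how thin the translation layer is.)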

Anand






Dustin Spicuzza

unread,
Jul 25, 2014, 1:55:37 AM7/25/14
to coreo...@googlegroups.com
Hey all,

TL;DR: Make it super simple to run containers properly and handle the edge cases. Once I have that, let's talk about how annoying service discovery is.

[introduction, you can skip this] I'm super new to CoreOS (about a week now), and I've been really excited to work with CoreOS/fleet/systemd; setting up an initial CoreOS cluster was a piece of cake. I'm working on a research prototype with a large team, and over the winter I wrote a cluster orchestrator to manage a large cluster of OpenStack VMs + Chef. I'm doing some initial experiments with CoreOS + Docker, and I'm definitely thinking that the direction CoreOS is taking aligns nicely with where we want to go, and that fleet will allow us to do much more than we were doing before.[/intro]

I think I agree with most of the sentiments above. In particular, I'm quickly learning about all of the corner cases in launching containers that the web examples just don't cover yet, and am starting to build init.d-like scripts to avoid those problems -- which is (as Darren points out) exactly the kind of thing we're supposed to be avoiding! Here's what I think I want right now:
  1. Easily deploy a bunch of containers to a cluster *correctly* without having to write a lot of tricky shell scripts OVER AND OVER again
    • If I get HA and load balancing out of it, that's even better (which I think I can mostly get from fleet now)
  2. Connect them together via etcd or whatever without having to write yet more ridiculous shell scripts (separate topic)
Honestly, I'm not sure if a docker standalone is the right way to go about #1. As Darren expressed, it's good to be thinking about things differently. BUT, giving fleet the ability to do simple docker container deployment correctly seems like a win to me. Right now, I would want something like this that took care of the start/run/naming/environment/etc:

[X-Fleet]
X-docker-run=-p 1000:1000 containername arg1 arg2
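For context, here is roughly the boilerplate such a directive could generate today, i.e. the "docker start || docker run" dance mentioned earlier in the thread (a sketch; the container name "mysvc" is made up):

```ini
[Service]
# Reattach to an existing container if one exists, otherwise create it
ExecStart=/bin/sh -c "docker start -a mysvc || docker run --name mysvc -p 1000:1000 containername arg1 arg2"
ExecStop=/usr/bin/docker stop mysvc
Restart=on-failure
```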

Of course, I'm still new to this. Maybe I don't know what I want. But I like that better than what I have now. :) 

Dustin