How to handle changes to the cloud-config file on a live production cluster


Jean Mertz

Apr 24, 2015, 1:47:08 PM
to coreo...@googlegroups.com

I’ve been working on preparing a production-ready cluster setup for our company
for the last couple of weeks.

Things are looking great, and I feel like we are getting close to launching our
“production” cluster with the current setup.

Our current setup uses the cloud-config file to prepare each CoreOS machine
in the cluster (running on OpenStack): setting up ephemeral storage, users,
SSH keys, and flannel networking, and launching a couple of systemd service
files that finish preparing the machine.

After that, I have set up a very basic Ansible configuration to synchronise all
the systemd unit files across the hosts and to add/remove SSH keys when people
join/leave.

For cluster management, we use Fleet of course.

Now, the things I worry about most are distributing changes to the cloud-config
file and to unit files.

I see three possible solutions to this:

1) swap every machine in the cluster whenever a minor change is made to the
cloud-config file.

This seems highly impractical and makes migrations slower and slower for every
new machine we add to the cluster. Of course, the plus side is that you
always know your machines are in pristine condition and that no anomalies exist
in your cluster.

2) use Ansible to re-run changed cloud-config files

This means I have to run coreos-cloudinit whenever a change is made.
However, this only supports forward changes. For example, removing an SSH key
from the cloud-config does not remove it from the machine, unless we wipe all
keys before re-running the cloud-config file.

This also brings me back to the Chef/Puppet dark days, with the constant
possibility (and fear) of machines drifting out of sync (different configs,
SSH keys, whatever) without anyone noticing.

3) keep cloud-config as simple as possible (already trying hard to do this) and
accumulate changes until you are ready to do another “full sweep” of swapping
all cluster machines (as mentioned in 1.)

This feels like the most natural fit. But I also feel like the first months
will still see a high churn rate on the cloud-config file.

Now, changes to unit files are another concept I haven’t quite grasped yet. If
we find a bug in one of our unit files, or need to add a new environment
variable to one of our Docker containers, we need to update the unit files on
all machines, destroy and re-add the units, restart the affected ones, and
possibly run a systemctl daemon-reload. This again feels like a mental burden
that I want to minimize as much as possible, and make as easy as possible to
manage.

Do any of you have any tips or thoughts on these processes? Thanks.

Rob Szumski

Apr 24, 2015, 2:31:05 PM
to Jean Mertz, coreo...@googlegroups.com
> 3) keep cloud-config as simple as possible (already trying hard to do this) and
> accumulate changes until you are ready to do another “full sweep” of swapping
> all cluster machines (as mentioned in 1.)
>
> This feels like the most natural fit. But I also feel like the first months
> will still see a high churn rate on the cloud-config file.

This is the best way to go. Ideally your cloud-config is as simple as it can be: just the pieces needed to get the machine booted, on the network, and joined to the cluster.
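
For what it’s worth, a minimal worker cloud-config along those lines might look roughly like this (a sketch only: the discovery token, SSH key and `$private_ipv4` substitution are placeholders, and older images use the `etcd:` stanza instead of `etcd2:`):

```yaml
#cloud-config
coreos:
  etcd2:
    # placeholder token -- generate your own via https://discovery.etcd.io/new
    discovery: https://discovery.etcd.io/<token>
    advertise-client-urls: http://$private_ipv4:2379
    initial-advertise-peer-urls: http://$private_ipv4:2380
    listen-client-urls: http://0.0.0.0:2379
    listen-peer-urls: http://$private_ipv4:2380
  fleet:
    public-ip: $private_ipv4
  units:
    - name: etcd2.service
      command: start
    - name: fleet.service
      command: start
ssh_authorized_keys:
  - ssh-rsa AAAA... ops@example.com
```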

One thing worth mentioning is that a central etcd cluster should make it less painful to swap machines in and out. You’ve identified reasons why this is kind of a pain, but it’s better than having to take down the entire cluster.

If you’re running on bare metal (vs. a cloud), swapping your user-data file on disk and re-running cloudinit is pretty easy. The issue with doing this on a cloud provider is that you’ve now reconfigured the machine for that boot, but the metadata service isn’t going to provide the same user-data on the next boot. This behavior differs between providers: AWS only allows you to modify user-data when a machine is stopped. DigitalOcean doesn’t let you modify it at all. GCE does allow you to change it at any time, I believe, but it takes effect on the next boot.
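
On bare metal the swap-and-re-run boils down to something like this (a sketch; the path is an assumption based on where coreos-install stores user-data, so adjust it for your setup):

```sh
# replace the stored user-data, validate it, then re-apply it
sudo cp new-cloud-config.yaml /var/lib/coreos-install/user_data
sudo coreos-cloudinit --validate --from-file=/var/lib/coreos-install/user_data
sudo coreos-cloudinit --from-file=/var/lib/coreos-install/user_data
```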

> Now, changes to unit files are another concept I haven’t quite grasped yet. If
> we find a bug in one of our unit files, or need to add a new environment
> variable to one of our Docker containers, we need to update the unit files on
> all machines, destroy and re-add the units, restart the affected ones, and
> possibly run a systemctl daemon-reload. This again feels like a mental burden
> that I want to minimize as much as possible, and make as easy as possible to manage.
>
> Do any of you have any tips or thoughts on these processes? Thanks.

Your units should point to a specific tag of a Docker image, so updating a unit file should be the same process as deploying a new version of your app. If you have a CI system that orchestrates the fleet API, this should be pretty painless.

If you’re running this process manually, it might be worth writing a quick script to save you from all the typing.
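
Something as small as this sketch would do (the unit names are made up for the example):

```sh
#!/bin/bash
# redeploy.sh -- resubmit an updated unit file and restart its instance
set -e
TEMPLATE=appx@.service   # updated unit file in the current directory
INSTANCE=appx@1.service

fleetctl destroy "$INSTANCE"   # stop and remove the running instance
fleetctl destroy "$TEMPLATE"   # fleet caches unit contents, so drop the old template
fleetctl submit "$TEMPLATE"    # submit the new version from disk
fleetctl start "$INSTANCE"
fleetctl list-units            # confirm the new instance is running
```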

 - Rob

Jean Mertz

Apr 24, 2015, 3:12:08 PM
to coreo...@googlegroups.com, je...@mertz.fm


On Friday, April 24, 2015 at 8:31:05 PM UTC+2, Rob Szumski wrote:
>> 3) keep cloud-config as simple as possible (already trying hard to do this) and
>> accumulate changes until you are ready to do another “full sweep” of swapping
>> all cluster machines (as mentioned in 1.)
>>
>> This feels like the most natural fit. But I also feel like the first months
>> will still see a high churn rate on the cloud-config file.
>
> This is the best way to go. Ideally your cloud-config is as simple as it can be: just the pieces needed to get the machine booted, on the network, and joined to the cluster.
>
> One thing worth mentioning is that a central etcd cluster should make it less painful to swap machines in and out. You’ve identified reasons why this is kind of a pain, but it’s better than having to take down the entire cluster.


Could you elaborate on *how* the etcd cluster makes this less painful? Just to clarify our setup:

I believe I am already using etcd for cluster management (that is, I use etcd discovery already). Also, part of the initial cloud-config setup is to bootstrap a working Consul cluster. I use the Consul API to get a list of active machines and dynamically generate the Ansible inventory this way, so there is no hardcoded list of running machines; it should all be fluid.
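
Roughly, the inventory side of that boils down to a sketch like this (simplified; a real Ansible dynamic inventory script also has to answer `--list`/`--host`, and it assumes a local Consul agent plus jq):

```sh
#!/bin/bash
# Query Consul's catalog and emit a minimal Ansible inventory as JSON.
curl -s http://localhost:8500/v1/catalog/nodes \
  | jq '{all: {hosts: [.[].Address]}, _meta: {hostvars: {}}}'
```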
 
> If you’re running on bare metal (vs. a cloud), swapping your user-data file on disk and re-running cloudinit is pretty easy. The issue with doing this on a cloud provider is that you’ve now reconfigured the machine for that boot, but the metadata service isn’t going to provide the same user-data on the next boot. This behavior differs between providers: AWS only allows you to modify user-data when a machine is stopped. DigitalOcean doesn’t let you modify it at all. GCE does allow you to change it at any time, I believe, but it takes effect on the next boot.

Ah right, I hadn't thought about that problem yet. We are running on an OpenStack provider within our country's borders (the Netherlands), so I assume we'll have the same issue. It's good to know, but at the same time I think I want to stay away from re-running updated cloud-configs anyway, so this shouldn't affect us.
 

>> Now, changes to unit files are another concept I haven’t quite grasped yet. If
>> we find a bug in one of our unit files, or need to add a new environment
>> variable to one of our Docker containers, we need to update the unit files on
>> all machines, destroy and re-add the units, restart the affected ones, and
>> possibly run a systemctl daemon-reload. This again feels like a mental burden
>> that I want to minimize as much as possible, and make as easy as possible to manage.
>>
>> Do any of you have any tips or thoughts on these processes? Thanks.
>
> Your units should point to a specific tag of a Docker image, so updating a unit file should be the same process as deploying a new version of your app. If you have a CI system that orchestrates the fleet API, this should be pretty painless.

Using CI to manage the fleet API sounds like a great plan. Do you have any links to running implementations of this? It would be nice to have some examples to go by while working on this.
 

> If you’re running this process manually, it might be worth writing a quick script to save you from all the typing.
>
>  - Rob

Thanks for the great input Rob. Still, if anyone else has any more thoughts, I am all ears.
 

Rob Szumski

Apr 24, 2015, 3:32:01 PM
to Jean Mertz, coreo...@googlegroups.com

> Could you elaborate on *how* the etcd cluster makes this less painful? Just to clarify our setup:
>
> I believe I am already using etcd for cluster management (that is, I use etcd discovery already). Also, part of the initial cloud-config setup is to bootstrap a working Consul cluster. I use the Consul API to get a list of active machines and dynamically generate the Ansible inventory this way, so there is no hardcoded list of running machines; it should all be fluid.

I’m making an assumption that your central etcd machines are dedicated to etcd and maybe a few other cluster coordination tasks. My other assumption is that this means you’re mostly modifying/tweaking/testing cloud-config changes on your worker machines only.

Since you don’t have to worry about maintaining etcd’s quorum, bootstrapping, a new discovery token, etc., on the worker machines (since they just point at the existing etcd cluster), it’s easy to recreate or reboot those machines as you make changes. For example, if you need to add a new disk, you can make that change and test it out on one machine; it will join the fleet cluster after boot and start receiving work. You can then swap out the other machines pretty easily, one at a time or as a group, depending on your needs. But you skip all the etcd-related bootstrapping tasks and instead can do a quick loop over `nova delete` and `nova boot`.
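
Something along these lines, where the host names, image, flavor and key are placeholders:

```sh
#!/bin/bash
# Rolling worker swap sketch: delete each machine and boot a fresh one
# from the updated cloud-config, one at a time.
set -e
for host in worker-1 worker-2 worker-3; do
  nova delete "$host"
  nova boot \
    --image coreos-stable \
    --flavor m1.medium \
    --key-name ops \
    --user-data ./cloud-config.yaml \
    "$host"
  sleep 120   # crude wait for the machine to boot and rejoin fleet
done
```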

 - Rob

Mertz, Jean

Apr 24, 2015, 4:16:32 PM
to Rob Szumski, coreo...@googlegroups.com
On Fri, Apr 24, 2015 at 9:31 PM, Rob Szumski <rob.s...@coreos.com> wrote:

>> Could you elaborate on *how* the etcd cluster makes this less painful? Just to clarify our setup:
>>
>> I believe I am already using etcd for cluster management (that is, I use etcd discovery already). Also, part of the initial cloud-config setup is to bootstrap a working Consul cluster. I use the Consul API to get a list of active machines and dynamically generate the Ansible inventory this way, so there is no hardcoded list of running machines; it should all be fluid.
>
> I’m making an assumption that your central etcd machines are dedicated to etcd and maybe a few other cluster coordination tasks. My other assumption is that this means you’re mostly modifying/tweaking/testing cloud-config changes on your worker machines only.

Interesting that you are making that assumption. I haven't really set up any cluster to manage Etcd. Right now I simply use the CoreOS-provided discovery.etcd.io for key retrieval and discovery. Of course, this is something we will want to pull into our own network in the future, so I guess that is what you are talking about.

The concept of "worker machines" and "cluster management" machines is not something I had considered before. Conceptually, I guess this would entail using fleet metadata to tag "worker" machines and "management" machines, and having all unit files define that they only want to live on worker machines. The next step would be to have some unit files for the management machines that boot and manage the etcd registry/management part.
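
If I understand the fleet docs correctly, that would be a sketch along these lines (the role name is just an example). The worker machines get tagged via fleet metadata in cloud-config:

```yaml
#cloud-config
coreos:
  fleet:
    metadata: "role=worker"
```

and the application units then constrain themselves to those machines:

```ini
# in each application unit file (e.g. appx@.service)
[X-Fleet]
MachineMetadata=role=worker
```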

Still though, having 3 - 5 separate machines to manage the Etcd cluster/quorum does feel a bit like wasted resources. As long as we make sure to always keep at least three machines online during a rebuild-cycle, are there any downsides to having your Etcd cluster running inside Docker containers distributed over the same cluster it is managing?
 

> Since you don’t have to worry about maintaining etcd’s quorum, bootstrapping, a new discovery token, etc., on the worker machines (since they just point at the existing etcd cluster), it’s easy to recreate or reboot those machines as you make changes.

Indeed, my plan was to make sure that at least three machines are always online to keep the Etcd quorum going, and to rebuild-cycle all other machines if any important configuration changes. Separating cluster management from the workers simplifies the mental burden of a rebuild cycle, but of course we'll have to keep enough machines online during an update cycle to have enough resources for the running services.

Rob Szumski

Apr 24, 2015, 4:41:53 PM
to Mertz, Jean, coreo...@googlegroups.com

On Apr 24, 2015, at 1:16 PM, Mertz, Jean <je...@mertz.fm> wrote:

> The concept of "worker machines" and "cluster management" machines is not something I had considered before. Conceptually, I guess this would entail using fleet metadata to tag "worker" machines and "management" machines, and having all unit files define that they only want to live on worker machines.

Rimas Mocevicius

Apr 24, 2015, 7:24:39 PM
to coreo...@googlegroups.com, rob.s...@coreos.com
Jean,

Dedicating three small machines just to run the etcd cluster is not really wasted resources.
It gives you peace of mind to run it separately from the workers, as you can then scale your worker machines
and take them down/up as much as you like.

Treat your etcd cluster like a database cluster: database servers usually do not run anything except database services,
and etcd is the same.

I made that mistake last summer, setting up lots of small clusters when I moved development and production servers to CoreOS,
and paid the price in time wasted maintaining those clusters.

Setting up a 3-machine etcd cluster + 20 workers took all those problems away; I can now concentrate
on building/releasing Docker containers instead of looking after faulty clusters.

That's the way to go to set up proper CoreOS clusters.

Rimas

Mertz, Jean

Apr 25, 2015, 10:05:53 AM
to Rimas Mocevicius, coreo...@googlegroups.com, rob.s...@coreos.com
Thank you for the advice Rimas.

Your tale of caution makes perfect sense and I agree that the required resources are minimal, and (apparently) worth the investment.

I've indeed started working on bootstrapping the Etcd cluster since Rob linked to the architecture document. I somehow missed that document, despite having spent hours and hours on the CoreOS website in the past couple of months.

Having gone through that document, I have two more related questions:

* The document doesn't mention Etcd's proxy capabilities; is this simply because they weren't available at the time of writing? It seems like this would be the way to go, so workers can still query the local Etcd instance and simply be proxied to one of the Etcd servers.

* How does the discovery mechanism fit into this picture? In the document, the Etcd servers are still using the remote discovery.etcd.io endpoint for initial discovery, but they also use static IPs, so I am unsure whether there is still a need for it. However, I guess that since discovery only happens on the Etcd cluster side of things (instead of on all the workers), the problems one would face if that endpoint became unresponsive are relatively small, since workers can still be added to the cluster, correct?



Rimas Mocevicius

Apr 25, 2015, 10:30:04 AM
to coreo...@googlegroups.com, rob.s...@coreos.com, rmo...@gmail.com
Jean,

Not all documents on http://coreos.com have been updated to the latest etcd 2.0 features yet.
The proxy only works with etcd 2.0.

Check this doc on how to bootstrap etcd 2.0 clusters in different ways:
https://github.com/coreos/etcd/blob/master/Documentation/clustering.md

Rob,
this should be more visible on the site, to stop people from making wrong cluster setups.

Rob Szumski

Apr 25, 2015, 2:33:56 PM
to Mertz, Jean, Rimas Mocevicius, coreo...@googlegroups.com
> * The document doesn't mention Etcd's proxy capabilities; is this simply because they weren't available at the time of writing? It seems like this would be the way to go, so workers can still query the local Etcd instance and simply be proxied to one of the Etcd servers.

Yes, this document currently targets 0.4.x, but the intention is to update it to use the proxy setup once 2.0 is available in all channels. As you can see, the proxy is perfect for this scenario, and that’s one of the reasons the feature exists.
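
Once 2.0 is everywhere, the worker side of that is just a proxy stanza along these lines (a sketch; the names and IPs are placeholders for your three central etcd machines):

```yaml
#cloud-config
coreos:
  etcd2:
    proxy: on
    listen-client-urls: http://localhost:2379
    initial-cluster: etcd-1=http://10.0.0.101:2380,etcd-2=http://10.0.0.102:2380,etcd-3=http://10.0.0.103:2380
  units:
    - name: etcd2.service
      command: start
```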

> * How does the discovery mechanism fit into this picture? In the document, the Etcd servers are still using the remote discovery.etcd.io endpoint for initial discovery, but they also use static IPs, so I am unsure whether there is still a need for it. However, I guess that since discovery only happens on the Etcd cluster side of things (instead of on all the workers), the problems one would face if that endpoint became unresponsive are relatively small, since workers can still be added to the cluster, correct?

This document shows more of a bare-metal setup, so that’s why you see the network units and such. The method of etcd bootstrapping isn’t important in this type of setup, but we had to show one and picked static bootstrapping.

The discovery service is only used on the initial boot to find peers; afterwards the result is cached on disk. If you are worried about this, static bootstrapping or running your own discovery service are possible mitigations.
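
A statically bootstrapped central etcd machine looks roughly like this with 2.0 (again a sketch; the names and IPs are placeholders, and you repeat it per machine with its own name and addresses):

```yaml
#cloud-config
coreos:
  etcd2:
    name: etcd-1
    initial-cluster-state: new
    initial-cluster: etcd-1=http://10.0.0.101:2380,etcd-2=http://10.0.0.102:2380,etcd-3=http://10.0.0.103:2380
    initial-advertise-peer-urls: http://10.0.0.101:2380
    listen-peer-urls: http://10.0.0.101:2380
    advertise-client-urls: http://10.0.0.101:2379
    listen-client-urls: http://0.0.0.0:2379
  units:
    - name: etcd2.service
      command: start
```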

Adding and removing workers in this setup is extremely easy because you just need to point them at the etcd cluster. Perfect for autoscaling, etc.

 - Rob

Jean Mertz

May 12, 2015, 3:55:38 AM
to coreo...@googlegroups.com, je...@mertz.fm, rmo...@gmail.com
Thank you for all the replies. It really helped me get a better understanding of
a sane CoreOS setup.

I have two more questions regarding my first post:

> After that, I have set up a very basic Ansible configuration to synchronise all
> the systemd unit files across the hosts

I have a hard time wrapping my head around the workflow involved in this use
case. For example:

I have a cluster running "App X". App X runs in a Docker container, which in
turn is handled by a systemd unit file. App X is due for an update. Since the
unit file points to a specific Docker image `jeanmertz/appx:1.0.4`, I need to
update this unit to point to a newer version.
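
For context, the unit is shaped roughly like this (a simplified sketch, with the details made up):

```ini
# appx@.service (simplified)
[Unit]
Description=App X
After=docker.service
Requires=docker.service

[Service]
TimeoutStartSec=0
ExecStartPre=-/usr/bin/docker kill appx-%i
ExecStartPre=-/usr/bin/docker rm appx-%i
ExecStartPre=/usr/bin/docker pull jeanmertz/appx:1.0.4
ExecStart=/usr/bin/docker run --name appx-%i jeanmertz/appx:1.0.4
ExecStop=/usr/bin/docker stop appx-%i

[X-Fleet]
Conflicts=appx@*.service
```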

Once this is done, I can access the Fleet API from my local machine, use
`destroy` and then use `submit` while pointing to the new file on my machine to
load that file into Fleet.

This works, but it also creates the problem that a changed unit file no longer
has to be committed to a Git repo (and optionally verified by a CI solution).

How would you handle this use-case?

Furthermore, I have noticed that, if your unit files have dependencies on other
unit files, you need to either submit all unit files into Fleet, even if they
are "secondary" unit files (e.g. a data container or sidekick), or have a
provisioning tool (like Ansible) place all unit files on all machines whenever
a unit file changes, so that when Fleet starts unit A on machine Z and that
unit has a dependency on unit B, machine Z already has that unit file in place.

It seems impractical to recreate all worker machines whenever a unit file
changes, which would be the case if I managed all unit files using cloud-config.
But I also would like to see if there is a workable solution *without* any
Ansible integration, as it creates another infrastructure management layer,
which I want to avoid whenever possible.

Any thoughts on this?

Or is the answer simply: don't use Fleet for app-level infrastructure, but
instead use it to run something like Kubernetes/Swarm/Deis/...?


On Saturday, April 25, 2015 at 8:33:56 PM UTC+2, Rob Szumski wrote:

Rimas Mocevicius

May 17, 2015, 5:59:18 AM
to coreo...@googlegroups.com
Jean,

Regarding question 1:

I usually do not destroy fleet units when I want to release a new Docker image.
My fleet units always point to the image:latest tag.
When a new Docker image (tested and ready for production) is released to the registry, it gets an incremental
version number, and that version is then also tagged latest.

So my deployment script (roughly as sketched below):
1) pulls the latest image onto all servers
2) stops/starts the fleet units one by one on each server
3) this allows releasing a new version without downtime
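
Stripped down, the script is something like this sketch (the unit and image names are made up for the example):

```sh
#!/bin/bash
# 1) pre-pull the new :latest image on every machine in the cluster
IMAGE=registry.example.com/appx:latest
for m in $(fleetctl list-machines --full --fields=machine --no-legend); do
  fleetctl ssh "$m" docker pull "$IMAGE"
done

# 2) bounce the unit instances one at a time so the service stays up
for unit in $(fleetctl list-units --fields=unit --no-legend | grep '^appx@'); do
  fleetctl stop "$unit"
  fleetctl start "$unit"
done
```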

Question 2:
My current worker machines are set up in such a way that:
1) via cloud-config, the minimal set of systemd units gets deployed to the servers:
network, attached storage, connection to the etcd cluster, flannel, Docker
2) the rest is done via fleet units, as they are much easier to update, and you
do not have to recreate worker machines when a fleet unit changes

Now, going a level further up in cluster setup:
1) minimal systemd units via cloud-config, as above
2) use fleet to bootstrap Kubernetes / http://www.paz.sh
3) use Kubernetes/Paz for application container orchestration, management, etc.

I hope my answers gave you more insight into CoreOS cluster setup.

Regards

Rimas

Arthur Clément

May 17, 2015, 7:29:35 AM
to coreo...@googlegroups.com
Hi,

Thanks for this very interesting thread; I am also in this kind of brainstorming stage about CoreOS cluster management and CI. Like Rimas, I think the best option is to only update your Docker images, deploy new containers through fleet with the latest image, and stop the old containers. And if you want a rollback, you just have to change the tag in your Docker registry and re-run fleet, e.g. `fleetctl start myservice@$date-$version.service`.

I'm far from a production-ready cluster, but I manage/share my unit files with Git at the moment.

Paz seems a promising project indeed; I think it's the best way to orchestrate as much as possible with fleet and etcd. Jean, if you have trouble working out a solution to manage your workflow through fleet/etcd, you might keep digging to simplify it. It might not be a good idea to use CoreOS without embracing the CoreOS philosophy. I enjoy this project because it makes you rethink a lot about infrastructure management.

Regards,

Arthur

Rimas Mocevicius

May 17, 2015, 7:45:34 AM
to coreo...@googlegroups.com
Arthur,

You are right:
moving my development and production clusters to CoreOS last summer, I had to rethink a lot about infrastructure management.
Running everything on CoreOS for the last 10 months, I have learned many new things
about CoreOS clusters the hard way.
I see fleet as a low-level orchestration/bootstrapping service where you have to do all the hard work yourself
and write all your deployment scripts from scratch.

This is where Kubernetes and Paz.sh come in to help you out.
Kubernetes is ready to be used in production already (I know there is no GA yet, but still).
Paz.sh still needs some work; the guys there are working hard to make it happen too.

In a couple of weeks, when I have more spare time (I'm finishing my book on CoreOS),
I will start moving some of my projects to Kubernetes, and then I will blog about my success there.

Rimas

Arthur Clément

May 17, 2015, 8:03:42 AM
to coreo...@googlegroups.com
I tried Kubernetes, but it's a big abstraction. I want to learn more about CoreOS and work closer to the "core". I see CoreOS and fleet as new toys, and I want to play with them!

I will be glad to help if you need proofreaders for your book.


Arthur

Rimas Mocevicius

May 17, 2015, 8:15:35 AM
to coreo...@googlegroups.com
Arthur,

CoreOS was a toy for me a good year ago :)
It has matured a lot since then, and it is a really good OS for running application containers at the cluster level.

Using fleet to run/manage containers is fine if you have a small cluster and not too many
containers to run, but at a bigger scale it is too painful.
You have to write your own scripts to monitor/manage/deploy your fleet units.

If Kubernetes is not your thing, then try paz.sh, as it is fully based on fleet.

Regarding the book: it is a commercial book, and the publisher chose the technical reviewers themselves.

Rimas

Jean Mertz

May 21, 2015, 11:28:22 AM
to coreo...@googlegroups.com
Thank you for all the interesting responses. I've since configured a Kubernetes
cluster on top of CoreOS, and can already see a lot of benefits. Management of
pods/containers is becoming a lot easier and container upgrades feel like a
natural fit now.

It was interesting to read all of your use-cases, looking forward to doing a lot
more with CoreOS at our company in the future.

On Sunday, May 17, 2015 at 2:15:35 PM UTC+2, Rimas Mocevicius wrote: