Backing up etcd


Matthias Rampke

Jul 26, 2016, 3:39:50 PM
to Containers at Google
One more "how are people doing this" …

I'm working on a recovery strategy for the etcd cluster powering Kubernetes. For this, I'd like to back up etcd, but the restore procedure turns out to be quite hairy.

The obvious way is using `etcdctl backup` and `--force-new-cluster`[0], however, I'm struggling with getting from the single recovered node to a full cluster again. It seems to require a very precise sequence of `etcdctl member add` and bringing up that new member with the exact right initial cluster state. This is complicated by TLS certificates that need to be generated for the correct hostnames / IP addresses.

Another strategy I am exploring is using the old etcd-backup utility[1] to dump the contents of etcd, but that loses all information about revision numbers for the various keys. As I understand it, Kubernetes uses these as the revision for API listeners as well, so I'm not sure how the various cluster components will react to etcd coming back with all revision numbers reset? (I haven't tried this yet, I can report back when I have). So worst case this would mean I have to stop/restart all Kubernetes API consumers, including kubelet, DNS and external consumers?

Questions:
* do I understand the revision problem correctly?
* is there a smoother way to expand a one-node etcd cluster?
* how are you backing up / restoring etcd?
* are you backing up / restoring etcd? If not, how do you recover from a cluster that has irrevocably lost quorum?

Thank you,
MR



--
Matthias Rampke
Engineer

SoundCloud Ltd. | Rheinsberger Str. 76/77, 10115 Berlin, Germany | +49 173 6395215

Managing Director: Alexander Ljung | Incorporated in England & Wales with Company No. 6343600 | Local Branch Office | AG Charlottenburg  | HRB 110657B

Rodrigo Campos

Jul 26, 2016, 11:59:50 PM
to google-c...@googlegroups.com

On Tuesday, July 26, 2016, Matthias Rampke <m...@soundcloud.com> wrote:
One more "how are people doing this" …

I'm working on a recovery strategy for the etcd cluster powering Kubernetes. For this, I'd like to back up etcd, but the restore procedure turns out to be quite hairy.

As nobody else answered, here I go. It depends on what kind of disaster recovery you are working on. In my case (at least with the features we are using now), it's trivial to create a new k8s cluster and just launch all the pods again. We just re-create everything, no etcd backup needed for that.

I'm sure you considered this and it's not good enough, but just in case I thought I'd answer :)




Thanks,
Rodrigo

Alex Robinson

Aug 1, 2016, 1:42:47 PM
to Containers at Google
I don't have any experience with backups/restores of multi-node etcd clusters, but can report that single-node backup/restore works well if done with a simple approach like this:

* Periodically make a backup using the normal `etcdctl backup` command
* When you need to restore, stop the apiserver and etcd from running
* Move the backed up data into place, then start etcd with --force-new-cluster pointed at the backed up data
* Once the new etcd has initialized, kill it and replace it with your normal etcd configuration that doesn't have --force-new-cluster
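For concreteness, that cycle might look roughly like this on a typical host (the paths, dates, and unit names are illustrative assumptions, not part of the original post):

```shell
# Sketch of the single-node backup/restore cycle described above.
# Directory layout and unit names are assumptions; adjust for your setup.

# 1. Periodic backup (e.g. from cron), using the etcd v2 tooling:
etcdctl backup \
  --data-dir /var/lib/etcd \
  --backup-dir /var/backups/etcd-$(date +%F)

# 2. To restore: stop the apiserver and etcd first, e.g.
#    systemctl stop kube-apiserver etcd

# 3. Move the backed-up data into place:
mv /var/lib/etcd /var/lib/etcd.broken
cp -r /var/backups/etcd-2016-08-01 /var/lib/etcd

# 4. Start etcd once with --force-new-cluster so it rewrites the
#    membership to a single local node:
etcd --data-dir /var/lib/etcd --force-new-cluster

# 5. Once it has initialized, stop it and start etcd with the normal
#    configuration (without --force-new-cluster).
```

These commands need a live etcd installation, so treat them as a procedure sketch rather than a script to run verbatim.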

Sorry I don't have better advice to give around multi-node etcd. If you get things figured out or have an interesting experience, I'd love to hear about it!

Alex

Matthias Rampke

Aug 2, 2016, 6:12:21 AM
to google-c...@googlegroups.com
Hi,


On Mon, Aug 1, 2016 at 5:42 PM 'Alex Robinson' via Containers at Google <google-c...@googlegroups.com> wrote:

Sorry I don't have better advice to give around multi-node etcd. If you get things figured out or have an interesting experience, I'd love to hear about it!

Indeed, bringing back a single etcd from a binary backup works like a charm. Growing the cluster again is tricky but possible.

For anyone else who is interested, I have explored both options now and they have different trade-offs.

In any case, I assume and would strongly advise that if you have or suspect a problem with etcd, you shut down all kube-apiserver instances. For the most part, the Kubernetes components will simply carry on with the data they have (Q: how does the new kubedns react to API unavailability?). This comes from experience with our previous cluster system: if you're not sure about data integrity, make very sure that the cluster will not attempt to do anything. Shutting down kube-controller-manager and kube-scheduler is probably advisable too, just to keep the cluster from changing by itself once the API is back.
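On a systemd-managed control plane, fencing the cluster like this might amount to no more than (unit names are an assumption and vary by setup):

```shell
# Fence the cluster before touching etcd: stop the API servers so
# nothing can read or write through them, then the controllers so
# nothing reacts when the API comes back.
systemctl stop kube-apiserver
systemctl stop kube-controller-manager kube-scheduler
```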

For etcdctl backup, the restore is seamless as far as Kubernetes components are concerned (they only see an extended apiserver outage and do not know that the etcd cluster was changed underneath). The sequence to grow a single-node etcd cluster to a full cluster again is:

1. start the first etcd from the backup using `--force-new-cluster`
2. restart it with the normal configuration (optional) – at this point the new cluster membership is stored in the data, and flags no longer change it
3. `etcdctl member update` to set the correct peer URL (force-new-cluster _always_ sets it to localhost-only)
4. `etcdctl member add` the next node
- when going from 1 to 2 members this breaks quorum; the cluster becomes unusable, and this cannot be undone other than starting over from step 1
5. start the next node with the correct `initial-cluster` settings from step 4 – the second node will have 2 nodes in its initial peer list, the third node 3, and so on. You cannot shortcut this, or the node will be unable to join.
6. repeat from step 4 until enough nodes have been added
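A sketch of steps 3–6 for growing back to a three-node cluster (the member ID, hostnames, and ports are made up for illustration; the etcdctl syntax is the v2-era one this thread uses):

```shell
# Step 3: fix the peer URL that --force-new-cluster reset to
# localhost-only. The member ID comes from `etcdctl member list`.
etcdctl member update 8e9e05c52164694d http://etcd1.example.com:2380

# Steps 4+5 for the second node: register it, then start it with an
# initial-cluster list containing exactly the two members so far.
etcdctl member add etcd2 http://etcd2.example.com:2380
# (member add prints the ETCD_INITIAL_CLUSTER values to use)

# on the etcd2 host:
etcd --name etcd2 \
  --initial-cluster etcd1=http://etcd1.example.com:2380,etcd2=http://etcd2.example.com:2380 \
  --initial-cluster-state existing

# Repeat member add / start for etcd3, this time with all three
# members in --initial-cluster.
```

As above, this is a sketch against a live cluster, not a runnable script.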

If you are using TLS certificates, regenerate them with all the new IPs / domain names before starting.


Using the (logical) etcd-backup method, start a completely new replacement cluster (an initial-cluster peer list with all nodes and initial-cluster-state=new, or use etcd discovery), and replay the backup using this tool[0]. Start the apiservers, then restart every API client that uses watches (which is pretty much all of them). This method invalidates all revision numbers, so API clients would otherwise be waiting on revisions far in the future.


So to recap – with the (canonical) binary data backups, the restore is cleaner for Kubernetes but harder to pull off correctly in etcd. With etcd-backup, it is easier on the etcd side but requires restarting everything.
 

On Tuesday, July 26, 2016 at 8:59:50 PM UTC-7, Rodrigo Campos wrote:

it's trivial to create a new k8s cluster and just launch all the pods. Just re-create everything, no etcd backup for that.

I'm sure you considered this and it's not good enough, but just in case I thought I'd answer :)


Indeed, that won't quite cut it for us :) There would be a few hundred services to redeploy, and the time to do that would be unacceptable.

 /MR



