Indeed, bringing back a single etcd from a binary backup works like a charm. Growing the cluster again is tricky but possible.
For anyone else who is interested, I have explored both options now and they have different trade-offs.
In any case, I assume, and would strongly advise, that if you have or suspect a problem with etcd, you shut down all kube-apiserver instances. For the most part, the Kubernetes components will simply carry on with the data they have (Q: how does the new kubedns react to API unavailability?). This comes from experience with our previous cluster system. If you're not sure about data integrity, make very sure that the cluster will not attempt to do anything on its own. Shutting down the kube-controller-manager and kube-scheduler is probably advisable too, just to keep the cluster from changing by itself once the API is back.
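How exactly you stop them depends on how your control plane is deployed; a minimal sketch, assuming either systemd-managed binaries or kubelet static pods (the unit and manifest names are placeholders):

```
# Placeholder names; adapt to your deployment.

# If the components run as systemd services:
systemctl stop kube-apiserver kube-controller-manager kube-scheduler

# If they run as static pods under the kubelet, moving the manifests
# out of the watched directory has the same effect:
mv /etc/kubernetes/manifests/kube-apiserver.yaml /root/manifests-parked/
```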
For etcdctl backup, the restore is seamless as far as Kubernetes components are concerned (they only see an extended apiserver outage and do not know that the etcd cluster was changed underneath). The sequence to grow a single-node etcd cluster to a full cluster again is:
1. start the first etcd from the backup using `--force-new-cluster`
2. restart it using normal configuration (optional) – at this point the new cluster membership is in the data and flags don't change it
3. `etcdctl member update` to set the correct peer URL (`--force-new-cluster` _always_ sets it to localhost-only)
4. `etcdctl member add` the next node
- when going from 1 to 2 members, this breaks quorum: as soon as the second member is added, quorum becomes 2 while only one member is actually running, so the cluster is unusable until the new member joins; this cannot be undone other than starting over at step 1
5. start the next node with the correct `initial-cluster` settings from 4 – the second node will have 2 nodes in the initial peer list, the third node will have 3, and so on. You cannot shortcut this, or the new node will be unable to join.
6. Until enough nodes have been added, go back to step 4. (The whole sequence is sketched in commands below.)
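For concreteness, here is a hedged sketch of the sequence with placeholder names, IPs and paths; the etcdctl syntax shown is the v2-era one (with etcd v3's etcdctl, `member add`/`member update` take `--peer-urls=...` instead):

```
# 1. Start the first member from the restored data with --force-new-cluster.
etcd --name etcd0 --data-dir /var/lib/etcd-restored --force-new-cluster

# 2./3. Fix the peer URL that --force-new-cluster reset to localhost,
#       then restart etcd0 with its normal configuration.
etcdctl member list                              # note etcd0's member ID
etcdctl member update <etcd0-member-id> http://10.0.0.10:2380

# 4. Add the next member; etcdctl prints the ETCD_INITIAL_CLUSTER value
#    (and initial-cluster-state=existing) to use when starting it.
etcdctl member add etcd1 http://10.0.0.11:2380

# 5. Start the new member with exactly that peer list: 2 entries for the
#    second member, 3 for the third, and so on.
etcd --name etcd1 --data-dir /var/lib/etcd \
  --initial-advertise-peer-urls http://10.0.0.11:2380 \
  --initial-cluster "etcd0=http://10.0.0.10:2380,etcd1=http://10.0.0.11:2380" \
  --initial-cluster-state existing

# 6. Repeat steps 4 and 5 for each remaining member.
```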
If you are using TLS certificates, regenerate them with all the new IPs / domain names before starting.
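Purely as an illustration, one way to re-issue a member certificate with the new SANs using openssl; the file names and CA layout are placeholders for whatever PKI you actually use:

```
# Hypothetical names throughout; adapt to your own CA.
printf "subjectAltName=DNS:etcd0.example.com,IP:10.0.0.10,IP:127.0.0.1" > etcd0-san.cnf
openssl req -new -newkey rsa:2048 -nodes -subj "/CN=etcd0" \
  -keyout etcd0.key -out etcd0.csr
openssl x509 -req -in etcd0.csr -CA etcd-ca.crt -CAkey etcd-ca.key \
  -CAcreateserial -days 365 -out etcd0.crt -extfile etcd0-san.cnf
```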
Using the (logical) etcd-backup method, start a completely new replacement cluster (an `initial-cluster` peer list with all nodes and `initial-cluster-state=new`, or use etcd discovery), and replay the backup into it using this tool[0]. Then start the apiservers and restart every API client that uses watches (which is pretty much all of them): this method invalidates all revision numbers, since the new cluster starts counting from scratch, so clients that resumed watches at their old resourceVersion would be waiting for changes a long way in the future.
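A sketch of bringing up the fresh replacement cluster with a static peer list (names and IPs are placeholders); run the equivalent on each of the nodes, replay the backup with the tool above, and only then start the apiservers:

```
etcd --name etcd0 --data-dir /var/lib/etcd \
  --initial-advertise-peer-urls http://10.0.0.10:2380 \
  --listen-peer-urls http://10.0.0.10:2380 \
  --listen-client-urls http://10.0.0.10:2379,http://127.0.0.1:2379 \
  --advertise-client-urls http://10.0.0.10:2379 \
  --initial-cluster "etcd0=http://10.0.0.10:2380,etcd1=http://10.0.0.11:2380,etcd2=http://10.0.0.12:2380" \
  --initial-cluster-state new
```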
So to recap – with the (canonical) binary data backup from etcdctl backup, the restore is cleaner for Kubernetes but harder to pull off correctly on the etcd side. With the logical etcd-backup method, it is easier on the etcd side but requires restarting everything.