Using bootkube recover to restore from etcd snapshot

David Taylor

Jan 16, 2018, 11:06:19 AM
to CoreOS User
Hello,

I am deploying Tectonic on bare-metal, and we are using our second cluster to run through disaster recovery scenarios. I have opened an issue on the bootkube GitHub repo (https://github.com/kubernetes-incubator/bootkube/issues/834), and I am hoping someone here can provide some suggestions. 

Long story short (full details are in the GitHub issue): I periodically take snapshots from a remote machine using etcdctl snapshot save and store them in S3. I simulated a total failure by destroying every master node and rebuilding with terraform. Right now nothing is running on the controllers besides etcd; I have destroyed everything else (pods, containers, etc.). After following the instructions in https://coreos.com/tectonic/docs/latest/troubleshooting/bootkube_recovery_tool.html, I run the bootkube recover command, but it fails with an error about keys that do not exist (the full output is in the GitHub issue linked above).
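For reference, the backup side looks roughly like this (endpoints, certificate paths, and the bucket name are placeholders for our real values):

    # taken from a remote machine that has the etcd client certificates
    ETCDCTL_API=3 etcdctl snapshot save /tmp/etcd-snapshot.db \
      --endpoints=https://<controller-ip>:2379 \
      --cacert=/path/to/etcd-client-ca.crt \
      --cert=/path/to/etcd-client.crt \
      --key=/path/to/etcd-client.key

    # upload the snapshot to S3
    aws s3 cp /tmp/etcd-snapshot.db s3://<backup-bucket>/etcd/etcd-snapshot-$(date +%F).db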


Perhaps I am going about this totally wrong, but I would like to be able to completely blow away the cluster, rebuild with terraform, then restore from an etcd backup.

I find it strange that those keys don't even exist in our operational cluster. Am I taking backups incorrectly, or is there some other command I need to run to get a functional backup? I am at a total loss here. If I can provide any other helpful details, let me know; any help would be greatly appreciated.

David Taylor

Jan 16, 2018, 3:30:01 PM
to CoreOS User
When deploying with the Tectonic installer and terraform, etcd runs via rkt directly on the controller nodes, not inside Kubernetes. In bootkube terminology, "self-hosted" etcd means etcd running inside Kubernetes rather than on the node itself, so this etcd is not self-hosted. If I instead treat the etcd running on the controllers as an external cluster and pass the corresponding flag to bootkube recover, the error goes away.
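For anyone who hits the same thing, the sequence I am working with looks roughly like this. Paths, endpoints, and the bucket name are placeholders, and the exact bootkube recover flags should be double-checked against bootkube recover --help for your version:

    # copy the latest snapshot down from S3 onto a controller
    aws s3 cp s3://<backup-bucket>/etcd/etcd-snapshot.db /tmp/etcd-snapshot.db

    # restore it into a fresh data directory (the directory must not already exist)
    ETCDCTL_API=3 etcdctl snapshot restore /tmp/etcd-snapshot.db \
      --data-dir=/var/lib/etcd

    # start etcd on the controller against the restored data directory (rkt/systemd, as before),
    # then run bootkube recover treating it as an external etcd cluster:
    bootkube recover \
      --recovery-dir=/tmp/recovered \
      --etcd-servers=https://127.0.0.1:2379 \
      --kubeconfig=/etc/kubernetes/kubeconfig

From there the recovered assets get handed back to bootkube start as described in the Tectonic recovery doc linked in my first message.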