Recovering from Kubernetes node failure running Cassandra

ka...@szczygiel.io

Feb 8, 2018, 4:52:24 PM
to Kubernetes user discussion and Q&A
I'm looking for a good solution for replacing a dead Kubernetes worker node that was running Cassandra.

Scenario:

The Cassandra cluster is built from 3 pods (a rough manifest sketch follows this list)
A failure occurs on one of the Kubernetes worker nodes
A replacement worker node joins the Kubernetes cluster
A new pod from the StatefulSet is scheduled on the new node
Because the pod's IP address has changed, the new pod is seen as a new Cassandra node (4 nodes in the cluster in total) and is unable to bootstrap until the dead one is removed
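
For reference, the setup is more or less a stock StatefulSet like the sketch below (the names, image and storage sizes are made up for illustration, not my exact manifest):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cassandra
spec:
  serviceName: cassandra            # headless Service providing stable DNS names
  replicas: 3                       # the 3-pod cluster from the scenario above
  selector:
    matchLabels:
      app: cassandra
  template:
    metadata:
      labels:
        app: cassandra
    spec:
      containers:
      - name: cassandra
        image: cassandra:3.11       # placeholder image
        env:
        - name: CASSANDRA_SEEDS     # seed via the first pod's stable DNS name
          value: cassandra-0.cassandra.default.svc.cluster.local
        volumeMounts:
        - name: data
          mountPath: /var/lib/cassandra
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 10Gi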
It's very difficult to follow the official procedure (https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsReplaceNode.html), as Cassandra is run as a StatefulSet.
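
For context, the heart of that procedure is starting only the replacement node with the dead node's address in a JVM flag. Outside Kubernetes you would add it on the one new machine; with a StatefulSet the equivalent env block sits in the shared pod template, so naively it would apply to every replica. Assuming an image whose cassandra-env.sh appends JVM_EXTRA_OPTS (as stock Cassandra 3.x does), the fragment looks roughly like this, with a placeholder IP:

# fragment of the container spec only; this is exactly the part that is hard
# to apply to a single pod, because the template is shared by all replicas
env:
- name: JVM_EXTRA_OPTS
  value: "-Dcassandra.replace_address=10.244.2.17"   # old IP of the dead Cassandra node (placeholder)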

One completely hacky workaround I've found is to use a ConfigMap to supply JAVA_OPTS. Since changing a ConfigMap doesn't recreate pods (yet), you can adjust the options for just the affected pod and then follow the procedure.
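
Concretely, the idea is something like the following (all names are invented for illustration; I'm using JVM_EXTRA_OPTS here since stock cassandra-env.sh appends it to JVM_OPTS). The StatefulSet template reads the variable from a ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: cassandra-jvm-opts          # invented name
data:
  # Normally empty; while replacing a dead node, temporarily set this to
  # "-Dcassandra.replace_address=<old IP of the dead Cassandra pod>"
  JVM_EXTRA_OPTS: ""

and in the StatefulSet container spec:

env:
- name: JVM_EXTRA_OPTS
  valueFrom:
    configMapKeyRef:
      name: cassandra-jvm-opts
      key: JVM_EXTRA_OPTS

The flow is then: edit the ConfigMap to add the flag, delete only the affected pod (e.g. kubectl delete pod cassandra-1) so the recreated pod picks the flag up at start, wait for it to finish bootstrapping as a replacement, and finally clear the ConfigMap value again.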

However, that's, as I mentioned, super hacky. I'm wondering if anyone is running Cassandra on top of Kubernetes and has a better idea of how to deal with such a failure?

K.

Sébastien Allamand

Feb 9, 2018, 3:32:53 AM
to Kubernetes user discussion and Q&A
We have the same questions and haven't found a solution yet. There is also this GitHub thread on the same subject: https://github.com/kubernetes/kubernetes/issues/28969