How to do a full cluster restart with a stateful set?

Jan Thomä
Feb 12, 2020, 6:04:51 AM
to rabbitmq-users
Hi!

We're running RabbitMQ as a stateful set inside OpenShift, using Kubernetes peer discovery. The setup looks like this:

    ## Timeout used when waiting for Mnesia tables in a cluster to become available.
    mnesia_table_loading_retry_timeout = 18000

    ## Retries when waiting for Mnesia tables in the cluster startup.
    mnesia_table_loading_retry_limit = 10

    ## Clustering
    cluster_formation.peer_discovery_backend = rabbit_peer_discovery_k8s
    cluster_formation.k8s.host = kubernetes.default.svc.cluster.local
    cluster_formation.k8s.address_type = hostname
    cluster_formation.k8s.hostname_suffix = .rabbitmq-internal.testenv.svc.cluster.local
    cluster_formation.k8s.token_path = /var/run/secrets/kubernetes.io/serviceaccount/token
    cluster_formation.k8s.cert_path = /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    cluster_formation.k8s.namespace_path = /var/run/secrets/kubernetes.io/serviceaccount/namespace
    cluster_formation.k8s.service_name = rabbitmq-internal
    cluster_formation.node_cleanup.interval = 500
    cluster_formation.node_cleanup.only_log_warning = true
    cluster_partition_handling = pause_minority

This works when the cluster initially starts up and forms. All peers store their Mnesia DB on a PVC mounted into the container, so it survives restarts. Now we scale the whole set down to 0, for example because we want to change the RabbitMQ configuration. After this we scale back up to, let's say, 3 instances.

From the documentation about cluster formation I read that when a node starts and finds that it is already a member of a cluster, it will try to contact its peers. Since we scaled the set down to 0 beforehand, there are no other nodes active when the first node starts. So the first node starts and logs messages like this:

Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_durable_queue]}

So it seems to wait for one of its peers to come online so it can synchronize its tables (at least that is what I gathered from the documentation). It retries 10 times with a timeout of 18000 ms each (as configured), i.e. about three minutes in total, and eventually gives up and shuts down. The problem is that in a stateful set the second instance is only started once the first instance is ready. So now I am in a state where the first instance cannot start because it waits for a peer, and that peer will only be started once the first instance is ready. How do I resolve this catch-22 situation? So far we have helped ourselves by deleting the Mnesia DB on the PVC and resetting the cluster, but that is not really a great solution as we lose messages this way.
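
For context, the "second instance only after the first is ready" behaviour described above is the stateful set's default pod management policy (OrderedReady). A rough sketch of the relevant Kubernetes field is below; the metadata, labels and image are illustrative, and the policy cannot be changed on an existing StatefulSet without recreating it:

    # Fragment of a StatefulSet manifest; only podManagementPolicy is
    # the point here, all names are illustrative.
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: rabbitmq
    spec:
      serviceName: rabbitmq-internal
      replicas: 3
      # The default, OrderedReady, creates pod N+1 only after pod N
      # reports ready. Parallel creates all pods at once, so the peers
      # can find each other within the Mnesia table loading window.
      podManagementPolicy: Parallel
      selector:
        matchLabels:
          app: rabbitmq
      template:
        metadata:
          labels:
            app: rabbitmq
        spec:
          containers:
            - name: rabbitmq
              image: rabbitmq:3.8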

Any help would be greatly appreciated. Please let me know if you need additional information.

Thank you and kind regards,
Jan

Michael Klishin
Feb 28, 2020, 7:03:33 PM
to rabbitmq-users
There is a dedicated doc section [1] on node restarts.

All nodes must start within a certain time window. How to best achieve that on Kubernetes is your call. Scaling an existing cluster down to zero nodes is not a scenario we consider to be common. In fact, our Kubernetes team has found many interesting edge cases specific to the Kubernetes deployment process, but this is the first time I have seen someone approach the problem this way.
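
One escape hatch discussed in the clustering documentation for a node that cannot reach any peers after a full cluster shutdown is rabbitmqctl force_boot. A minimal sketch, assuming the seed pod is named rabbitmq-0 and the command is issued while that node's RabbitMQ application is not running (pod name and kubectl usage are illustrative):

    # Mark the node so that it boots on its next start even though it
    # was not the last node to shut down and no peers are reachable.
    kubectl exec rabbitmq-0 -- rabbitmqctl force_boot

    # Once rabbitmq-0 reports ready, the stateful set proceeds to start
    # the remaining pods, which rejoin the cluster as usual.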


--
MK

Staff Software Engineer, Pivotal/RabbitMQ