Hi!
We're running RabbitMQ as a StatefulSet inside OpenShift and using the Kubernetes peer discovery backend. The setup looks like this:
## Timeout used when waiting for Mnesia tables in a cluster to become available.
mnesia_table_loading_retry_timeout = 18000
## Retries when waiting for Mnesia tables in the cluster startup.
mnesia_table_loading_retry_limit = 10
## Clustering
cluster_formation.peer_discovery_backend = rabbit_peer_discovery_k8s
cluster_formation.k8s.host = kubernetes.default.svc.cluster.local
cluster_formation.k8s.address_type = hostname
cluster_formation.k8s.hostname_suffix = .rabbitmq-internal.testenv.svc.cluster.local
cluster_formation.k8s.token_path = /var/run/secrets/kubernetes.io/serviceaccount/token
cluster_formation.k8s.cert_path = /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
cluster_formation.k8s.namespace_path = /var/run/secrets/kubernetes.io/serviceaccount/namespace
cluster_formation.k8s.service_name = rabbitmq-internal
cluster_formation.node_cleanup.interval = 500
cluster_formation.node_cleanup.only_log_warning = true
cluster_partition_handling = pause_minority
This works fine when the cluster initially starts up and forms. Every peer stores its Mnesia database on a PVC mounted into the container, so the data survives restarts. Now we scale the whole set down to 0, for example because we want to change the RabbitMQ configuration, and afterwards scale back up to, let's say, 3 instances, as sketched below.
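Concretely, the scaling is just the usual StatefulSet scaling; a rough sketch (the StatefulSet name "rabbitmq" is only illustrative):

# Take the whole cluster down; the PVCs and the Mnesia data on them are kept
oc scale statefulset rabbitmq --replicas=0

# ...apply the RabbitMQ configuration change...

# Bring the cluster back up
oc scale statefulset rabbitmq --replicas=3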
From the cluster formation documentation I understand that a node which starts and detects that it already belongs to a cluster will try to contact its peers. Since we scaled the set down to 0 beforehand, no other node is running when the first one starts, so it logs messages like this:
Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_durable_queue]}
So the node apparently waits for one of its peers to come online so it can synchronize its tables (at least that is what I gathered from the documentation). It retries 10 times with a timeout of 18000 ms each (as configured), then eventually gives up and shuts down. The problem is that with a StatefulSet the second instance is only started once the first one is ready. So I end up in a state where the first instance cannot start because it waits for a peer, and that peer will only be started once the first instance is ready, which in turn waits for a peer. How do I resolve this catch-22 situation? So far we have helped ourselves by deleting the Mnesia database on the PVCs and resetting the cluster, roughly as sketched below, but that is not really a great solution as we lose messages this way.
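For completeness, our current workaround looks roughly like this (the StatefulSet name and the Mnesia path are illustrative, and losing the queue contents is exactly the problem):

# Scale everything down, then wipe each node's Mnesia data on its PVC,
# e.g. from a temporary debug pod that mounts the volume
oc scale statefulset rabbitmq --replicas=0
rm -rf /var/lib/rabbitmq/mnesia/*     # repeated for every PVC

# Scale back up so a completely fresh cluster forms (all messages are gone)
oc scale statefulset rabbitmq --replicas=3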
Any help would be greatly appreciated. Please let me know if you need additional information.
Thank you and kind regards,
Jan