Correct way to restart Kubernetes RabbitMQ cluster?


Arun Nair

Jul 4, 2022, 5:42:32 AM
to rabbitmq-users
Recently one of the metrics endpoints of my RabbitMQ cluster went down (https://github.com/rabbitmq/cluster-operator/issues/1082). To try to fix it, I deleted and re-created the cluster (deleting it with `kubectl delete -f filename.yaml`). The error was resolved, but I lost all the existing messages.

Is there another way to restart the cluster that won't lose my existing messages (for example, something like `rollout restart` for the 'Deployment' kind)?

My cluster yaml file:

```
apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: rabbitmq-cluster
spec:
  replicas: 1
  rabbitmq:
    additionalConfig: |
      log.console.level = info
  service:
    type: LoadBalancer

Michal Kuratczyk

Jul 4, 2022, 5:49:37 AM
to rabbitm...@googlegroups.com
Hi,

RabbitMQ Cluster is a StatefulSet deployment and `kubectl rollout restart statefulset` is the correct way to perform a rolling restart.

Best,

--
Michał
RabbitMQ team

Arun Nair

Jul 4, 2022, 7:59:07 AM
to rabbitmq-users
I tried this but lost all the messages again :(
I ran `kubectl rollout restart statefulset name-of-cluster`.
Here are some screenshots of the RabbitMQ-Overview Grafana dashboard:

[Attachment: Screenshot 2022-07-04 at 5.19.42 PM.png]
[Attachment: Screenshot 2022-07-04 at 5.24.40 PM.png]

Michal Kuratczyk

Jul 4, 2022, 8:27:56 AM
to rabbitm...@googlegroups.com
Hi,

I guess these were messages published as transient. If that's the problem, publish messages as persistent and/or use quorum queues.
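
A minimal sketch of what "publish as persistent" to a durable queue could look like, for readers who want something concrete. The thread does not say which client library is in use, so this assumes a channel shaped like the Python `pika` client's (`queue_declare`, `basic_publish`); `publish_persistent` and `make_properties` are hypothetical names, and `make_properties` would be `pika.BasicProperties` in real use.

```python
PERSISTENT = 2  # AMQP 0-9-1 delivery_mode: 1 = transient, 2 = persistent


def publish_persistent(channel, queue_name, body, make_properties):
    """Declare a durable queue and publish one persistent message to it.

    Assumption: `channel` behaves like a pika channel and `make_properties`
    behaves like `pika.BasicProperties`. Neither is confirmed by the thread.
    """
    # durable=True: the queue definition itself survives a broker restart.
    channel.queue_declare(queue=queue_name, durable=True)
    # delivery_mode=2: asks the broker to store the message on disk.
    channel.basic_publish(
        exchange="",  # default exchange routes by queue name
        routing_key=queue_name,
        body=body,
        properties=make_properties(delivery_mode=PERSISTENT),
    )
```

Both halves matter: a persistent message in a non-durable queue is still lost when the queue disappears on restart, and a durable queue comes back empty if its messages were transient.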

Best,



--
Michał
RabbitMQ team

Arun Nair

Jul 5, 2022, 4:25:22 AM
to rabbitmq-users
Thanks! Making queues durable and messages persistent has solved the issue.

Just one more question: even though I retained all my 'Ready' messages, it seems the 'Unacknowledged' ones were still lost during the restart. Is that intended behavior, or is there some way to persist those messages too?
If it is normal behavior, should I set my prefetch limit to 1? It is currently set to 50, but our consumers can take half a minute or more to process a message (the ACK is sent as soon as the message is received).

Michal Kuratczyk

Jul 5, 2022, 7:29:20 AM
to rabbitm...@googlegroups.com
Hi,

When you ACK a message, you basically tell RabbitMQ that it can delete it: you are done with it and you don't need it anymore. If you decide to ACK immediately after consuming a message, despite the processing taking half a minute, it's on you to provide some failure handling, because you have already told RabbitMQ that it can forget this message. So either you need to ACK after you have completed processing, or you need to have some other recovery mechanism for tasks that were ACKed in RabbitMQ but not completed.
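
The "ACK after you completed processing" option could look roughly like the sketch below. It assumes a consumer callback with the same shape as the Python `pika` client's `on_message_callback` (`channel, method, properties, body`); the client actually used in this thread is not stated. `make_on_message` and `process` are hypothetical names.

```python
def make_on_message(process):
    """Build a consumer callback that acks only after `process` succeeds."""

    def on_message(channel, method, properties, body):
        try:
            process(body)  # the slow part (half a minute or more in this thread)
        except Exception:
            # Processing failed: hand the message back to the broker
            # so it returns to the queue instead of being forgotten.
            channel.basic_nack(delivery_tag=method.delivery_tag, requeue=True)
            return
        # Only now is it safe to let RabbitMQ delete the message.
        channel.basic_ack(delivery_tag=method.delivery_tag)

    return on_message
```

With `pika`, a callback like this would be registered with manual acknowledgements, for example `channel.basic_consume(queue="tasks", on_message_callback=make_on_message(handle), auto_ack=False)`, where `"tasks"` and `handle` are placeholders. Requeueing on every failure is a simplification; a message that always fails would be redelivered indefinitely.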

Best,



--
Michał
RabbitMQ team

Arun Nair

Jul 5, 2022, 8:42:46 AM
to rabbitmq-users
I meant the unacknowledged ones. I understand that acknowledged ones can be lost on any interruption.
> it seems 'Unacknowledged' ones were still lost during the restart

Michal Kuratczyk

Jul 5, 2022, 8:49:52 AM
to rabbitm...@googlegroups.com
Hi,

The metrics are updated periodically: you might have seen N messages unacked when you stopped RabbitMQ, while some of these messages might have already been acked, just not accounted for in the stats yet.
Remaining messages (those actually unacked) are no longer unacked after the restart: they go back to Ready until delivered to a "new" consumer (after the restart). So you will see 0 unacked, but it doesn't mean you lost messages.

Losing unacked messages would be a major bug and is unlikely to happen, but if you can reproduce that - let us know.

Best,



--
Michał
RabbitMQ team

Arun Nair

Jul 5, 2022, 9:58:03 AM
to rabbitmq-users
Okay. I tested this using RabbitMQ Management UI and everything seems to be in order!

The unacknowledged count was going to 0 because, during the cluster restart, my consumers were restarting too (due to an error caused by the lost connection). The unacknowledged messages were sent back into the queue, as mentioned here. I might have missed that on Grafana, since it shows values in 'K's when they are over 1000. When I checked the Management UI, there were 15,144 ready messages and 150 unacknowledged. When the consumers stopped, there were 15,294 ready messages and 0 unacknowledged.
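
For anyone checking the same thing, the numbers above can be reproduced with a toy model of the two counters. This is only bookkeeping to illustrate the requeue behavior described in this thread, not RabbitMQ code; `QueueCounts` and `consumers_disconnected` are made-up names.

```python
from dataclasses import dataclass


@dataclass
class QueueCounts:
    ready: int
    unacked: int

    def total(self) -> int:
        return self.ready + self.unacked

    def consumers_disconnected(self) -> "QueueCounts":
        # Deliveries that were never acked go back to Ready
        # when the consumer's connection is lost.
        return QueueCounts(ready=self.ready + self.unacked, unacked=0)


before = QueueCounts(ready=15_144, unacked=150)
after = before.consumers_disconnected()

print(after)                            # QueueCounts(ready=15294, unacked=0)
print(before.total() == after.total())  # True: nothing was lost
```

The total stays at 15,294 before and after, so an unacked count dropping to 0 is consistent with no message loss.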

Thanks for the help