Greetings,
We have a 3-node RabbitMQ cluster running in our self-hosted Kubernetes cluster. It is deployed using the bitnami/rabbitmq Helm chart (v8.24.2) and hosts about six moderately busy queues, all of them quorum queues with similar settings and persistent storage.
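For reference, the queues are declared more or less like this (a minimal pika sketch from memory; the host is a placeholder, and face_attributes_v2 is one of the six queues):
----
import pika

# Host is a placeholder for our in-cluster service name.
conn = pika.BlockingConnection(
    pika.ConnectionParameters(host="rabbitmq.ml.svc.cluster.local")
)
channel = conn.channel()

# Durable quorum queue with a 10-second per-queue message TTL
# (the same x-message-ttl value that shows up in the logs below).
channel.queue_declare(
    queue="face_attributes_v2",
    durable=True,
    arguments={
        "x-queue-type": "quorum",
        "x-message-ttl": 10000,  # milliseconds
    },
)
conn.close()
----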
One of the worker nodes hosting RabbitMQ was restarted a couple of days ago, which made one of the cluster member pods unavailable for about 10 minutes. This caused the weird behavior explained below.
1. One of the queues stopped working while the RabbitMQ pod was unavailable, although the others kept working fine. For the failed queue I see a plethora of [erro] log entries containing what looks like a truncated term with the queue name, as seen below.
----
Jun 28, 2022 @ 11:33:46.739 [erro] <0.11448.575> <<"ace_attributes_v2">>}
Jun 28, 2022 @ 11:33:46.741 [erro] <0.11448.575> <<"face_attributes_v2">>},
Jun 28, 2022 @ 11:33:46.742 [erro] <0.11448.575> <<"ace_attributes_v2">>},
Jun 28, 2022 @ 11:33:46.766 [erro] <0.11448.575> <<"ace_attributes_v2">>},
Jun 28, 2022 @ 11:33:46.768 [erro] <0.11448.575> <<"face_attributes_v2">>},
Jun 28, 2022 @ 11:33:46.789 [erro] <0.11448.575> <<"face_attributes_v2">>}},
Jun 28, 2022 @ 11:34:19.301 [info] <0.684.0> queue 'face_attributes_v2' in vhost '/': detected a new leader {'%2Fface_attributes_v2','rab...@rabbitmq-1.rabbitmq-headless.ml.svc.cluster.local'} in term 10
----
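In case it helps with diagnosing this, the Raft state of the affected queue can be read from the management API; a sketch (host, port, and credentials are placeholders, and I am going from the API docs for the field names):
----
import requests

# Queue details endpoint; the default vhost "/" is URL-encoded as %2F.
# Host and credentials are placeholders for our cluster.
url = ("http://rabbitmq.ml.svc.cluster.local:15672"
       "/api/queues/%2F/face_attributes_v2")
resp = requests.get(url, auth=("user", "password"))
resp.raise_for_status()
queue = resp.json()

# For quorum queues the payload includes the current Raft leader
# and the member nodes, which is what the "detected a new leader"
# log line above refers to.
print("leader:", queue.get("leader"))
print("members:", queue.get("members"))
----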
2. Old messages (TTL: 10 s) were not discarded once the consumer was working again. I also see many TTL-related errors from the incident window (while the cluster member was unavailable):
----
Jun 28, 2022 @ 11:33:46.632 [erro] <0.3303.575> {<<"x-message-ttl">>,signedint,10000}],
Jun 28, 2022 @ 11:33:46.533 [erro] <0.10979.574> {<<"x-message-ttl">>,signedint,10000}],
----
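As an aside, the 10 s TTL in our case comes from the queue-level x-message-ttl argument shown above; TTL can also be set per message at publish time via the AMQP expiration property, e.g. (same placeholder names as in the sketch above):
----
import pika

conn = pika.BlockingConnection(
    pika.ConnectionParameters(host="rabbitmq.ml.svc.cluster.local")
)
channel = conn.channel()

# Per-message TTL: the AMQP expiration property takes a string
# value in milliseconds, here matching the queue-level 10 s TTL.
channel.basic_publish(
    exchange="",
    routing_key="face_attributes_v2",
    body=b"payload",
    properties=pika.BasicProperties(expiration="10000"),
)
conn.close()
----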
Everything went back to normal once the failed pod rejoined the cluster, but this caused a brief outage :-(
Has anyone here run into similar problems before? I would appreciate it if someone could share their thoughts on this. Thank you!