Greetings,
We have a 3-node RabbitMQ cluster running in our self-hosted Kubernetes cluster. It is deployed using the bitnami/rabbitmq Helm chart (v8.24.2) and hosts about six moderately busy queues, all of them quorum queues with similar settings and persistent storage.
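For reference, the queues are declared more or less like this (a minimal pika sketch from memory; the host is a placeholder, and face_attributes_v2 is one of the six queues):
----
import pika

# Host is a placeholder for our in-cluster service name.
conn = pika.BlockingConnection(
    pika.ConnectionParameters(host="rabbitmq.ml.svc.cluster.local")
)
channel = conn.channel()

# Durable quorum queue with a 10-second per-queue message TTL
# (the same x-message-ttl value that shows up in the logs below).
channel.queue_declare(
    queue="face_attributes_v2",
    durable=True,
    arguments={
        "x-queue-type": "quorum",
        "x-message-ttl": 10000,  # milliseconds
    },
)
conn.close()
----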
One of the worker nodes hosting RabbitMQ was restarted a couple of days ago, which made one of the cluster member pods unavailable for about 10 minutes. This caused the weird behavior explained below.
1. One of the queues stopped working while the RabbitMQ pod was unavailable, although the others kept working fine. For the failed queue I see a plethora of [erro] log entries containing what looks like a truncated term with the queue name, as seen below.
----
Jun 28, 2022 @ 11:33:46.739 [erro] <0.11448.575> <<"ace_attributes_v2">>}
Jun 28, 2022 @ 11:33:46.741 [erro] <0.11448.575> <<"face_attributes_v2">>},
Jun 28, 2022 @ 11:33:46.742 [erro] <0.11448.575> <<"ace_attributes_v2">>},
Jun 28, 2022 @ 11:33:46.766 [erro] <0.11448.575> <<"ace_attributes_v2">>},
Jun 28, 2022 @ 11:33:46.768 [erro] <0.11448.575> <<"face_attributes_v2">>},
Jun 28, 2022 @ 11:33:46.789 [erro] <0.11448.575> <<"face_attributes_v2">>}},
Jun 28, 2022 @ 11:34:19.301 [info] <0.684.0> queue 'face_attributes_v2' in vhost '/': detected a new leader {'%2Fface_attributes_v2','rab...@rabbitmq-1.rabbitmq-headless.ml.svc.cluster.local'} in term 10
----
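In case it helps with diagnosing this, the Raft state of the affected queue can be read from the management API; a sketch (host, port, and credentials are placeholders, and I am going from the API docs for the field names):
----
import requests

# Queue details endpoint; the default vhost "/" is URL-encoded as %2F.
# Host and credentials are placeholders for our cluster.
url = ("http://rabbitmq.ml.svc.cluster.local:15672"
       "/api/queues/%2F/face_attributes_v2")
resp = requests.get(url, auth=("user", "password"))
resp.raise_for_status()
queue = resp.json()

# For quorum queues the payload includes the current Raft leader
# and the member nodes, which is what the "detected a new leader"
# log line above refers to.
print("leader:", queue.get("leader"))
print("members:", queue.get("members"))
----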
2. Old messages (TTL: 10 s) were not discarded once the consumer was working again. I also see many TTL-related errors from the incident window (while the cluster member was unavailable):
----
Jun 28, 2022 @ 11:33:46.632 [erro] <0.3303.575> {<<"x-message-ttl">>,signedint,10000}],
Jun 28, 2022 @ 11:33:46.533 [erro] <0.10979.574> {<<"x-message-ttl">>,signedint,10000}],
----
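As an aside, the 10 s TTL in our case comes from the queue-level x-message-ttl argument shown above; TTL can also be set per message at publish time via the AMQP expiration property, e.g. (same placeholder names as in the sketch above):
----
import pika

conn = pika.BlockingConnection(
    pika.ConnectionParameters(host="rabbitmq.ml.svc.cluster.local")
)
channel = conn.channel()

# Per-message TTL: the AMQP expiration property takes a string
# value in milliseconds, here matching the queue-level 10 s TTL.
channel.basic_publish(
    exchange="",
    routing_key="face_attributes_v2",
    body=b"payload",
    properties=pika.BasicProperties(expiration="10000"),
)
conn.close()
----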
Everything went back to normal once the failed pod rejoined the cluster, but this caused a brief outage :-(
Has anyone here run into similar problems before? I would appreciate it if someone could share their thoughts on this. Thank you!