One of the quorum queues did not work when a RabbitMQ cluster pod was unavailable


Prasad Kris

Jul 1, 2022, 5:55:29 AM
to rabbitmq-users
Greetings,

We have a 3-node RabbitMQ cluster running in our self-hosted Kubernetes cluster. The cluster is deployed using the bitnami/rabbitmq Helm chart (v8.24.2), and it hosts about 6 moderately busy queues. They are all quorum queues with similar settings and data persistence.

One of the minions on which we host RabbitMQ was restarted a couple of days ago, which left one of the cluster member pods unavailable for about 10 minutes. This caused some odd behavior, as explained below.

1. One of the queues stopped working while one RabbitMQ pod was unavailable, although the others worked fine. The cluster logs for the failed queue contain a plethora of error entries of the form [erro] <<"queue_name">>}, as seen below.

----
Jun 28, 2022 @ 11:33:46.739 [erro] <0.11448.575> <<"ace_attributes_v2">>}
Jun 28, 2022 @ 11:33:46.741  [erro] <0.11448.575> <<"face_attributes_v2">>},
Jun 28, 2022 @ 11:33:46.742 [erro] <0.11448.575> <<"ace_attributes_v2">>},
Jun 28, 2022 @ 11:33:46.766 [erro] <0.11448.575> <<"ace_attributes_v2">>},
Jun 28, 2022 @ 11:33:46.768 [erro] <0.11448.575> <<"face_attributes_v2">>},
Jun 28, 2022 @ 11:33:46.789 [erro] <0.11448.575> <<"face_attributes_v2">>}},
Jun 28, 2022 @ 11:34:19.301 [info] <0.684.0> queue 'face_attributes_v2' in vhost '/': detected a new leader {'%2Fface_attributes_v2','rab...@rabbitmq-1.rabbitmq-headless.ml.svc.cluster.local'} in term 10
----

2. The old messages (TTL: 10s) were not discarded when the consumer started working again. There are also many TTL-related errors in the logs from the time of the incident (while one of the cluster members was unavailable).

----
Jun 28, 2022 @ 11:33:46.632 [erro] <0.3303.575> {<<"x-message-ttl">>,signedint,10000}],
Jun 28, 2022 @ 11:33:46.533 [erro] <0.10979.574> {<<"x-message-ttl">>,signedint,10000}],
----

Everything went back to normal when the failed cluster pod rejoined, but this caused a brief outage :-(

I am wondering if anyone here has run into similar problems before, and I would appreciate it if someone could share their thoughts on this. Thank you!
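For context on why this behavior is surprising: quorum queues are Raft-based and stay available as long as a majority of their members are online. The sketch below only illustrates that arithmetic. The function names are hypothetical, and it assumes the affected queue had a member on each of the 3 nodes, which the post does not confirm.

```python
def majority(members: int) -> int:
    """Smallest number of members that forms a Raft majority."""
    return members // 2 + 1


def has_quorum(members: int, online: int) -> bool:
    """True if enough members are online to elect a leader and accept writes."""
    return online >= majority(members)


if __name__ == "__main__":
    # Assumed setup from the post: 3 members, 1 pod down for about 10 minutes.
    print(has_quorum(members=3, online=2))
    # If a second member had also been unavailable, quorum would be lost.
    print(has_quorum(members=3, online=1))
```

Under that assumption, losing a single pod should not by itself take a queue offline, so it may be worth checking how many members the failed queue actually had online at the time.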

kjnilsson

Jul 12, 2022, 5:05:07 AM
to rabbitmq-users
It would be helpful if you could share more of the log. Unfortunately, the default log settings emit multi-line logs as separate log entries, and we really need all of them to know what the actual error is.

The exact RabbitMQ version would also be helpful to know.
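One way to hand over the full multi-line errors is to stitch the split entries back together before sharing them. The following is a minimal sketch, not a RabbitMQ tool: it assumes the exported lines look like the excerpts in this thread (timestamp, level, Erlang process id, text) and that they are already in the original order. The `regroup` helper and the `SAMPLE` data are hypothetical.

```python
import re
from itertools import groupby

# Matches lines such as:
#   Jun 28, 2022 @ 11:33:46.739 [erro] <0.11448.575> some text
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} \d{1,2}, \d{4} @ \d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"\[(?P<level>\w+)\]\s+(?P<pid><[\d.]+>)\s?(?P<text>.*)$"
)


def parse(line):
    """Return the fields of one log line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def regroup(lines):
    """Merge consecutive entries that share a process id and level."""
    parsed = [p for p in map(parse, lines) if p]
    records = []
    for (pid, level), group in groupby(parsed, key=lambda p: (p["pid"], p["level"])):
        group = list(group)
        records.append({
            "pid": pid,
            "level": level,
            "first_ts": group[0]["ts"],
            "text": "\n".join(entry["text"] for entry in group),
        })
    return records


# Sample lines copied from the excerpts in this thread (last one shortened).
SAMPLE = [
    'Jun 28, 2022 @ 11:33:46.739 [erro] <0.11448.575> <<"ace_attributes_v2">>}',
    'Jun 28, 2022 @ 11:33:46.741  [erro] <0.11448.575> <<"face_attributes_v2">>},',
    "Jun 28, 2022 @ 11:34:19.301 [info] <0.684.0> queue 'face_attributes_v2' in vhost '/'",
]

if __name__ == "__main__":
    for record in regroup(SAMPLE):
        print(record["first_ts"], record["level"], record["pid"])
        print(record["text"])
        print("-" * 4)
```

Grouping by process id is only a heuristic: entries from different processes can interleave, and a log viewer that sorts by timestamp may reorder lines that share a millisecond, so the raw pod logs remain the more reliable source.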
