Unsynchronized mirror queue - Kubernetes

paroczizs

May 4, 2021, 4:06:45 AM
to rabbitm...@googlegroups.com
Dear Community!

We are running a RabbitMQ cluster on Kubernetes in HA mode, with mirrored queues on 3 nodes.
Most of the time it works properly without any issues; there is no significant load on the cluster at the moment.
However, from time to time some of the queues become unsynchronized and never recover. In this state, clients cannot connect to the affected queues, and the only way we have found to resolve it is to restart the whole cluster. We have not been able to reproduce it with load tests or forced shutdowns.
We use RabbitMQ for asynchronous integration, and we declare the exchanges and queues at startup from the definitions.json file.
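To pin down which queues are affected when this happens, one option is to compare, per queue, the mirrors that exist with the mirrors that are in sync. This is a minimal sketch assuming RabbitMQ 3.8, where `rabbitmqctl list_queues -p hapa name slave_pids synchronised_slave_pids` prints tab-separated rows with PID lists like `[<rabbit@node.1.2.3>, ...]`; the exact output format can differ between versions, so treat the parser as illustrative. It only flags mirrors that exist but are not synchronised; a queue that has lost mirrors entirely would need a separate check against the expected mirror count.

```python
from __future__ import annotations

import re

# Matches Erlang PIDs as printed by rabbitmqctl, e.g. <rabbit@node-1.3.254.0>
# (assumed format; adjust if your version prints PIDs differently).
PID_RE = re.compile(r"<[^<>]+>")


def unsynchronized_queues(list_queues_output: str) -> dict[str, set[str]]:
    """Return {queue_name: mirror PIDs that are not in sync} for lagging queues.

    Expects tab-separated rows: name, slave_pids, synchronised_slave_pids.
    Informational lines (no tabs) and the column header row are skipped.
    """
    result: dict[str, set[str]] = {}
    for line in list_queues_output.splitlines():
        fields = line.split("\t")
        if len(fields) != 3 or fields[0] == "name":
            continue
        name, mirrors, synced = fields
        # synchronised_slave_pids should be a subset of slave_pids;
        # whatever is left over is a mirror that has not caught up.
        lagging = set(PID_RE.findall(mirrors)) - set(PID_RE.findall(synced))
        if lagging:
            result[name] = lagging
    return result
```

Capturing this output from any node while the problem is happening, and again after a restart, would show whether it is always the same queues or mirrors on the same node that fall behind.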

Any idea where we should look for the root cause so we can eliminate this issue?

Thank you in advance, Zsolt

Michal Kuratczyk

May 4, 2021, 4:15:22 AM
to rabbitm...@googlegroups.com
Hi,

We can't suggest anything without the logs. Please share the logs from all nodes and the output of any diagnostic commands you have run. Also, what error do the clients get in this situation?
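To collect this from every node in one go, a small script can loop over the pods and run the standard CLI tools through `kubectl exec`. This is a sketch under assumptions: the pod names (`rabbitmq-0` to `rabbitmq-2`) and the namespace are hypothetical placeholders for a 3-replica StatefulSet, and you may need `-c <container>` or `--previous` on `kubectl logs` depending on your setup. The commands themselves (`rabbitmq-diagnostics status`, `rabbitmq-diagnostics cluster_status`, `rabbitmqctl list_queues`) are standard, but check the flags against your versions. By default it only prints the commands (dry run).

```python
from __future__ import annotations

import shlex
import subprocess
import sys

# Hypothetical names: adjust to your StatefulSet and namespace.
NAMESPACE = "default"
PODS = [f"rabbitmq-{i}" for i in range(3)]
VHOST = "hapa"

# Diagnostic commands to run inside each RabbitMQ pod.
NODE_COMMANDS = [
    ["rabbitmq-diagnostics", "status"],
    ["rabbitmq-diagnostics", "cluster_status"],
    ["rabbitmqctl", "list_queues", "-p", VHOST,
     "name", "policy", "slave_pids", "synchronised_slave_pids"],
]


def build_commands(namespace: str, pods: list[str]) -> list[list[str]]:
    """Return the kubectl invocations that collect logs and diagnostics per pod."""
    commands = []
    for pod in pods:
        commands.append(["kubectl", "logs", "-n", namespace, pod])
        for node_cmd in NODE_COMMANDS:
            commands.append(["kubectl", "exec", "-n", namespace, pod, "--", *node_cmd])
    return commands


def main(dry_run: bool = True) -> None:
    for cmd in build_commands(NAMESPACE, PODS):
        print("$", shlex.join(cmd))
        if not dry_run:
            # check=False: keep going if one pod fails, since an
            # unreachable pod is itself useful evidence.
            result = subprocess.run(cmd, capture_output=True, text=True, check=False)
            print(result.stdout or result.stderr)


if __name__ == "__main__":
    # Pass --run to actually execute the commands.
    main(dry_run="--run" not in sys.argv)
```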

Also, as we say almost every day, mirrored classic queues should be considered deprecated: please start planning your move to quorum queues.
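Since the queues in this setup are declared from definitions.json, one way to start that move is to rewrite the file: declare each queue as a quorum queue via the `x-queue-type` argument and strip the classic-mirroring `ha-*` keys from policies, since those keys have no effect on quorum queues. This is a minimal sketch assuming the standard definitions export layout (top-level `queues` and `policies` lists). It does not remove queue arguments that quorum queues may not support in your version, and an existing classic queue cannot be converted in place, so the result only takes effect for queues that are deleted and re-declared (or given new names).

```python
import copy
import json
import sys

# Policy keys starting with this prefix configure classic queue mirroring
# (ha-mode, ha-params, ha-sync-mode, ...); they have no effect on quorum queues.
HA_PREFIX = "ha-"


def to_quorum(definitions: dict) -> dict:
    """Return a copy of a definitions.json dict with every queue declared as quorum."""
    defs = copy.deepcopy(definitions)  # never mutate the caller's data
    for queue in defs.get("queues", []):
        # Quorum queues must be durable and cannot be auto-delete.
        queue["durable"] = True
        queue["auto_delete"] = False
        queue.setdefault("arguments", {})["x-queue-type"] = "quorum"
    if "policies" in defs:
        kept = []
        for policy in defs["policies"]:
            definition = {
                key: value
                for key, value in policy.get("definition", {}).items()
                if not key.startswith(HA_PREFIX)
            }
            if definition:  # drop policies that only configured mirroring
                policy["definition"] = definition
                kept.append(policy)
        defs["policies"] = kept
    return defs


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python to_quorum.py definitions.json > definitions.quorum.json
    with open(sys.argv[1]) as f:
        print(json.dumps(to_quorum(json.load(f)), indent=2))
```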

Best,

--
Michał
RabbitMQ team

paroczizs

May 4, 2021, 4:30:36 AM
to rabbitmq-users
The logs contain errors like these:

2021-02-15 10:17:47.479 [error] <0.23146.5> Channel error on connection <0.23138.5> (10.233.115.64:27912 -> 10.233.84.115:5671, vhost: 'hapa', user: 'webmethods'), channel 1:
operation basic.get caused a channel exception not_found: failed to perform operation on queue 'webmethods.testMessage.webmethods.queue' in vhost 'hapa' due to timeout

2021-02-15 10:18:00.801 [error] <0.23318.5> closing AMQP connection <0.23318.5> (10.233.115.80:37390 -> 10.233.84.115:5672):
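When going through the logs from all nodes, it can help to count the `due to timeout` channel errors per queue and per minute, so they can be lined up against pod restarts or network events. This sketch assumes the log format shown above (a timestamp at the start of each record, with the message possibly continuing on the next line); the regular expressions are illustrative, not an official log schema.

```python
import re
from collections import Counter

# A log record starts with a timestamp such as "2021-02-15 10:17:47.479".
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}\.\d+ ")
# The message reported when an operation on a queue timed out.
TIMEOUT_RE = re.compile(
    r"failed to perform operation on queue '([^']+)' in vhost '([^']+)' due to timeout"
)


def count_queue_timeouts(log_text: str) -> Counter:
    """Count timeout errors keyed by (minute, vhost, queue)."""
    counts: Counter = Counter()
    minute = None  # minute of the most recent record start seen so far
    for line in log_text.splitlines():
        ts = TS_RE.match(line)
        if ts:
            minute = ts.group(1)
        hit = TIMEOUT_RE.search(line)
        if hit:
            queue, vhost = hit.groups()
            counts[(minute, vhost, queue)] += 1
    return counts
```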

Michal Kuratczyk

May 4, 2021, 4:58:15 AM
to rabbitm...@googlegroups.com
The first error suggests to me that there is a problem with communication between the nodes (which would also explain the synchronization issues). Check your networking and whether the pods are available at all times.
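A quick way to test the inter-node path from inside one pod is to try TCP connections to the ports clustering depends on: 4369 (epmd) and 25672 (Erlang distribution, by default the AMQP port + 20000), plus the client ports. This sketch only checks that a TCP handshake succeeds, which is necessary but not sufficient; intermittent partitions may only show up if the check is repeated over time, alongside the RESTARTS column of `kubectl get pods`.

```python
import socket
import sys

# 4369: epmd, 25672: Erlang distribution (inter-node traffic),
# 5672/5671: AMQP and AMQP over TLS (both appear in the logs above).
PORTS = [4369, 25672, 5672, 5671]


def check(host: str, port: int, timeout: float = 2.0) -> str:
    """Return 'ok' if a TCP connection succeeds, otherwise a short error text."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except OSError as exc:  # DNS failures, refused connections and timeouts
        return f"failed: {exc}"


if __name__ == "__main__":
    # Pass the peer host names on the command line.
    for host in sys.argv[1:]:
        for port in PORTS:
            print(f"{host}:{port} -> {check(host, port)}")
```

For example, with hypothetical headless-service names: `python check_ports.py rabbitmq-1.rabbitmq-headless.default.svc.cluster.local rabbitmq-2.rabbitmq-headless.default.svc.cluster.local`.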

The second error is probably unrelated: someone connected an HTTP client to an AMQP port (you can see "GET /met", which is the start of an HTTP request).
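This follows from how AMQP 0-9-1 connections begin: the client's first 8 bytes must be the protocol header `AMQP\x00\x00\x09\x01`, and the broker rejects anything else, which would explain why exactly 8 bytes of the HTTP request ("GET /met") show up. The classifier below is an illustrative sketch of that first-bytes check; its TLS test only looks at the start of a TLS handshake record (type 0x16 followed by major version 0x03).

```python
# The 8-byte protocol header an AMQP 0-9-1 client must send first.
AMQP_091_HEADER = b"AMQP\x00\x00\x09\x01"
# First 4 bytes of common HTTP request lines.
HTTP_METHODS = (b"GET ", b"POST", b"PUT ", b"HEAD", b"DELE", b"OPTI", b"PATC")


def classify_preamble(first_bytes: bytes) -> str:
    """Guess what kind of client opened a connection from its first 8 bytes."""
    head = first_bytes[:8]  # the size of the AMQP protocol header
    if head == AMQP_091_HEADER:
        return "amqp-0-9-1"
    if head.startswith(b"AMQP"):
        return "amqp-other-version"
    if len(head) >= 2 and head[0] == 0x16 and head[1] == 0x03:
        return "tls-handshake"
    if head.startswith(HTTP_METHODS):
        return "http"
    return "unknown"
```

The same reasoning covers the opposite mistake seen in mixed setups like this one: a TLS client pointed at the plain port 5672 would be classified as "tls-handshake" and rejected in the same way.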

Best,



--
Michał
RabbitMQ team