Unsynchronized mirror queue

paroczizs

May 4, 2021, 4:06:45 AM

to rabbitm...@googlegroups.com

Dear Community!

We are running a three-node RabbitMQ cluster on Kubernetes in HA mode with mirrored queues.

Most of the time it works properly without any issues, and there is no significant load on the cluster at the moment.

However, sometimes some of the queues become unsynchronized and do not recover. While the queues are in this state, clients are unable to connect to them, and the only way we have found to resolve it is to restart the whole cluster. We have not been able to reproduce the problem with load tests or forced shutdowns.

RabbitMQ is used for asynchronous integration, and we declare the exchanges and queues at startup from the definitions.json file.

Any idea where we should look for the root cause so we can eliminate this issue?

Thank you in advance, Zsolt

Michal Kuratczyk

May 4, 2021, 4:15:22 AM

to rabbitm...@googlegroups.com

Hi,

We can't suggest anything without the logs. Please share logs from all nodes and any additional diagnostic commands you use. Also, what is the error returned by the clients in this situation?
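As an example of such a diagnostic, the mirror state can be listed per queue with `rabbitmqctl list_queues -p hapa --quiet --formatter json name slave_pids synchronised_slave_pids`. The sketch below is only an illustration under an assumption: it expects that JSON output to contain one object per queue with each pid list rendered as a list of strings, which is worth checking against the real output of your version first. It reports, per queue, the mirrors that are not in the synchronised list.

```python
import json
from typing import Dict, List


def unsynchronised_mirrors(list_queues_json: str) -> Dict[str, List[str]]:
    """Map each queue name to the mirror pids that are not yet synchronised.

    Assumes the input is the JSON printed by
      rabbitmqctl list_queues --formatter json name slave_pids synchronised_slave_pids
    with every pid list rendered as a JSON list of strings (an assumption).
    """
    lagging: Dict[str, List[str]] = {}
    for queue in json.loads(list_queues_json):
        # Missing or empty values are treated as "no mirrors".
        mirrors = queue.get("slave_pids") or []
        synced = set(queue.get("synchronised_slave_pids") or [])
        missing = [pid for pid in mirrors if pid not in synced]
        if missing:
            lagging[queue["name"]] = missing
    return lagging
```

Running it on the saved output from each incident, together with the logs, shows whether the same queues and the same node are involved every time. A lagging queue can also be synchronised explicitly with `rabbitmqctl sync_queue`, although that does not address the underlying cause.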

Also, as we say almost every day, mirrored classic queues should be considered deprecated - please start planning your move to quorum queues.
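Since the queues are already declared from definitions.json, one possible starting point is to generate quorum queue entries for that file. This is a minimal sketch, assuming the standard definitions layout where each entry of the "queues" array has name, vhost, durable, auto_delete and arguments fields; the helper names are hypothetical and only the "queues" section is produced.

```python
import json
from typing import Iterable, Tuple


def quorum_queue(name: str, vhost: str) -> dict:
    # One entry of the "queues" array in a definitions file.
    # Quorum queues must be durable; the type is selected with the
    # x-queue-type argument, and classic ha-* mirroring policies do not apply.
    return {
        "name": name,
        "vhost": vhost,
        "durable": True,
        "auto_delete": False,
        "arguments": {"x-queue-type": "quorum"},
    }


def queue_definitions(queues: Iterable[Tuple[str, str]]) -> dict:
    # Only the "queues" section; exchanges, bindings, policies etc. stay
    # as they are in the existing definitions.json.
    return {"queues": [quorum_queue(name, vhost) for name, vhost in queues]}


print(json.dumps(
    queue_definitions([("webmethods.testMessage.webmethods.queue", "hapa")]),
    indent=2,
))
```

Note that the type of an existing queue cannot be changed in place, so a migration means declaring new quorum queues (or deleting and redeclaring the old ones) and moving publishers and consumers over.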

Best,

--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-user...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/rabbitmq-users/CACb-xVVD%3D_ur899hbxpuB4TRpZxn_RXxbuwiKfKPFcsHTY9TjQ%40mail.gmail.com.

--

Michał

RabbitMQ team

paroczizs

May 4, 2021, 4:30:36 AM

to rabbitmq-users

The logs contain errors like these:

2021-02-15 10:17:47.479 [error] <0.23146.5> Channel error on connection <0.23138.5> (10.233.115.64:27912 -> 10.233.84.115:5671, vhost: 'hapa', user: 'webmethods'), channel 1:

operation basic.get caused a channel exception not_found: failed to perform operation on queue 'webmethods.testMessage.webmethods.queue' in vhost 'hapa' due to timeout

2021-02-15 10:18:00.801 [error] <0.23318.5> closing AMQP connection <0.23318.5> (10.233.115.80:37390 -> 10.233.84.115:5672):

{bad_header,<<"GET /met">>}
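To see which queues are affected across all nodes, the timeout errors can be counted per queue. This is a rough sketch that assumes the error wording is exactly as in the excerpt above; other RabbitMQ versions may phrase it differently. If the same few queues keep appearing, comparing them with the unsynchronized queues in the console may help narrow things down.

```python
import re
from collections import Counter

# Matches the wording of the channel error in the excerpt above
# (an assumption: other versions may word it differently).
TIMEOUT_RE = re.compile(
    r"failed to perform operation on queue '(?P<queue>[^']+)' "
    r"in vhost '(?P<vhost>[^']+)' due to timeout"
)


def timed_out_queues(log_text: str) -> Counter:
    """Count 'due to timeout' channel errors per (vhost, queue)."""
    return Counter(
        (m.group("vhost"), m.group("queue")) for m in TIMEOUT_RE.finditer(log_text)
    )


sample = (
    "operation basic.get caused a channel exception not_found: failed to perform "
    "operation on queue 'webmethods.testMessage.webmethods.queue' in vhost 'hapa' "
    "due to timeout\n"
)
print(timed_out_queues(sample).most_common())
```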

Unfortunately, there are no logs from the clients; on their side, the only symptom is that they cannot connect to the queue.

The console looks like this:
https://alerantonline-my.sharepoint.com/:i:/g/personal/paroczi_zsolt_alerant_hu/EfzKqxYApaZNs9BSXnzUejIBaX-w51p7ZMMIiDQ05Njfug?e=bF3udQ

https://alerantonline-my.sharepoint.com/:i:/g/personal/paroczi_zsolt_alerant_hu/Ee9BVylUbcBFktM_OBVA4KsBsGoSBSbwubgDd2jgsVNUUw?e=0diskJ

Michal Kuratczyk

May 4, 2021, 4:58:15 AM

to rabbitm...@googlegroups.com

The first error suggests to me that there is a problem with communication between the nodes (that would also explain synchronization issues) - check your networking and whether the pods are available at all times.
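A quick first check is whether every node can reach the others on the clustering ports, by default 4369 (epmd) and 25672 (inter-node communication). The sketch below only attempts plain TCP connections; the host names you pass in (for example pod DNS names such as the hypothetical `rabbitmq-0.rabbitmq-headless`) depend on your deployment.

```python
import socket
from typing import Dict, Iterable, Tuple

# Default ports RabbitMQ clustering relies on: 4369 (epmd peer discovery)
# and 25672 (inter-node and CLI tool communication).
CLUSTER_PORTS = (4369, 25672)


def probe(hosts: Iterable[str], ports: Iterable[int] = CLUSTER_PORTS,
          timeout: float = 2.0) -> Dict[Tuple[str, int], bool]:
    """Try a TCP connect to every (host, port); True means it succeeded."""
    ports = tuple(ports)
    results: Dict[Tuple[str, int], bool] = {}
    for host in hosts:
        for port in ports:
            try:
                with socket.create_connection((host, port), timeout=timeout):
                    results[(host, port)] = True
            except OSError:  # refused, unreachable, DNS failure or timeout
                results[(host, port)] = False
    return results
```

A single probe only shows the state at that moment; short, intermittent interruptions are more likely to show up when probing repeatedly from inside each pod, or in the node logs around the time of the errors.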

The second error is probably unrelated - someone connected an HTTP client to an AMQP port (you can see the "GET /met" which is an HTTP request).
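The `bad_header` term carries the first 8 bytes the peer sent. An AMQP 0-9-1 client is expected to open the connection with the 8-byte protocol header `AMQP\x00\x00\x09\x01`, so an HTTP request line gets cut off after 8 bytes, which is how it ends up as "GET /met". A simplified sketch of that kind of check, as an illustration only and not RabbitMQ's actual implementation:

```python
# AMQP 0-9-1 clients open every connection with this 8-byte protocol header.
AMQP_091_HEADER = b"AMQP\x00\x00\x09\x01"
HTTP_METHODS = (b"GET", b"HEAD", b"POST", b"PUT", b"DELETE", b"OPTIONS", b"PATCH")


def classify_preamble(first_bytes: bytes) -> str:
    """Guess what kind of client sent the first 8 bytes of a connection."""
    head = first_bytes[:8]
    if head == AMQP_091_HEADER:
        return "amqp-0-9-1"
    if head.startswith(b"AMQP"):
        return "amqp-other-version"
    if head.split(b" ", 1)[0] in HTTP_METHODS:
        return "http"
    return "unknown"


print(classify_preamble(b"GET /met"))
```

If those requests come from a metrics scraper (the path is truncated, so this is a guess), it was possibly meant to target the Prometheus plugin endpoint, which listens on port 15692 by default, rather than port 5672.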

Best,

--

Michał

RabbitMQ team
