Dear Community!
We are running a RabbitMQ cluster on Kubernetes in HA mode, with mirrored queues across 3 nodes.
Most of the time it works without any issues, and there is no significant load on the cluster at the moment.
However, sometimes some of the queues get into an unsynchronized state and do not recover on their own. Clients cannot connect to queues in this state. The only way to resolve the issue is to restart the whole cluster. We have not been able to reproduce it with load tests or forced shutdowns.
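To make the "unsynchronized" state concrete, the affected queues could be listed with something like the sketch below. It assumes the management plugin is enabled and that the JSON from its `GET /api/queues` endpoint is piped in on stdin; the `slave_nodes` and `synchronised_slave_nodes` fields are the ones the management API reports for classic mirrored queues, while the function name and script layout are only illustrative.

```python
import json
import sys


def unsynchronised_queues(queues):
    """Return (vhost, name, unsynced_mirrors) for every queue that has
    at least one mirror not listed as synchronised."""
    result = []
    for q in queues:
        # Both fields may be absent (non-mirrored queue) or empty.
        mirrors = set(q.get("slave_nodes") or [])
        synced = set(q.get("synchronised_slave_nodes") or [])
        missing = sorted(mirrors - synced)
        if missing:
            result.append((q.get("vhost"), q.get("name"), missing))
    return result


if __name__ == "__main__":
    # Expects the JSON array from the management API on stdin.
    for vhost, name, missing in unsynchronised_queues(json.load(sys.stdin)):
        print(f"{vhost} {name}: unsynchronised mirrors on {', '.join(missing)}")
```

Running this periodically and comparing the output with the broker logs from the same time window might help narrow down when the mirrors fall out of sync.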
RabbitMQ is used for asynchronous integration, and we create the exchanges and queues at startup from a definitions.json file.
Any ideas on where we should look for the root cause so that we can eliminate this issue?
Thank you in advance, Zsolt