Hey Team,
For a few weeks now we have been observing suspicious behavior in RabbitMQ. Sometimes, without any direct trigger we can identify, one of our 3 nodes explodes in memory usage and does not recover from that state on its own.
After restarting the service with `sudo systemctl restart rabbitmq-server.service`, the node is fully responsive again and behaves normally for around a week. We don't think the problem lies in the hardware, because it is not always the same node that is affected; it happens on any of the 3 nodes, seemingly at random.
We did find one exception to "any of the 3 nodes": if a node doesn't perform any mirrored queue slave operations, its memory usage doesn't seem to grow, at least so far.
We were able to collect the following data:
Memory Usage:
One note on `ha.delete_offer.dead`: this queue is consumed every few hours and its messages are republished to `ha.delete_offer`. `ha.delete_offer` has a dead-letter exchange configured that sends the messages back to `ha.delete_offer.dead`; a lock mechanism ensures these two steps don't happen at the same time. Any message that has been looping for a few (3-4) days is dropped and not looped again.
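For clarity, here is a minimal sketch (Python/pika) of how we understand that topology to be wired. The exchange name `dlx.delete_offer`, the routing key, and the TTL value are assumptions for illustration only; just the two queue names come from our actual setup:

```python
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

# Exchange that receives dead-lettered messages (name is hypothetical).
channel.exchange_declare(exchange="dlx.delete_offer",
                         exchange_type="direct",
                         durable=True)

# Main queue: expired/rejected messages are dead-lettered back to the .dead queue.
channel.queue_declare(
    queue="ha.delete_offer",
    durable=True,
    arguments={
        "x-dead-letter-exchange": "dlx.delete_offer",
        "x-dead-letter-routing-key": "ha.delete_offer.dead",
        "x-message-ttl": 60000,  # assumed TTL before a message is dead-lettered
    },
)

# Holding queue; a periodic job consumes it and republishes to ha.delete_offer,
# skipping messages that have already looped for 3-4 days.
channel.queue_declare(queue="ha.delete_offer.dead", durable=True)
channel.queue_bind(
    queue="ha.delete_offer.dead",
    exchange="dlx.delete_offer",
    routing_key="ha.delete_offer.dead",
)

connection.close()
```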
Could you help us find out what's going on in our cluster?
Sincerely,
Bjarne