RabbitMQ Node in a cluster crashing (3.8.1)

Sagar Shah

Oct 27, 2020, 5:28:12 PM

to rabbitmq-users

Here's the RabbitMQ Cluster Details

Node Count: 3

Version: 3.8.1

Infrastructure: AWS EC2 (each node in different availability zone)

We occasionally notice one of the RabbitMQ nodes crashing and restarting, after which we have to synchronize the queues manually.
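
To see which mirrors still need that manual synchronization, one option is to compare the mirror lists that the management HTTP API reports for each queue. The sketch below is a minimal illustration, not part of the original report. It assumes classic mirrored queues and relies on the `slave_nodes` and `synchronised_slave_nodes` fields that, as far as I know, the 3.8 management plugin returns from `GET /api/queues`. The sample queue and node names are hypothetical.

```python
def unsynchronised_mirrors(queue):
    """Return the mirror nodes of a queue that are not yet synchronised.

    `queue` is one object from a management API queue listing.
    The field names are an assumption about the 3.8 management plugin.
    """
    mirrors = set(queue.get("slave_nodes") or [])
    synced = set(queue.get("synchronised_slave_nodes") or [])
    return sorted(mirrors - synced)


# Hypothetical queue object, shaped like one entry of GET /api/queues
sample_queue = {
    "name": "orders",
    "slave_nodes": ["rabbit@node-b", "rabbit@node-c"],
    "synchronised_slave_nodes": ["rabbit@node-c"],
}

lagging = unsynchronised_mirrors(sample_queue)
print(lagging)
```

Setting `ha-sync-mode` to `automatic` in the mirroring policy may remove the manual step, although synchronizing a large queue blocks it while the sync runs.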

After synchronizing all the queues, we also need to restart some (not all) of our microservices (Spring Boot applications with RabbitMQ queue listeners) so that the listeners reconnect to their queues and resume processing messages. (Note: we generally notice that some of the queues have 0 consumers, so we pick those microservices for restart.)
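
The check for queues with 0 consumers can be scripted instead of done by eye. The following is a minimal sketch, assuming the queue list has already been fetched from the management HTTP API (`GET /api/queues`), where each queue object carries `vhost`, `name`, and `consumers` fields. The function name and the sample data are hypothetical and only illustrate the shape of the check.

```python
import json


def queues_without_consumers(queues):
    """Return sorted (vhost, name) pairs for queues reporting zero consumers.

    Queues with no `consumers` field are skipped rather than counted as idle.
    """
    return sorted(
        (q.get("vhost", "/"), q["name"])
        for q in queues
        if q.get("consumers") == 0
    )


# Hypothetical sample shaped like a GET /api/queues response
sample = json.loads("""[
  {"vhost": "/", "name": "orders", "consumers": 0},
  {"vhost": "/", "name": "billing", "consumers": 3},
  {"vhost": "/", "name": "audit", "consumers": 0}
]""")

idle = queues_without_consumers(sample)
for vhost, name in idle:
    print(f"{vhost} {name}")
```

The resulting list points at the services whose listeners did not reconnect. It does not explain why they failed to recover.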

This happens about once every two weeks. Attached are some of the error logs from the time of a crash.

Any help in further troubleshooting this is appreciated.

Please let me know if any further details are needed.

rabbit-error.log

Wesley Peng

Oct 27, 2020, 8:35:55 PM

to rabbitm...@googlegroups.com

On 2020/10/28 5:28 AM, Sagar Shah wrote:
> Further to that, after we synchronize all the queues, we also need to
> restart many (well, not all but some) of our micro services (spring
> boot that has rabbitmq queue listeners) so that listeners make the
> connections with queue and resume processing messages. (Note: We
> generally notice some of the queues have 0 consumers so we pick those
> micro services for restart.)

This is an old issue for RabbitMQ running in the cloud. You should have
monitoring for IaaS metrics such as networking, disk I/O, and memory usage.
Sometimes a network issue or a VM snapshot can cause similar problems.

Regards.

Sagar Shah

Oct 28, 2020, 7:43:52 AM

to rabbitmq-users

Thank you for getting back on this issue. We do have monitoring in place, but we still have to take all the steps listed above to recover from that situation, and only reactively.

We are looking for the cause of the error and a possible fix to prevent it from happening. Is there already a bug/issue for this on the RabbitMQ GitHub for tracking?

I'd appreciate your suggestions.