RabbitMQ - Stops responding for several minutes


Alf Kato Brandal

Jun 29, 2021, 7:55:43 AM
to rabbitmq-users
We run RabbitMQ server 3.8.2 on Erlang 22.3.
It has 10 virtual hosts and 10 users, and it runs on Windows Server 2016 Standard.

Lately we have experienced a strange issue: the RabbitMQ server suddenly stops responding for several minutes. On the web management page you can see messages piling up in the queues while the clients do not receive them, and the page shows "unable to connect to server for ..." messages. Then, just as suddenly, the problem disappears and everything works smoothly again.
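
One way to timestamp these episodes so they can be lined up with crash.log entries is a small connection probe. This is only a rough sketch: the function names `connect_time_ms` and `watch` are made up for illustration, and it only measures how long a plain TCP connect to the broker takes, not whether the broker is actually delivering messages.

```python
import socket
import time
from typing import Optional


def connect_time_ms(host: str, port: int, timeout_s: float = 5.0) -> Optional[float]:
    """Return the TCP connect time in milliseconds, or None if the attempt fails."""
    start = time.monotonic()
    try:
        # The connection is closed again as soon as the with-block exits.
        with socket.create_connection((host, port), timeout=timeout_s):
            return (time.monotonic() - start) * 1000.0
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return None


def watch(host: str, port: int, rounds: int = 60, interval_s: float = 1.0) -> None:
    """Probe once per interval and print one timestamped line per attempt."""
    for _ in range(rounds):
        elapsed = connect_time_ms(host, port)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        status = "FAILED" if elapsed is None else f"{elapsed:.1f} ms"
        print(f"{stamp} {host}:{port} connect {status}")
        time.sleep(interval_s)
```

Calling, for example, `watch("localhost", 5672)` on the broker host (5672 is the default AMQP port; adjust it if the listener is configured differently) prints one line per second, and slow or failed connects during an episode would point towards network or load problems on the machine.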

I have attached the crash.log from the latest "event". Hope someone can help me! 
crash.log

Yong Hua Peng

Jun 30, 2021, 6:02:51 AM
to rabbitm...@googlegroups.com
Can you check whether antivirus or firewall software is running on the OS?
I'd also suggest upgrading both RabbitMQ and Erlang to the latest versions.
 
regards.
 

Alf Kato Brandal

Jul 8, 2021, 3:30:29 AM
to rabbitmq-users
I will check this. We cannot upgrade to the latest version because of requirements in our application, so I will check the other things first.

Michal Kuratczyk

Jul 8, 2021, 4:38:59 AM
to rabbitm...@googlegroups.com
You could be hitting an issue caused by a combination of two factors:
1. Network-related or server-load-related slowness
2. The 1-second timeout we had until 3.8.9

The timeout is now 5 seconds to avoid false positives: https://github.com/rabbitmq/rabbitmq-server/pull/2450
Talk to your application team - there is no reason to require running unpatched software.
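
To make the interaction between the two factors concrete, here is a toy model. It is not how RabbitMQ's failure detector is actually implemented; it simply counts how often a gap between two signs of life from a healthy peer exceeds a fixed timeout. The function name and the gap values are invented for illustration.

```python
from typing import Iterable


def false_suspicions(gaps_ms: Iterable[float], timeout_ms: float) -> int:
    """Count gaps between signs of life that exceed the timeout."""
    return sum(1 for gap in gaps_ms if gap > timeout_ms)


# Hypothetical gaps (in ms) seen from a peer that is alive but
# occasionally slow because of network or server load.
observed_gaps_ms = [100, 120, 1500, 90, 2300, 110, 4200, 100]

for timeout_ms in (1000, 5000):
    count = false_suspicions(observed_gaps_ms, timeout_ms)
    print(f"timeout={timeout_ms} ms -> {count} false suspicion(s)")
```

With these made-up numbers the 1-second timeout flags the live peer three times and the 5-second timeout never does, which is the kind of false positive the longer default is meant to avoid.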

Best,



--
Michał
RabbitMQ team

Alf Kato Brandal

Jul 8, 2021, 6:51:51 AM
to rabbitmq-users
1. I have contacted our hosting company to see if they can disable antivirus and see if that helps. 
2. Is it possible to set this timeout via config anywhere, so I can try that in the current version? 

I totally agree, and we are going to upgrade. There are just so many references that they need to do it in a controlled manner, and that is not on our scrum master's priority list at the moment :)

Michal Kuratczyk

Jul 8, 2021, 7:06:43 AM
to rabbitm...@googlegroups.com
Try adding this to your advanced.config:
[ {aten, [ {poll_interval, 5000} ] } ].

I only tested this on a recent version, but I think it will work with versions below 3.8.9 as well.



--
Michał
RabbitMQ team

Alf Kato Brandal

Jul 8, 2021, 9:53:17 AM
to rabbitmq-users
Hi! I have checked: we can upgrade to 3.8.9, but not to the latest version. Would 3.8.9 include this fix?

Michal Kuratczyk

Jul 8, 2021, 10:24:17 AM
to rabbitm...@googlegroups.com
https://github.com/rabbitmq/rabbitmq-server/releases/tag/v3.8.9
-----
Raft implementation's failure detector default polling interval has been increased from 1s to 5s. The previously used default results in too frequent leader elections in networks with high packet loss (say, double digit percent).
-----



--
Michał