We have a problem with RabbitMQ. It works fine most of the time, but at random moments our RabbitMQ clients can no longer retrieve messages from or publish messages to the queues. However, port 5671 or 5672 is still open and the RabbitMQ service is still running. Also, if the RabbitMQ management plugin is enabled, we cannot access the web page even though port 15672 is open. When this happens, restarting the RabbitMQ service fixes everything.
2018-01-23 19:42:37.501 [error] <0.176.0> gen_server <0.176.0> terminated with reason: heartbeat_timeout
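To tell "the port is open but the broker does not answer" apart from a plain network failure, a small probe can help. Below is a minimal sketch, assuming a non-TLS listener on port 5672 and AMQP 0-9-1; the function names and result labels are my own, not part of any RabbitMQ tool. It sends the AMQP protocol header and checks whether the broker starts the handshake with a method frame.

```python
import socket

# Protocol header a client sends first in AMQP 0-9-1: "AMQP" 0 0 9 1
AMQP_0_9_1_HEADER = b"AMQP\x00\x00\x09\x01"


def classify_reply(data: bytes) -> str:
    """Interpret the first bytes the broker sent back (labels are hypothetical)."""
    if not data:
        # Peer closed the connection without sending anything
        return "closed-without-reply"
    if data[:4] == b"AMQP":
        # Broker answered with its own protocol header: version not accepted
        return "version-mismatch"
    if data[0] == 1:
        # Frame type 1 is a method frame; Connection.Start is expected here
        return "handshake-started"
    return "unexpected"


def probe_amqp(host: str, port: int = 5672, timeout: float = 5.0) -> str:
    """Connect, send the protocol header, and report how the broker reacts."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(AMQP_0_9_1_HEADER)
            try:
                data = sock.recv(7)  # a frame header is 7 bytes
            except socket.timeout:
                # TCP accepted the connection, but nothing came back in time
                return "tcp-open-but-silent"
    except OSError:
        # Refused, unreachable, reset, or timed out while connecting
        return "tcp-failed"
    return classify_reply(data)


if __name__ == "__main__":
    print(probe_amqp("localhost"))
```

If this prints "tcp-open-but-silent" while the problem is occurring and "handshake-started" after a service restart, that would suggest the node is hung rather than unreachable. For port 5671 the socket would need to be wrapped in TLS first, which this sketch does not do.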
As suggested, I ran rabbitmqctl.bat status and rabbitmqctl environment while the issue was occurring. Here are the results:
C:\Program Files (x86)\RabbitMQ Server\rabbitmq_server-3.6.8\sbin>rabbitmqctl.bat status
Status of node rabbit@GDENHEZ ...
Error: unable to connect to node rabbit@GDENHEZ: nodedown
DIAGNOSTICS
===========
attempted to contact: [rabbit@GDENHEZ]
rabbit@GDENHEZ:
* connected to epmd (port 4369) on GDENHEZ
* epmd reports node 'rabbit' running on port 25672
* TCP connection succeeded but Erlang distribution failed
* TCP connection to remote host has timed out. Is the Erlang distribution using TLS?
current node details:
- node name: 'rabbitmq-cli-30@GDENHEZ'
- home dir: C:\Users\gdenhez
- cookie hash: Xh6oQS6jc8HA63HKz5r4ZQ==
C:\Program Files (x86)\RabbitMQ Server\rabbitmq_server-3.6.8\sbin>rabbitmqctl environment
Application environment of node rabbit@GDENHEZ ...
Error: unable to connect to node rabbit@GDENHEZ: nodedown
DIAGNOSTICS
===========
attempted to contact: [rabbit@GDENHEZ]
rabbit@GDENHEZ:
* connected to epmd (port 4369) on GDENHEZ
* epmd reports node 'rabbit' running on port 25672
* TCP connection succeeded but Erlang distribution failed
* TCP connection to remote host has timed out. Is the Erlang distribution using TLS?
current node details:
- node name: 'rabbitmq-cli-22@GDENHEZ'
- home dir: C:\Users\gdenhez
- cookie hash: Xh6oQS6jc8HA63HKz5r4ZQ==
Looking through this forum, I found that someone solved the issue by upgrading the OTP version used by RabbitMQ to 19.3. I tried this; the issue does seem to occur less often, but it still occurs. Also, it seems to occur only on certain machines.
Do you have any other ideas about what could be causing the issue?
Thanks a lot!
N.B. The version of RabbitMQ we use is 3.6.9.