TCP connection succeeded but Erlang distribution failed


Gabrielle Denhez

Feb 2, 2018, 9:36:44 AM
to rabbitmq-users
Hello,

We have a problem with RabbitMQ. It works fine most of the time, but at random moments our RabbitMQ clients can no longer retrieve messages from or post messages to the queues. However, port 5671 or 5672 is still open and the RabbitMQ service is still running. Also, if RabbitMQ management is enabled, we cannot access the web page despite port 15672 being open. At these moments, restarting the RabbitMQ service fixes everything.

In the log of one of our clients (written in Erlang), we can see this:

2018-01-23 19:42:37.501 [error] <0.176.0> gen_server <0.176.0> terminated with reason: heartbeat_timeout


I ran rabbitmqctl.bat status and rabbitmqctl environment, as suggested, while reproducing the issue. Here is the result:


C:\Program Files (x86)\RabbitMQ Server\rabbitmq_server-3.6.8\sbin>rabbitmqctl.bat status
Status of node rabbit@GDENHEZ ...
Error: unable to connect to node rabbit@GDENHEZ: nodedown

DIAGNOSTICS
===========

attempted to contact: [rabbit@GDENHEZ]

rabbit@GDENHEZ:
  * connected to epmd (port 4369) on GDENHEZ
  * epmd reports node 'rabbit' running on port 25672
  * TCP connection succeeded but Erlang distribution failed

  * TCP connection to remote host has timed out. Is the Erlang distribution using TLS?


current node details:
- node name: 'rabbitmq-cli-30@GDENHEZ'
- home dir: C:\Users\gdenhez
- cookie hash: Xh6oQS6jc8HA63HKz5r4ZQ==


C:\Program Files (x86)\RabbitMQ Server\rabbitmq_server-3.6.8\sbin>rabbitmqctl environment
Application environment of node rabbit@GDENHEZ ...
Error: unable to connect to node rabbit@GDENHEZ: nodedown

DIAGNOSTICS
===========

attempted to contact: [rabbit@GDENHEZ]

rabbit@GDENHEZ:
  * connected to epmd (port 4369) on GDENHEZ
  * epmd reports node 'rabbit' running on port 25672
  * TCP connection succeeded but Erlang distribution failed

  * TCP connection to remote host has timed out. Is the Erlang distribution using TLS?


current node details:
- node name: 'rabbitmq-cli-22@GDENHEZ'
- home dir: C:\Users\gdenhez
- cookie hash: Xh6oQS6jc8HA63HKz5r4ZQ==

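The diagnostics above separate two layers: reaching the ports over TCP, and completing the Erlang distribution handshake on top of that connection. As a minimal sketch of the first layer only, the Python snippet below checks whether a set of ports accepts TCP connections. The function names `tcp_port_open` and `probe_ports` are made up for illustration; the host name and port numbers in the usage note are the ones that appear in this thread. A port that accepts connections does not mean the node is healthy, which is the situation reported here.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established.

    This only tests the TCP layer. It does not attempt the Erlang
    distribution handshake, so True here is compatible with the
    "TCP connection succeeded but Erlang distribution failed" case.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


def probe_ports(host, ports, timeout=3.0):
    """Map each port to whether it currently accepts TCP connections."""
    return {port: tcp_port_open(host, port, timeout) for port in ports}
```

For example, `probe_ports("GDENHEZ", [4369, 25672, 5672, 15672])` would report which of those ports accept connections from the machine the script runs on.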

Looking at this forum, I found that someone solved the issue by upgrading the OTP version used by RabbitMQ to 19.3. I tried this; the issue seems to occur less often, but it still occurs. Also, it seems to occur only on certain machines.

Do you have any other ideas of what can be causing the issue?

Thanks a lot!

N.B. The version of rabbit we use is 3.6.9

Michael Klishin

Feb 2, 2018, 11:59:39 AM
to rabbitm...@googlegroups.com
I assume that if your client is in Erlang I don't have to explain this
but still: there is a section on authentication in [1].

Missed heartbeats suggest a network connection problem but successful TCP connection
and failing Erlang distribution is confusing. Consider posting more server logs around
such events.

Are you running on Erlang 18.x or 19 up to 19.3 by any chance? Prior to 19.3.6.2 there are
known bugs that cause nodes to go into a limbo state in which they do not accept inbound connections
and cannot shut down:

https://groups.google.com/d/msg/rabbitmq-users/hK323-O6-tw/yRZTzq2uAgAJ
https://bugs.erlang.org/browse/ERL-430
https://bugs.erlang.org/browse/ERL-448

And, of course, please move to at least 3.6.15.
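
The version boundary mentioned above (releases prior to 19.3.6.2) can be checked mechanically once the exact OTP version string is known. The following Python sketch compares dotted version strings as integer tuples. The names `parse_otp_version`, `FIXED_IN` and `predates_fix` are illustrative only, and the sketch assumes purely numeric versions such as `19.3` or `19.3.6.2` (no `-rc` suffixes). It only encodes the statement in this message and says nothing about which later release lines contain the same fixes.

```python
# Boundary taken from the message above: "Prior to 19.3.6.2 there are known bugs".
FIXED_IN = (19, 3, 6, 2)


def parse_otp_version(text):
    """Turn a dotted version string such as '19.3.6.2' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def predates_fix(version):
    """Return True if the version sorts before 19.3.6.2.

    Tuples compare element by element, and a shorter tuple that is a
    prefix of a longer one sorts first, so (19, 3) < (19, 3, 6, 2).
    """
    return parse_otp_version(version) < FIXED_IN
```

With this comparison, `predates_fix("19.3")` is true while `predates_fix("19.3.6.2")` is false.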

--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-users+unsubscribe@googlegroups.com.
To post to this group, send email to rabbitmq-users@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
MK

Staff Software Engineer, Pivotal/RabbitMQ

Gabrielle Denhez

Feb 2, 2018, 1:34:51 PM
to rabbitmq-users
Hello, thanks for the quick answer!

- In the RabbitMQ server log, we could see these kinds of entries around the time the issue was occurring. However, we could see similar entries at other times. I'll try to get more logs the next time we reproduce it.

=WARNING REPORT==== 23-Jan-2018::19:42:37 ===
closing AMQP connection <0.753.0> (127.0.0.1:3393 -> 127.0.0.1:5671):
client unexpectedly closed TCP connection

=WARNING REPORT==== 23-Jan-2018::19:42:37 ===
closing AMQP connection <0.761.0> (127.0.0.1:3395 -> 127.0.0.1:5671):
client unexpectedly closed TCP connection


- I tried running RabbitMQ with OTP 19.3. However, I'm not sure how to verify which specific version it is. I downloaded OTP from here: https://www.erlang.org/downloads/19.3. The download was made on 2018-01-29 (this Monday).
Also, just to be sure, here are the steps I followed to change the OTP version used by RabbitMQ:
1. I changed the environment variable ERLANG_HOME
2. I uninstalled the RabbitMQ service this way: rabbitmq-service remove
3. I reinstalled the service: rabbitmq-service start
4. I started the service
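
On the question of verifying the specific OTP version: Erlang/OTP installations normally ship a text file named `OTP_VERSION` under `releases/<release>/` in the Erlang root directory. The Python sketch below reads that file for every release directory it finds. It assumes that `ERLANG_HOME` points at the Erlang root directory and that the installation follows this standard layout, which may not hold for every build; the function name `read_otp_versions` is made up for illustration.

```python
from pathlib import Path


def read_otp_versions(erlang_home):
    """Return {release_dir_name: version_string} for an Erlang installation.

    Assumes the standard layout <erlang_home>/releases/<release>/OTP_VERSION.
    Returns an empty dict if no such file exists.
    """
    root = Path(erlang_home)
    versions = {}
    for version_file in sorted(root.glob("releases/*/OTP_VERSION")):
        # The file holds a single line such as "19.3".
        versions[version_file.parent.name] = version_file.read_text().strip()
    return versions
```

Calling `read_otp_versions(os.environ["ERLANG_HOME"])` (after `import os`) on the machine that runs the service would show which version the configured installation reports.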


- I will try to upgrade rabbit to at least 3.6.15

Michael Klishin

Feb 2, 2018, 3:07:48 PM
to rabbitm...@googlegroups.com
Those log messages are explained in https://www.rabbitmq.com/networking.html#logging.

According to RabbitMQ your apps close TCP connections, which over localhost has only
one explanation: they fail (either the connection Erlang process or the OS process/BEAM).


Gabrielle Denhez

Feb 5, 2018, 8:37:14 AM
to rabbitmq-users
Hello,

I ran my setup over the weekend with RabbitMQ server 3.6.15 and OTP 20.0.

So far so good, everything is running fine. I can no longer reproduce my issue. I guess I needed an upgrade of both OTP and RabbitMQ.

Thanks a lot for your help!
Gabrielle

Michael Klishin

Feb 5, 2018, 11:08:26 AM
to rabbitm...@googlegroups.com
Nothing has changed around Erlang distribution in many years in either RabbitMQ or Erlang/OTP
(except for the default Erlang cookie path on Windows in 20.2) but hey, if that somehow helped,
can't argue with that ;)
