RabbitMQ 3.7.9 Dropping connections intermittently


Rabbitmq Support

Nov 25, 2020, 6:35:26 PM
to rabbitmq-users

Hi,

I hope this is the correct place to post this. Please let me know if I need to post it somewhere else.

Our current RabbitMQ production environment has been running smoothly for 6 months, but starting today our producers and consumers are getting intermittent errors: connections are either dropping or not being established at all. Then it stabilizes before the next set of errors.

 Here is our environment information:

RabbitMQ Version 3.7.9

Erlang 21.3.8.4

OS: CentOS 7

CPU: 6. CPU utilization hovers between 60-80%.

RAM: 16GB. RAM utilization hovers between 4-6GB.

Cluster of 3 RabbitMQ Servers with HA Policy enabled.

 

RabbitMQ.Config and the list of enabled plugins are attached.

The RabbitMQ crash.log file is attached. We have not restarted the RabbitMQ service yet. The error comes and goes.

Please note that we have hundreds of publishers / consumers that are connected and working, and all of a sudden we get a period of 2-5 minutes of dropped connections. Note: even though we have enabled LDAP integration (Active Directory), the publishers and consumers only use internal RabbitMQ accounts.

 

All publishers and consumers use TLS.

 

Some of the errors that publisher and consumers get are:

Error 1:

RabbitMQ.Client.Exceptions.BrokerUnreachableException: None of the specified endpoints were reachable ---> RabbitMQ.Client.Exceptions.PossibleAuthenticationFailureException: Possibly caused by authentication failure ---> RabbitMQ.Client.Exceptions.OperationInterruptedException: The AMQP operation was interrupted: AMQP close-reason, initiated by Library, code=0, text="End of stream", classId=0, methodId=0, cause=System.IO.EndOfStreamException: Unable to read beyond the end of the stream.  

 Error 2:

RabbitMQ.Client.Exceptions.BrokerUnreachableException: None of the specified endpoints were reachable ---> RabbitMQ.Client.Exceptions.ConnectFailureException: Connection failed ---> System.Net.Sockets.SocketException: No connection could be made because the target machine actively refused it *.*.*.*:*

 

Error 3:

RabbitMQ.Client.Exceptions.BrokerUnreachableException: None of the specified endpoints were reachable ---> RabbitMQ.Client.Exceptions.PossibleAuthenticationFailureException: Possibly caused by authentication failure  

Reviewing the client logs and the RabbitMQ crash.log, it looks like SSL authentication for the RabbitMQ internal users is failing, but I am not sure about it. I am not sending the full RabbitMQ logs for now because they are large, but here are some sample entries, mostly warnings:

 2020-11-22 12:03:44.999 [warning] <0.129.0> lager_error_logger_h dropped 189 messages in the last second that exceeded the limit of 1000 messages/sec

2020-11-22 12:03:49.000 [warning] <0.129.0> lager_error_logger_h dropped 66 messages in the last second that exceeded the limit of 1000 messages/sec

2020-11-22 12:03:52.000 [warning] <0.129.0> lager_error_logger_h dropped 56 messages in the last second that exceeded the limit of 1000 messages/sec

2020-11-22 12:03:53.596 [warning] <0.28060.6424> closing AMQP connection <0.28060.6424> (***.***.****.***:***** -> 1***.***.***.****:5671, vhost: '/', user: '********'):

client unexpectedly closed TCP connection

 

Any insights would be appreciated, especially on what the crash.log entries mean.

 

Thanks.

crash.log
RabbitMqPluginList.txt
rabbitmq.conf

Luke Bakken

Nov 30, 2020, 2:34:33 PM
to rabbitmq-users
Hello,

If an environment runs smoothly for 6 months, then starts exhibiting symptoms, something has changed. Have you audited all recent changes to this environment?

The crash log shows that the TCP connection closed before the TLS handshake could finish:

handshake_timeout,{ssl_closed,{sslsocket

This means that either your application had an error and closed the connection abruptly, or there is a network device between RabbitMQ and your application that closed the connection.

"client unexpectedly closed TCP connection" is the message logged when a client application or network device unexpectedly closes a TCP connection.
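
One way to see which stage fails from an affected client machine is to probe the listener directly. The following is a minimal Python sketch, not part of RabbitMQ or its client libraries: `probe_tls_endpoint` is a hypothetical helper name, and it only separates a failed TCP connect from a failed TLS handshake. It does not attempt AMQP authentication.

```python
import socket
import ssl


def probe_tls_endpoint(host, port, timeout=5.0, verify=True):
    """Return (stage, detail) for the first stage that failed,
    or ("ok", negotiated_tls_version) if the handshake completed."""
    try:
        raw = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        # Refused or timed out: nothing is listening, or a device blocks the port.
        return ("tcp_connect", repr(exc))

    context = ssl.create_default_context()
    if not verify:
        # Diagnostic use only: skips certificate and host name checks.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE

    try:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return ("ok", tls.version())
    except (ssl.SSLError, OSError) as exc:
        # The peer or a device in between closed or stalled the handshake.
        raw.close()
        return ("tls_handshake", repr(exc))
```

Called during an incident as, for example, `probe_tls_endpoint("rabbit.example.internal", 5671)` (the host name is a placeholder; 5671 is the TLS port seen in the log excerpt earlier in this thread), a `tcp_connect` result would line up with the "actively refused" client error, while `tls_handshake` would line up with the connection being closed before the handshake finished.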

In any case, the provided information shows that the issue is external to RabbitMQ.

Thanks,
Luke

Kumar Rmqs

Nov 30, 2020, 8:58:17 PM
to rabbitmq-users
Hi,
Thanks for taking the time to respond.

For RabbitMQ, we are sure nothing changed, as we are the gatekeepers of these servers. However, we are not sure about changes external to RabbitMQ in our environment. It certainly looks like the scope of the issue was wider than RabbitMQ, as around 20/30 servers experienced it. What threw us off was the existence of crash logs on the RabbitMQ server: /var/log/rabbitmq/log/crash.log.

For our knowledge, could you shed some light on the following:
1. Why does RabbitMQ log broken client connections to /var/log/rabbitmq/log/crash.log rather than the normal log, /var/log/rabbitmq/rab...@servername.log?

2. Some client logs reported authentication failures:

RabbitMQ.Client.Exceptions.BrokerUnreachableException: None of the specified endpoints were reachable ---> RabbitMQ.Client.Exceptions.PossibleAuthenticationFailureException: Possibly caused by authentication failure

We know for a fact that the username/password were correct, as other messages were coming through successfully. Is it possible that this error would be thrown if the connection broke while RabbitMQ was in the process of authenticating the user?

The existence of a crash.log made us believe it was a RabbitMQ issue.

Thanks.

Luke Bakken

Dec 1, 2020, 8:36:34 AM
to rabbitmq-users
Hello,

The "crash.log" file is specific to an Erlang system. Like you, many users are surprised by the use of the word "crash" but it does not mean fatal problems were happening. It's like having unhandled exceptions in other programming runtimes, except in Erlang it will never bring down the entire system.

More than likely, there are log messages in the RabbitMQ log file with the same timestamps as the entries in crash.log, but I don't have your entire RabbitMQ log file, so I can't verify. The few log messages you provided show that TCP clients (or a device like a firewall or load balancer) were closing their connections unexpectedly, perhaps during the TLS handshake, which would cause the crash.log entries.

The log entries about dropped log file messages could be due to abusive applications trying to connect over and over. RabbitMQ will stop logging messages once the rate exceeds the threshold to prevent logging overwhelming the system.
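
To see when the bursts happened, the main log can be tallied per minute and the busy minutes compared with the timestamps in crash.log. This is a rough Python sketch that assumes every entry starts with a timestamp and level in the same layout as the sample lines posted earlier in this thread; `summarize_log` is a hypothetical helper, and it counts every "closing AMQP connection" warning regardless of the reason given on the following line.

```python
import re
from collections import Counter

# Matches e.g. "2020-11-22 12:03:44.999 [warning]" and captures up to the minute.
TIMESTAMP = re.compile(r"^\s*(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}\.\d{3} \[\w+\]")
DROPPED = re.compile(r"lager_error_logger_h dropped (\d+) messages")


def summarize_log(lines):
    """Return two Counters keyed by minute: log messages dropped by the
    rate limiter, and 'closing AMQP connection' warnings."""
    dropped = Counter()
    closing = Counter()
    for line in lines:
        stamp = TIMESTAMP.match(line)
        if not stamp:
            continue  # continuation lines carry no timestamp
        minute = stamp.group(1)
        hit = DROPPED.search(line)
        if hit:
            dropped[minute] += int(hit.group(1))
        elif "closing AMQP connection" in line:
            closing[minute] += 1
    return dropped, closing
```

Feeding it an open log file, for example `summarize_log(open(path))` with `path` pointing at the node's log, gives the minutes with the most dropped messages and closed connections, which is where to look in crash.log.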

Finally, the exception your application raised is PossibleAuthenticationFailureException. Note the "Possible". Depending on when the TCP error occurs during connection establishment, it is not possible for a client application to distinguish between a TLS handshake failure, authentication failure and other failures. If your connection had made it to the authentication phase then failed, that would be logged by RabbitMQ.

Thanks,
Luke


Kumar Rmqs

Dec 1, 2020, 3:18:40 PM
to rabbitmq-users
Hi,
Thanks for sharing your feedback, it is greatly appreciated.
