Hi,
I hope this is the correct place to post this. Please let me know if I need to post it somewhere else.
Our RabbitMQ production environment has been running smoothly for 6 months, but starting today our producers and consumers are intermittently reporting errors: connections are either dropped or cannot be established at all. Things then stabilize until the next burst of errors.
Here is our environment information:
RabbitMQ Version 3.7.9
Erlang 21.3.8.4
OS: CentOS 7
CPUs: 6. CPU utilization hovers between 60% and 80%.
RAM: 16 GB. RAM utilization hovers between 4 and 6 GB.
Cluster of 3 RabbitMQ servers with an HA policy enabled.
RabbitMQ.Config and the list of enabled plugins are attached.
The RabbitMQ crash.log file is attached. We have not restarted the RabbitMQ service yet. The errors come and go.
Please note that we have hundreds of publishers and consumers that are connected and working, and then all of a sudden we get a period of 2-5 minutes of dropped connections. Note: even though we have enabled LDAP integration (Active Directory), the publishers and consumers only use internal RabbitMQ accounts.
All publishers and consumers use TLS.
Some of the errors that publishers and consumers get are:
Error 1:
RabbitMQ.Client.Exceptions.BrokerUnreachableException: None of the specified endpoints were reachable ---> RabbitMQ.Client.Exceptions.PossibleAuthenticationFailureException: Possibly caused by authentication failure ---> RabbitMQ.Client.Exceptions.OperationInterruptedException: The AMQP operation was interrupted: AMQP close-reason, initiated by Library, code=0, text="End of stream", classId=0, methodId=0, cause=System.IO.EndOfStreamException: Unable to read beyond the end of the stream.
Error 2:
RabbitMQ.Client.Exceptions.BrokerUnreachableException: None of the specified endpoints were reachable ---> RabbitMQ.Client.Exceptions.ConnectFailureException: Connection failed ---> System.Net.Sockets.SocketException: No connection could be made because the target machine actively refused it *.*.*.*:*
Error 3:
RabbitMQ.Client.Exceptions.BrokerUnreachableException: None of the specified endpoints were reachable ---> RabbitMQ.Client.Exceptions.PossibleAuthenticationFailureException: Possibly caused by authentication failure
From the client logs and the RabbitMQ crash.log, it looks like TLS authentication for the internal RabbitMQ users is failing, but I am not sure about that. I am not sending the full RabbitMQ logs for now because they are large, but here are some sample lines, mostly warnings:
2020-11-22 12:03:44.999 [warning] <0.129.0> lager_error_logger_h dropped 189 messages in the last second that exceeded the limit of 1000 messages/sec
2020-11-22 12:03:49.000 [warning] <0.129.0> lager_error_logger_h dropped 66 messages in the last second that exceeded the limit of 1000 messages/sec
2020-11-22 12:03:52.000 [warning] <0.129.0> lager_error_logger_h dropped 56 messages in the last second that exceeded the limit of 1000 messages/sec
2020-11-22 12:03:53.596 [warning] <0.28060.6424> closing AMQP connection <0.28060.6424> (***.***.****.***:***** -> 1***.***.***.****:5671, vhost: '/', user: '********'):
client unexpectedly closed TCP connection
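In case it helps to line the bursts up against the client-side errors, here is a minimal sketch (Python, not something we run in production) that totals the "dropped N messages" counts per minute. It assumes the log lines have exactly the format of the samples above; the names `DROPPED`, `dropped_per_minute` and `SAMPLE` are made up for illustration.

```python
import re
from collections import Counter

# Matches the lager rate-limit warnings, assuming the format of the sample lines above.
# Group 1 is the timestamp truncated to the minute, group 2 is the dropped count.
DROPPED = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}\.\d+ \[warning\] \S+ "
    r"lager_error_logger_h dropped (\d+) messages"
)

def dropped_per_minute(lines):
    """Return a Counter mapping 'YYYY-MM-DD HH:MM' to total dropped log messages."""
    totals = Counter()
    for line in lines:
        match = DROPPED.match(line)
        if match:
            totals[match.group(1)] += int(match.group(2))
    return totals

# The three sample warnings quoted in this post.
SAMPLE = [
    "2020-11-22 12:03:44.999 [warning] <0.129.0> lager_error_logger_h dropped 189 messages in the last second that exceeded the limit of 1000 messages/sec",
    "2020-11-22 12:03:49.000 [warning] <0.129.0> lager_error_logger_h dropped 66 messages in the last second that exceeded the limit of 1000 messages/sec",
    "2020-11-22 12:03:52.000 [warning] <0.129.0> lager_error_logger_h dropped 56 messages in the last second that exceeded the limit of 1000 messages/sec",
]

if __name__ == "__main__":
    for minute, count in sorted(dropped_per_minute(SAMPLE).items()):
        print(minute, count)
```

Running it over the full log file instead of `SAMPLE` (for example, by passing an open file object to `dropped_per_minute`) should show which minutes had the heaviest logging, which can then be compared with the times of the dropped connections.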
Any insights would be appreciated, especially on what the crash.log is telling us.
Thanks.
Hi,