Extra TCP connections on the server after resource alarm

Sergey Belesev

Feb 10, 2018, 4:43:01 PM
to rabbitmq-users

I have RabbitMQ Server 3.6.0 installed on Windows (I know it's time to upgrade; I've already done that on the other server node).

Heartbeats are enabled on both server and client side (heartbeat interval 60s).
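For reference, a minimal sketch of the client-side setting, assuming the RabbitMQ Java client (we may use a different library; the host name is a placeholder):

    import com.rabbitmq.client.Connection;
    import com.rabbitmq.client.ConnectionFactory;

    public class HeartbeatExample {
        public static void main(String[] args) throws Exception {
            ConnectionFactory factory = new ConnectionFactory();
            factory.setHost("rmq.example.local");  // placeholder host
            factory.setRequestedHeartbeat(60);     // 60s, matching the server-side setting
            try (Connection conn = factory.newConnection()) {
                System.out.println("Negotiated heartbeat: " + conn.getHeartbeat() + "s");
            }
        }
    }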


I had a resource alarm (RAM limit), and after that I observed a rise in the number of TCP connections to the RMQ server.

At the moment there are 18,000 connections, while the normal amount is 6,000.

Via the management plugin I can see there are a lot of connections with 0 channels, while our "normal" connections have at least 1 channel.
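The same is visible from the CLI; a sketch, assuming rabbitmqctl is run on the server host:

    rabbitmqctl list_connections name channels state
    # the suspicious connections show up with 0 channels next to normal ones with >= 1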

Even an RMQ server restart doesn't help: all the connections re-establish themselves.

   1. Does that mean all of them are really alive?


A similar issue was described at https://github.com/rabbitmq/rabbitmq-server/issues/384, but as far as I can see it was fixed in v3.6.0.

   2. Do I understand correctly that before RMQ Server v3.6.0 the behavior after a resource alarm was as follows: several TCP connections could hang on the server side per one real client auto-recovery connection?


Maybe important: we have HAProxy between the server and the clients.

   3. Could HAProxy be an explanation for these extra connections? Maybe it prevents the client from receiving a signal that the connection was closed due to the resource alarm?
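Our proxy setup is roughly of this shape (a hypothetical excerpt for illustration, not our literal config):

    # hypothetical haproxy.cfg excerpt for AMQP pass-through
    listen rabbitmq
        bind *:5672
        mode tcp
        # timeouts must exceed the 60s heartbeat interval, otherwise the
        # proxy itself would drop idle-looking connections
        timeout client 3h
        timeout server 3h
        server rmq1 10.0.0.1:5672 check

Since heartbeats flow end to end through the proxy, both the client-side and server-side legs look alive to HAProxy.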

V Z

Feb 10, 2018, 7:02:09 PM
to rabbitmq-users
We noticed something very similar in 3.6.12 but did not report it because we did not take the time to gather the evidence.

We would start our test, which would incidentally result in a node running out of RAM and connections getting blocked. We would stop the test, but those blocked connections remained even after the memory alarm cleared.
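For anyone trying to reproduce this state, tripping the alarm on a test node can be done with standard rabbitmqctl commands (a sketch; the thresholds are arbitrary):

    # push the memory high watermark down so the alarm trips almost immediately
    rabbitmqctl set_vm_memory_high_watermark 0.001
    # ... run the workload, watch connections go into "blocking"/"blocked" ...
    # restore the default fraction; this clears the alarm
    rabbitmqctl set_vm_memory_high_watermark 0.4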

We also had HAProxy in this test. I don't recall seeing this with a load balancer like F5. Restarting the entire cluster cleared them, though.

Luke Bakken

Feb 10, 2018, 8:09:13 PM
to rabbitmq-users
Hi Sergey,

What version of Erlang are you using?

Thanks -
Luke

Sergey Belesev

Feb 11, 2018, 3:11:20 AM
to rabbitmq-users
Luke,
That is Erlang 18.2.1 [64-bit].

On Sunday, February 11, 2018 at 4:09:13 UTC+3, Luke Bakken wrote:

Sergey Belesev

Feb 11, 2018, 3:13:40 AM
to rabbitmq-users
> Restarting the entire cluster cleared them
What is "cluster" in your case? We have a single node of RMQ.

Do you mean that restarting the RMQ cluster helped, or restarting RMQ+HAProxy?



On Sunday, February 11, 2018 at 3:02:09 UTC+3, V Z wrote:

Luke Bakken

Feb 11, 2018, 11:39:54 AM
to rabbitmq-users
Hi Sergey,

I suspected that you were running that version of Erlang. Erlang 18 has a number of bugs that affect TCP connections. We strongly recommend upgrading to at least 19.3.
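You can confirm which Erlang a node is running from the CLI:

    rabbitmqctl status
    # look for the {erlang_version, ...} entry in the output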

Thanks,
Luke


Sergey Belesev

Feb 14, 2018, 3:07:12 AM
to rabbitmq-users
I've managed to reproduce the problem: in the end, it was a bug in the way our client used RMQ connections.
It created one auto-recovery connection (that part is fine), and sometimes it created a separate plain connection for "temporary" purposes.

Steps to reproduce my problem were as follows (a sketch of the "temp" connection follows the list):
  1. Reach the memory alarm in RabbitMQ (e.g., set an easily reached RAM limit and push a lot of big messages). Connections go into the "blocking" state.
  2. Start sending a message from our client over this new "temp" connection.
  3. Ensure the connection is in the "blocked" state.
  4. Without clearing the resource alarm, restart the RabbitMQ node.
  5. The "temp" connection was still there! Despite the fact that auto-recovery was not enabled for it. And it continued sending heartbeats, so the server didn't close it.
We will fix the client to always use one and only one connection.
And of course we will also upgrade Erlang.

Thank you, guys.

Michael Klishin

Feb 14, 2018, 12:36:02 PM
to rabbitm...@googlegroups.com
So if my understanding is correct, you are looking at something similar to

https://github.com/rabbitmq/rabbitmq-common/pull/31
https://github.com/rabbitmq/rabbitmq-java-client/issues/341

but from the client end (and possibly with a different client library).

When the opposite end of a TCP connection stops reading from the socket, which is the case with alarms,
it is much trickier than it seems to detect the connection loss. It works more or less the same way with
sockets that have a full TCP window (for unrelated reasons).
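One way for a client to notice this condition without relying on TCP semantics is the connection.blocked/unblocked notification; a minimal sketch with the Java client (the listener bodies are illustrative):

    import com.rabbitmq.client.BlockedListener;
    import com.rabbitmq.client.Connection;
    import com.rabbitmq.client.ConnectionFactory;

    public class BlockedNotificationExample {
        public static void main(String[] args) throws Exception {
            Connection conn = new ConnectionFactory().newConnection();
            conn.addBlockedListener(new BlockedListener() {
                public void handleBlocked(String reason) {
                    // the broker has stopped reading from us, e.g. due to a resource alarm
                    System.out.println("Connection blocked: " + reason);
                }
                public void handleUnblocked() {
                    System.out.println("Connection unblocked; safe to publish again");
                }
            });
        }
    }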

This has been addressed in the RabbitMQ server and recently in the Java client. It is possible to work around this;
I just wanted to point out that similar problems have been known for several years now, and besides https://github.com/rabbitmq/rabbitmq-server/issues/1474
(which is a regression in Erlang's TLS implementation), we no longer see any variations of it on the RabbitMQ end.
