File descriptors not properly closing out following RabbitMQ/Erlang upgrade


kris kloos

Feb 27, 2019, 11:01:05 AM
to rabbitmq-users
I recently upgraded a cluster from RabbitMQ 3.7.9 to 3.7.11, and the zero-dependency Erlang runtime from 21.2-1 to 21.2.6-1.

Afterwards everything appeared stable; however, this morning our monitors triggered a warning that each broker in this cluster has exceeded the medium-high watermark we have set for open file descriptors. The value continues to creep up, and it doesn't appear that these handles are ever being closed.

This cluster is mostly used by development teams developing locally against an active RabbitMQ server, so it receives minimal traffic. I'm fairly sure it's something internal to the RabbitMQ process itself that isn't closing these file handles: previously we rarely saw this value go above a few hundred, and now it's sitting at 25k and counting on each broker.
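For reference, a rough sketch of how that descriptor count can be checked on a broker host; the beam.smp process name is an assumption about how the Erlang VM appears in the process list, and the grep is approximate:

  # RabbitMQ's own accounting of file descriptors and sockets
  rabbitmqctl status | grep -A 4 file_descriptors

  # Kernel-level count for the Erlang VM process (assumes it runs as beam.smp)
  RMQ_PID=$(pgrep -f beam.smp | head -n 1)
  ls /proc/"$RMQ_PID"/fd | wc -l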

Are there any reports of something like this with the RabbitMQ/Erlang updates I pushed out?
-Kris

kris kloos

Feb 27, 2019, 11:40:57 AM
to rabbitmq-users
Providing a little more info, here is the RabbitMQ config I'm using:

  {rabbit, [
    {auth_backends, [rabbit_auth_backend_ldap]},
    {proxy_protocol, true},
    {cluster_nodes, {['node@node-01','node@node-02','node@node-03'], disc}},
    {cluster_partition_handling, pause_minority},
    {ssl_listeners, [5671]},
    {ssl_options, [{cacertfile, "/etc/rabbitmq/secure/cacert.pem"},
                   {certfile, "/etc/rabbitmq/secure/cert.pem"},
                   {keyfile, "/etc/rabbitmq/secure/key.pem"},
                   {verify, verify_none},
                   {fail_if_no_peer_cert, false},
                   {secure_renegotiate, true},
                   {honor_cipher_order, true},
                   {honor_ecc_order, true}]},
    {tcp_listeners, [5672]},
    {tcp_listen_options, [binary,
                          {packet, raw},
                          {reuseaddr, true},
                          {backlog, 128},
                          {nodelay, true},
                          {exit_on_close, false},
                          {keepalive, false},
                          {linger, {true, 0}}]},
    {log_levels, [{connection, info}]},
    {heartbeat, 60}
  ]}

And these sockets appear to be the descriptors that are not getting closed out; the output below is from running lsof against the RabbitMQ process:
0_poller   2167  2523       rabbitmq *670u     sock                0,7       0t0     742897 protocol: TCPv6
0_poller   2167  2523       rabbitmq *671u     sock                0,7       0t0     742903 protocol: TCPv6
0_poller   2167  2523       rabbitmq *672u     sock                0,7       0t0     742907 protocol: TCPv6
0_poller   2167  2523       rabbitmq *673u     sock                0,7       0t0     742905 protocol: TCPv6
0_poller   2167  2523       rabbitmq *674u     sock                0,7       0t0     741912 protocol: TCPv6
0_poller   2167  2523       rabbitmq *675u     sock                0,7       0t0     741916 protocol: TCPv6

There are literally thousands of these in the lsof output.
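A quick sketch of how the growth can be quantified; the beam.smp process name and the "protocol: TCP" grep pattern taken from the lsof output above are assumptions:

  # Count the anonymous socket entries held by the Erlang VM
  RMQ_PID=$(pgrep -f beam.smp | head -n 1)
  lsof -p "$RMQ_PID" 2>/dev/null | grep -c 'protocol: TCP'

  # Compare with sockets the kernel still associates with that process
  ss -tanp 2>/dev/null | grep -c "pid=$RMQ_PID"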

Michael Klishin

Feb 27, 2019, 11:55:33 AM
to rabbitm...@googlegroups.com
We are not aware of any issues with descriptor release. Except for [2], every reported
file handle leak I recall in the last several years ended up being an application or intermediary (proxy, LB) issue of some kind.

The only known problem was introduced in 3.7.10 or so, and it was a matter of socket accounting [1].

See some recommendations and advice in [3][4].



--
MK

Staff Software Engineer, Pivotal/RabbitMQ

Michael Klishin

Feb 27, 2019, 11:58:42 AM
to rabbitm...@googlegroups.com
Do you have evidence that those sockets are supposed to be closed? The fact that RabbitMQ has a lot of sockets open isn't proof
of an issue in RabbitMQ: your applications can be opening and not closing connections, and the kernel doesn't immediately release
closed TCP connections [2].

netstat shows TCP connection states. [1] has a lot of relevant information.
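As a sketch of that kind of check, assuming a Linux host and the standard ss/netstat column layout:

  # Summarise TCP connection states on the broker host
  ss -tan | awk 'NR > 1 {print $1}' | sort | uniq -c | sort -rn

  # The same with netstat, restricted to the AMQP/AMQPS listener ports from the config above
  netstat -ant | awk '$4 ~ /:(5671|5672)$/ {print $6}' | sort | uniq -c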


kris kloos

Feb 27, 2019, 12:36:50 PM
to rabbitmq-users
Michael,

I have confirmed that the issue is related to how we have the TCP settings configured in RabbitMQ and to the proxy protocol implementation on the front end. I'll respond back to this post once I get the issue resolved.

Thanks for the help on this!

-Kris

Michael Klishin

Feb 27, 2019, 1:46:47 PM
to rabbitm...@googlegroups.com
I failed to connect the dots, but yes, there may be a socket leak when the proxy protocol preamble has not yet been handled by the time the socket is closed.
This is a rare condition, which is why we haven't seen this since 3.7.11. There is a PR with a fix undergoing review right now [1].
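As an illustration of the condition being described (a sketch, not a confirmed reproduction): with {proxy_protocol, true}, a client or load-balancer health check that opens a TCP connection to the listener and closes it without ever sending the PROXY preamble. The host name, port, and use of nc below are assumptions:

  # Open connections to the AMQP listener, send nothing, and close immediately;
  # the listener is still waiting for the PROXY preamble when the socket goes away
  for i in $(seq 1 100); do
    nc -w 1 rabbit-host 5672 < /dev/null
  done

  # Then watch whether the used-descriptor count keeps climbing
  rabbitmqctl status | grep -A 4 file_descriptors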
