File descriptors not properly closing out following RabbitMQ/Erlang upgrade


kris kloos

Feb 27, 2019, 11:01:05 AM
to rabbitmq-users
I recently upgraded a cluster from RabbitMQ 3.7.9 to 3.7.11, and the zero-dependency Erlang runtime from 21.2-1 to 21.2.6-1.

Afterwards everything appeared stable; however, this morning our monitors triggered a warning that each broker in this cluster has exceeded the medium-high watermark we have set for open file descriptors. The value continues to creep up, and it doesn't appear that these handles are ever being closed.

This cluster is mostly used by development teams developing locally against an active RabbitMQ server, so it receives minimal traffic. I'm fairly sure it's something internal to the RabbitMQ process itself that isn't closing these file handles: previously we rarely saw this value go above a few hundred, and now it's sitting at 25k and counting on each broker.
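For reference, a rough sketch of how that descriptor count can be checked on a broker host; the beam.smp process name is an assumption about how the Erlang VM appears in the process list, and the grep is approximate:

  # RabbitMQ's own accounting of file descriptors and sockets
  rabbitmqctl status | grep -A 4 file_descriptors

  # Kernel-level count for the Erlang VM process (assumes it runs as beam.smp)
  RMQ_PID=$(pgrep -f beam.smp | head -n 1)
  ls /proc/"$RMQ_PID"/fd | wc -l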

Are there any reports of something like this with the RabbitMQ/Erlang updates I pushed out?
-Kris

kris kloos

Feb 27, 2019, 11:40:57 AM
to rabbitmq-users
Providing a little more info, here is the RabbitMQ config I'm using:

  {rabbit, [
    {auth_backends, [rabbit_auth_backend_ldap]},
    {proxy_protocol, true},
    {cluster_nodes, {['node@node-01','node@node-02','node@node-03'], disc}},
    {cluster_partition_handling, pause_minority},
    {ssl_listeners, [5671]},
    {ssl_options, [{cacertfile, "/etc/rabbitmq/secure/cacert.pem"},
                   {certfile, "/etc/rabbitmq/secure/cert.pem"},
                   {keyfile, "/etc/rabbitmq/secure/key.pem"},
                   {verify, verify_none},
                   {fail_if_no_peer_cert, false},
                   {secure_renegotiate, true},
                   {honor_cipher_order, true},
                   {honor_ecc_order, true}]},
    {tcp_listeners, [5672]},
    {tcp_listen_options, [binary,
                          {packet, raw},
                          {reuseaddr, true},
                          {backlog, 128},
                          {nodelay, true},
                          {exit_on_close, false},
                          {keepalive, false},
                          {linger, {true, 0}}]},
    {log_levels, [{connection, info}]},
    {heartbeat, 60}
  ]}

And these sockets appear to be the descriptors that are not getting closed out; the output below is from running lsof against the RabbitMQ process:
0_poller   2167  2523       rabbitmq *670u     sock                0,7       0t0     742897 protocol: TCPv6
0_poller   2167  2523       rabbitmq *671u     sock                0,7       0t0     742903 protocol: TCPv6
0_poller   2167  2523       rabbitmq *672u     sock                0,7       0t0     742907 protocol: TCPv6
0_poller   2167  2523       rabbitmq *673u     sock                0,7       0t0     742905 protocol: TCPv6
0_poller   2167  2523       rabbitmq *674u     sock                0,7       0t0     741912 protocol: TCPv6
0_poller   2167  2523       rabbitmq *675u     sock                0,7       0t0     741916 protocol: TCPv6

There are literally thousands of these in the lsof output.
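A quick sketch of how the growth can be quantified; the beam.smp process name and the "protocol: TCP" grep pattern taken from the lsof output above are assumptions:

  # Count the anonymous socket entries held by the Erlang VM
  RMQ_PID=$(pgrep -f beam.smp | head -n 1)
  lsof -p "$RMQ_PID" 2>/dev/null | grep -c 'protocol: TCP'

  # Compare with sockets the kernel still associates with that process
  ss -tanp 2>/dev/null | grep -c "pid=$RMQ_PID"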

Michael Klishin

Feb 27, 2019, 11:55:33 AM
to rabbitm...@googlegroups.com
We are not aware of any issues with descriptor release. Except for [2], every reported
file handle leak I recall in the last several years ended up being an application or intermediary (proxy, LB) issue of some kind.

The only known problem was introduced in 3.7.10 or so, and it was a matter of socket accounting [1].

See some recommendations and advice in [3][4].



--
MK

Staff Software Engineer, Pivotal/RabbitMQ

Michael Klishin

Feb 27, 2019, 11:58:42 AM
to rabbitm...@googlegroups.com
Do you have evidence that those sockets are supposed to be closed? The fact that RabbitMQ has a lot of sockets open isn't proof
of an issue in RabbitMQ: your applications can be opening and not closing connections, and the kernel doesn't immediately release
closed TCP connections [2].

netstat shows TCP connection states. [1] has a lot of relevant information.
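As a sketch of that kind of check, assuming a Linux host and the standard ss/netstat column layout:

  # Summarise TCP connection states on the broker host
  ss -tan | awk 'NR > 1 {print $1}' | sort | uniq -c | sort -rn

  # The same with netstat, restricted to the AMQP/AMQPS listener ports from the config above
  netstat -ant | awk '$4 ~ /:(5671|5672)$/ {print $6}' | sort | uniq -c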


kris kloos

Feb 27, 2019, 12:36:50 PM
to rabbitmq-users
Michael,

I have confirmed that the issue is related to how we have the TCP settings configured in RabbitMQ and to the proxy protocol implementation on the front end. I'll respond back to this post once I get the issue resolved.

Thanks for the help on this!

-Kris

Michael Klishin

Feb 27, 2019, 1:46:47 PM
to rabbitm...@googlegroups.com
I failed to connect the dots, but yes, there may be a socket leak when the proxy protocol preamble has not yet been handled by the time the socket is closed.
This is a rare condition, which is why we haven't seen this since 3.7.11. There is a PR with a fix undergoing review right now [1].
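As an illustration of the condition being described (a sketch, not a confirmed reproduction): with {proxy_protocol, true}, a client or load-balancer health check that opens a TCP connection to the listener and closes it without ever sending the PROXY preamble. The host name, port, and use of nc below are assumptions:

  # Open connections to the AMQP listener, send nothing, and close immediately;
  # the listener is still waiting for the PROXY preamble when the socket goes away
  for i in $(seq 1 100); do
    nc -w 1 rabbit-host 5672 < /dev/null
  done

  # Then watch whether the used-descriptor count keeps climbing
  rabbitmqctl status | grep -A 4 file_descriptors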
