[erlang-questions] ssl_esock leaking file descriptors


Justin Milam

Aug 31, 2011, 10:13:30 AM
to erlang-q...@erlang.org
I've started to notice a slow leak of file descriptors in the ssl_esock port. I'm running Erlang R14B and using SSL to encrypt traffic over the Erlang distribution protocol. The cluster has at least 10 nodes, with transient nodes joining and leaving regularly. Checking the ssl_esock process with lsof shows the number of open file descriptors increasing each time a node joins the cluster and then leaves. Eventually ssl_esock holds enough open file descriptors to hit the ulimit (currently 8192), at which point it goes into an infinite loop using nearly 100% of one CPU.
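
The lsof check described above can be approximated with a short script. This is a minimal sketch, assuming a Linux host where /proc/<pid>/fd lists a process's open descriptors; the function names, the sampling interval, and the proc_root parameter are illustrative and not part of ssl_esock or OTP.

```python
import os
import time


def count_open_fds(pid, proc_root="/proc"):
    """Return the number of descriptors listed under <proc_root>/<pid>/fd."""
    return len(os.listdir(os.path.join(proc_root, str(pid), "fd")))


def watch(pid, interval=5.0, samples=3):
    """Sample the descriptor count a few times and return the readings."""
    readings = []
    for _ in range(samples):
        readings.append(count_open_fds(pid))
        time.sleep(interval)
    return readings
```

Calling watch() with the ssl_esock pid while nodes join and leave should show whether the count keeps rising instead of returning to its baseline.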

I've been able to reproduce the issue by lowering the ulimit and continually connecting/disconnecting a remote shell to a local running node until the ulimit is reached. When ssl_esock is running in debug mode I see the following being logged continually:

==========LOOP=============
MASKS SET FOR FD: 27 (read) 26 (read) 25 (read) 24 (read) 19 (read) 18 (read) 17 (read) 16 (read) 12 (read) 11 (read) 10 (read) 9 (read) 8 (read) 7 (read) 6 (read) 
CONNECTIONS:
 - DEFUNCT [0x8772978] (fd = 29)
 - DEFUNCT [0x86f9950] (fd = 28)
 - JOINED [0x875ae30] (origin = accept)
       (fd = 26, eof = 0, wq = 0, bp = 0)
       (proxyfd = 27, eof = 0, wq = 0, bp = 0)
 - JOINED [0x86fa970] (origin = accept)
       (fd = 24, eof = 0, wq = 0, bp = 0)
       (proxyfd = 25, eof = 0, wq = 0, bp = 0)
 - DEFUNCT [0x8733600] (fd = 21)
 - DEFUNCT [0x8732c38] (fd = 20)
 - JOINED [0x8733958] (origin = accept)
       (fd = 18, eof = 0, wq = 0, bp = 0)
       (proxyfd = 19, eof = 0, wq = 0, bp = 0)
 - JOINED [0x8734f78] (origin = accept)
       (fd = 16, eof = 0, wq = 0, bp = 0)
       (proxyfd = 17, eof = 0, wq = 0, bp = 0)
 - CONNECTED [0x87134a8] (fd = 15)
 - DEFUNCT [0x871f220] (fd = 13)
 - JOINED [0x87147d0] (origin = accept)
       (fd = 11, eof = 0, wq = 0, bp = 0)
       (proxyfd = 12, eof = 0, wq = 0, bp = 0)
 - JOINED [0x87083d0] (origin = connect)
       (fd = 9, eof = 0, wq = 0, bp = 0)
       (proxyfd = 10, eof = 0, wq = 0, bp = 0)
 - JOINED [0x86f29e8] (origin = connect)
       (fd = 7, eof = 0, wq = 0, bp = 0)
       (proxyfd = 8, eof = 0, wq = 0, bp = 0)
 - ACTIVE_LISTENING [0x86f2258] (fd = 6, acceptors = 1)
Before poll/select: 15 descriptors (total 29)
Error calling accept()
accept error (proxy_listensock): emfile
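
To compare dumps like the one above over time, the connection states can be tallied with a small parser. This is a rough sketch that assumes the debug output keeps the " - STATE [0xADDR]" line shape shown here; the regular expression and the tally_states name are hypothetical helpers, not anything shipped with ssl_esock.

```python
import re
from collections import Counter

# Matches state lines such as " - DEFUNCT [0x8772978] (fd = 29)".
STATE_LINE = re.compile(r"^\s*-\s+([A-Z_]+)\s+\[0x[0-9a-fA-F]+\]")


def tally_states(dump):
    """Count connection entries per state in one LOOP dump."""
    counts = Counter()
    for line in dump.splitlines():
        match = STATE_LINE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Applied to the dump above, this counts five DEFUNCT and seven JOINED entries. A DEFUNCT count that grows from one dump to the next would be consistent with descriptors not being released after a node disconnects.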

Has anyone else experienced such behavior?

Thanks

-justin

Gordon Guthrie

Aug 31, 2011, 10:35:50 AM
to Justin Milam, erlang-q...@erlang.org
We get an intermittent ssl_esock problem which I have never successfully reproduced. It goes to 100% CPU and the process needs to be killed manually.

Richard Andrews also reported a problem with it going to 100% CPU in 2009:

He has a patch for that.

It is on my 'long list' of things to fix, but more frequent/reproducible ones always get in the way.

Gordon

_______________________________________________
erlang-questions mailing list
erlang-q...@erlang.org
http://erlang.org/mailman/listinfo/erlang-questions




--
Gordon Guthrie
CEO hypernumbers

http://hypernumbers.com
t: hypernumbers
+44 7776 251669

pmac...@comcast.net

Aug 31, 2011, 11:08:56 AM
to erlang-q...@erlang.org

Hi,

I have decided to take a hiatus from this list and have requested to be taken off.  I did so from the web site but it didn't seem to take.  Please remove me from this list. 

Thanks,

Peter MacGown

Ingela Andin

Sep 2, 2011, 3:09:42 PM
to erlang-q...@erlang.org
Hi!

In R15 you will be able to run the Erlang distribution over the new
ssl implementation. The plan is also to drop the old ssl
implementation in R15.

Regards Ingela Erlang/OTP-team, Ericsson AB



Ulf Wiger

Sep 3, 2011, 7:03:42 AM
to Ingela Andin, erlang-q...@erlang.org

On 2 Sep 2011, at 21:09, Ingela Andin wrote:

> Hi!
>
> In R15 you will be able to run the Erlang distribution over the new
> ssl implementation.  The plan is also to drop the old ssl
> implementation in R15.

Hmm, interesting…

Does this also mean that it will be easier to define other carriers (erlang-implemented) for the Erlang distribution?

BR,
Ulf W

Ulf Wiger, CTO, Erlang Solutions, Ltd.



Ingela Andin

Sep 5, 2011, 3:38:51 AM
to Ulf Wiger, erlang-q...@erlang.org
Hi!

Well, the main problem with Erlang-implemented carriers is that when
the distribution is started the system is at such an early
stage that you cannot start Erlang applications. The way we solved
this for ssl is that we have a special supervisor that we hook
into the kernel application, which creates a clone of the ssl
application. The drawback is that you cannot soft upgrade
the ssl application if you use ssl as the distribution carrier
(of course, if you do not use ssl for the distribution, soft upgrade
will be possible in R15).

Regards Ingela Erlang/OTP team - Ericsson AB


