File descriptor leak. Very hard to reproduce in development.

poiuytrez

Dec 30, 2015, 11:37:56 AM
to vert.x
Hello, 

I developed a full project in Vert.x and noticed a file descriptor leak in production: roughly 50 file descriptors leak per minute. The load from clients (mobile devices) is fairly high. I reduced my code to a simple HTTP server in a single verticle that only replies "ok" and then calls request.response().close();. When I put this reduced code in production (with real requests), file descriptors still leak.

To reproduce the issue: 
1. Run the basic Vert.x web server on a remote server (Debian 8 Linux, OpenJDK)
2. Check the pid of the Java app using
ps -aux |grep java
3. Print the number of open socket file descriptors (replace 5913 with your pid)
ls -la /proc/5913/fd | grep socket | wc -l
4. Run JMeter from an underpowered laptop (jmeter -n -t leak.jmx). I used a MacBook Air. Configure it with more than 3000 threads sending requests in an infinite loop. The issue reproduces more readily when the request body is large.
5. After some time, the number of file descriptors will increase and JMeter will hang with an out-of-memory error. When you kill JMeter, many file descriptors will remain open.
6. lsof -a -p 5913 will show many ESTABLISHED TCP connections even though no clients are active.
The issue does not appear when I run JMeter on a robust machine (many CPUs, plenty of RAM) located in the same remote environment as the server!
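
The manual check in step 3 can also be scripted so that the count is sampled over time. The following is a minimal sketch, assuming Linux with a /proc filesystem and permission to read the target process's fd directory. The class SocketFdCounter and its method names are hypothetical, introduced only for illustration; they are not part of Vert.x or of the attached project.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalTime;
import java.util.List;

// Hypothetical helper for illustration; not part of Vert.x.
class SocketFdCounter {

    // True when a /proc/<pid>/fd link target denotes a socket, e.g. "socket:[48213]".
    static boolean isSocketTarget(String linkTarget) {
        return linkTarget.startsWith("socket:");
    }

    // Counts socket entries among already-resolved link targets.
    static long countSockets(List<String> linkTargets) {
        return linkTargets.stream().filter(SocketFdCounter::isSocketTarget).count();
    }

    // Reads /proc/<pid>/fd (Linux only) and counts the entries that point at sockets.
    static long countSocketFds(long pid) throws IOException {
        Path fdDir = Paths.get("/proc", Long.toString(pid), "fd");
        long count = 0;
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(fdDir)) {
            for (Path entry : entries) {
                try {
                    if (isSocketTarget(Files.readSymbolicLink(entry).toString())) {
                        count++;
                    }
                } catch (IOException e) {
                    // The descriptor was closed between listing and reading; skip it.
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: SocketFdCounter <pid>");
            return;
        }
        long pid = Long.parseLong(args[0]);
        // Print one sample every 10 seconds; stop with Ctrl+C.
        while (true) {
            System.out.println(LocalTime.now() + " sockets=" + countSocketFds(pid));
            Thread.sleep(10_000);
        }
    }
}
```

After compiling, pass it the pid from step 2 (5913 in the example above). A count that keeps growing, or that stays high after the load generator is killed, would match the leak described here.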

I think the main issue is that Vert.x does not close a client's connection if the client dies during the request. My clients (mobile devices) may have flaky connections, or lose their connection when the user loses 3G/4G coverage. That might explain why we see the issue with this Vert.x application.

What would you recommend?

Thank you for your help,
poiuytrez

poiuytrez

Dec 30, 2015, 11:39:03 AM
to vert.x
Attached file.
leak.jmx

poiuytrez

Dec 30, 2015, 11:44:48 AM
to vert.x
We are using Vert.x 3.2.

Sorry for the multiple emails.

Nat

Dec 30, 2015, 11:59:43 AM
to vert.x
Can you provide your code as well as a heap dump?

poiuytrez

Dec 30, 2015, 12:13:55 PM
to vert.x
I have attached the launcher and the verticle to this message. Let me know if you need a full project.
HttpServerVerticle.java
Launcher.java

Nat

Dec 30, 2015, 12:21:21 PM
to vert.x
Can you try adding HttpServerOptions.setIdleTimeout(1) to see whether it helps eliminate the file descriptor leak?
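
For readers unfamiliar with what an idle timeout does, the idea can be shown at the socket level. The sketch below uses plain java.net rather than Vert.x, so it is an analogy under stated assumptions and not how Vert.x implements the option. IdleTimeoutSketch, serveOnce, demo, and the 200 ms value are all hypothetical. A connection that sends nothing within the timeout is closed by the server, which releases its file descriptor.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical illustration with plain java.net; NOT the Vert.x implementation.
class IdleTimeoutSketch {

    // Accepts one connection and closes it if the peer sends nothing for idleMillis.
    // Returns true when the connection was closed because it stayed idle.
    static boolean serveOnce(ServerSocket server, int idleMillis) throws IOException {
        try (Socket client = server.accept()) {
            client.setSoTimeout(idleMillis); // bound how long each read() may block
            InputStream in = client.getInputStream();
            try {
                while (in.read() != -1) {
                    // A real server would parse the request; here any byte counts as activity.
                }
                return false; // the peer closed the connection itself
            } catch (SocketTimeoutException idle) {
                return true; // no data in time; try-with-resources closes the socket
            }
        }
    }

    // Connects a client that never sends anything, then reports whether the server
    // gave up on it and the client subsequently observed end-of-stream.
    static boolean demo() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 50, loopback);
             Socket silentClient = new Socket(loopback, server.getLocalPort())) {
            boolean closedForIdleness = serveOnce(server, 200);
            boolean clientSawClose = silentClient.getInputStream().read() == -1;
            return closedForIdleness && clientSawClose;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("closed idle connection: " + demo());
    }
}
```

Unlike the 200 ms used in this sketch, the Vert.x 3.x option suggested above takes its value in seconds.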

poiuytrez

Dec 30, 2015, 12:32:50 PM
to vert.x
I have just tried. I still get the issue with the setIdleTimeout option. 

Jez P

Dec 30, 2015, 1:53:03 PM
to vert.x
Are you calling Vertx.vertx() more than once?

Guillaume Charhon

Dec 30, 2015, 2:14:12 PM
to ve...@googlegroups.com

No. You can check the code attached to one of my previous messages.

Nat

Dec 30, 2015, 2:19:47 PM
to vert.x
From the heap dump, it looks like the client connected to the server but did not send any request. HttpServerOptions.setIdleTimeout(1) should have forced the connection to be closed once it had been idle. Can you send the heap dump again with setIdleTimeout turned on?

Julien Viet

Dec 31, 2015, 4:04:54 AM
to ve...@googlegroups.com
The idle timeout forces the connection to be closed from the server side. However, it can take some time before the connection is really closed if the client is not responding and not closing the connection properly on its side; on the server, the connection can stay in a FIN_WAIT_* state for a while.

@poiuytrez: can you tell us which state the leaked connections are in with the idle timeout set?
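
One way to answer the question about connection states without lsof is to read the kernel's TCP table directly. This is a minimal sketch, assuming Linux. TcpStateTally is a hypothetical helper, and the server port passed on the command line is whatever the application listens on (the thread does not state it). The state codes follow the Linux numbering of the "st" column in /proc/net/tcp and /proc/net/tcp6.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of Vert.x.
class TcpStateTally {

    // Index = value of the hex "st" column in /proc/net/tcp (Linux numbering).
    private static final String[] STATES = {
        "UNKNOWN", "ESTABLISHED", "SYN_SENT", "SYN_RECV", "FIN_WAIT1", "FIN_WAIT2",
        "TIME_WAIT", "CLOSE", "CLOSE_WAIT", "LAST_ACK", "LISTEN", "CLOSING"
    };

    static String stateName(String hexCode) {
        int code = Integer.parseInt(hexCode, 16);
        return (code > 0 && code < STATES.length) ? STATES[code] : "UNKNOWN";
    }

    // Counts connections per TCP state for sockets whose local port is localPort.
    static Map<String, Integer> tally(List<String> lines, int localPort) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.trim().split("\\s+");
            // Data rows start with an index such as "12:"; the header row does not.
            if (fields.length < 4 || !fields[0].endsWith(":")) {
                continue;
            }
            String local = fields[1]; // hex address, then ':' and the hex port
            int port = Integer.parseInt(local.substring(local.lastIndexOf(':') + 1), 16);
            if (port == localPort) {
                counts.merge(stateName(fields[3]), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: TcpStateTally <server-port>");
            return;
        }
        List<String> lines = new ArrayList<>();
        // A JVM often listens on a dual-stack socket, so check the IPv6 table too.
        for (String table : new String[] {"/proc/net/tcp", "/proc/net/tcp6"}) {
            Path path = Paths.get(table);
            if (Files.isReadable(path)) {
                lines.addAll(Files.readAllLines(path));
            }
        }
        System.out.println(tally(lines, Integer.parseInt(args[0])));
    }
}
```

If most leaked entries show up as ESTABLISHED, the server never noticed the client going away; entries in FIN_WAIT1 or FIN_WAIT2 would instead suggest the server has already started closing and is waiting on the peer.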

poiuytrez

Dec 31, 2015, 12:47:36 PM
to vert.x
After some additional tests, it seems that the setIdleTimeout option works! I believe I did not run the test correctly yesterday...