|socket leak||Jan Rychter||4/7/11 5:43 AM|
After upgrading Ring from 0.2.5 -> 0.3.7 we noticed our application
would crash after some time because of "too many open files".
Investigation showed that the culprit is incoming connections handled via
Jetty's QueuedThreadPool: sockets are being left open.
I'm sorry I can't investigate this further at the moment, as there are
some urgent things I really need to work on right now. In the
meantime, I wanted to give a heads-up, and perhaps someone will be able
to guess the reason for the leak.
We use run-jetty to start our server. I know 0.2.5 works great and is
very stable, while 0.3.7 leaks. I have also narrowed it down to ring/
jetty: just changing the version in project.clj from 0.3.7 back
to 0.2.5 fixes the problem. It isn't our code, unless we're doing
something stupid that ring 0.3.7 exposed.
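(For anyone trying to reproduce this: a quick way to watch a server process's descriptor count is to poll its fd table. A sketch, Linux-only since it uses procfs, demonstrated on the current shell's pid rather than a real JVM pid:)

```shell
# Count the open file descriptors of a process by listing /proc/<pid>/fd.
# Substitute your JVM's pid for "$$"; the shell's own pid is used here so
# the snippet is runnable anywhere with procfs mounted.
count_fds() { ls "/proc/$1/fd" 2>/dev/null | wc -l; }
count_fds $$   # on a leaking server this number climbs until "too many open files"
```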
|Re: socket leak||James Reeves||4/7/11 3:24 PM|
On 7 April 2011 08:43, Jan Rychter <jryc...@gmail.com> wrote:
What state were the sockets being left in? TIME_WAIT?
I tried to replicate the error, and I noticed via netstat that a lot
of sockets were being left in the TIME_WAIT state.
Is that the problem you were seeing?
|Re: socket leak||Jan Rychter||4/8/11 2:57 AM|
On Apr 8, 12:24 am, James Reeves <jree...@weavejester.com> wrote:
> What state were the sockets being left in? TIME_WAIT?
I checked now -- they are indeed in TIME_WAIT.
> Is that the problem you were seeing?
It might be, but I am not sure. It's true I see lots of sockets in
TIME_WAIT.
But I got the "too many open files" error on our testing server, with
a very low connection load, and we ran out of descriptors after about
two days of this. The timescales involved do not really correspond to
TIME_WAIT states, which expire within minutes rather than accumulating
over days.
A quick check shows that the problem seems to be under control on Mac
OS X: the app, when bombarded with requests, ends up with < 4000
sockets in TIME_WAIT.
However, I don't think it is just the OS settings. Let's see:

ubuntu$ cat /proc/sys/net/ipv4/tcp_fin_timeout
60

So we have a default TIME_WAIT period of 60s under Linux, while
Mac OS X has a default TCP MSL of 15s (you can't configure TIME_WAIT
directly in Mac OS X), so the TIME_WAIT period is at least 30s.
I do not think the difference between 30s and 60s is *that*
significant. As for the descriptor limit:

ubuntu$ ulimit -n
So, I don't know -- it might be the same problem, or it might not. Did
something change between ring 0.2.5 and ring 0.3.7 for the problem to
suddenly appear? I am *really* sure we did not have this with 0.2.5;
we've been using that in production for many months now.
In case it helps, the YourKit profiler shows these sockets as not closed:
there is no closing stack trace, only the opening one.
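(Side note: the 30s figure comes from TIME_WAIT = 2 * MSL. A sketch of the arithmetic, assuming the macOS default MSL of 15s, which `sysctl net.inet.tcp.msl` reports in milliseconds:)

```shell
# TIME_WAIT lasts 2 * MSL. On macOS the MSL is reported in milliseconds
# by `sysctl net.inet.tcp.msl`; 15000 is assumed as the default here.
# Linux exposes its 60s figure via /proc/sys/net/ipv4/tcp_fin_timeout.
msl_ms=15000
echo "macOS TIME_WAIT = $((2 * msl_ms / 1000))s"   # → macOS TIME_WAIT = 30s
```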
|Re: socket leak||James Reeves||4/8/11 4:48 AM|
On 8 April 2011 05:57, Jan Rychter <jryc...@gmail.com> wrote:
Version 0.2.5 used an earlier version of Jetty (6.1.14, rather than
6.1.26). When you downgraded to 0.2.5, did you just replace the
Ring jars, or were the Jetty dependencies downgraded as well?
|Re: socket leak||Jan Rychter||4/9/11 3:40 AM|
On Apr 8, 1:48 pm, James Reeves <jree...@weavejester.com> wrote:
I only changed one line in project.clj, then cleaned and redownloaded
all dependencies. So jetty and ring-servlet were downgraded as well.
[sorry for the broken formatting in my previous post -- Google Groups
has a horrific web interface]
|Re: socket leak||Jan Rychter||9/26/11 2:36 AM|
I'll resurrect an old thread, since the issue still exists. I recently found some time to track this down. To recap the story so far: my ring application started to predictably crash under load with "Too many open files" after several hours when I switched from Ring 0.2.5 to 0.3.7.
I upgraded ring to 0.3.11 and confirmed the problem is there.
I then performed a binary search of Jetty versions from 6.1.14 to 6.1.26, i.e. I just replaced the two Jetty libs (jetty and jetty-util), leaving the rest as-is:
6.1.26 - fails
6.1.25 - OK
6.1.23 - OK
6.1.20 - OK
6.1.14 - OK
The likely culprit is http://jira.codehaus.org/browse/JETTY-547 (Jetty should rely on socket.shutdownOutput() to close sockets).
The symptoms are reproducible after 1-3 hours on my Mac OS X system and after 8-12 hours on a Linux box. Investigating with YourKit shows that sockets are NOT being closed: some remain open until file descriptors are exhausted. Interestingly, it isn't all sockets that remain open, just some, seemingly in batches. This is why even on my Mac system (limited to 256 fds per process) it takes hours of stress testing to discover the problem.
Netstat shows that no sockets linger in TIME_WAIT.
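(The netstat check boils down to tallying connections by state. A sketch of the pipeline, run over a canned netstat-style sample so it is self-contained; against a live system you would pipe `netstat -an` into the same awk/sort/uniq chain:)

```shell
# Tally TCP connections by state (the state is the last field of each line).
sample='tcp4  0  0  10.0.0.1.8080  10.0.0.2.49152  ESTABLISHED
tcp4  0  0  10.0.0.1.8080  10.0.0.3.49153  TIME_WAIT
tcp4  0  0  10.0.0.1.8080  10.0.0.4.49154  TIME_WAIT'
printf '%s\n' "$sample" | awk '{print $NF}' | sort | uniq -c | sort -rn
# → 2 TIME_WAIT, then 1 ESTABLISHED
```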
I should probably raise this with jetty people, but I thought I'd post here, for those who have long-running applications (under heavier loads) using Ring. Just a heads-up — you might encounter this problem. In fact, I don't understand why more people don't complain about it.
|Re: socket leak||Jan Rychter||9/27/11 8:24 AM|
On Monday, September 26, 2011 11:36:00 AM UTC+2, Jan Rychter wrote:
> I'll resurrect an old thread, since the issue still exists. [...]
So I guess the question is — will ring move back to jetty-6.1.25, or should I fork it and build my own version? The current ring with 6.1.26 is unusable in our production environments because of the socket problem in jetty.
|Re: socket leak||Constantine Vetoshev||9/27/11 10:49 AM|
Have you considered using the latest Ring, excluding its Jetty
dependency, and adding your own? Leiningen supports this, e.g.:

:dependencies [[ring/ring-jetty-adapter "0.3.11"
                :exclusions [org.mortbay.jetty/jetty
                             org.mortbay.jetty/jetty-util]]
               [org.mortbay.jetty/jetty "6.1.25"]
               [org.mortbay.jetty/jetty-util "6.1.25"]]

Something similar should be possible with Maven also.
|Re: socket leak||Jan Rychter||9/27/11 11:00 AM|
On Tuesday, September 27, 2011 7:49:33 PM UTC+2, Constantine Vetoshev wrote:
> Have you considered using the latest Ring, excluding its Jetty
> dependency, and adding your own?
Nice! Thanks for this helpful advice -- I did this for jetty and jetty-util, and it is much easier than forking ring.
|Re: socket leak||James Reeves||9/27/11 2:38 PM|
I think I probably will, but in the meantime you can use Constantine's
workaround.
|Re: socket leak||Jan Rychter||9/28/11 1:21 AM|
Constantine's solution works just fine for me. I just finished a nightly stress test, no problems found.
I filed a bug with the Jetty people: JETTY-1438 ("Sockets are not getting closed", likely introduced by JETTY-547), at http://jira.codehaus.org/browse/JETTY-1438 -- we will see if anything happens there. I am very surprised that more people don't encounter this problem.