| socket leak | Jan Rychter | 07/04/11 05:43 | After upgrading Ring from 0.2.5 -> 0.3.7 we noticed our application
would crash after some time with "too many open files". Investigation showed that incoming connections are the culprit: sockets originating in jetty's QueuedThreadPool are being left open. I'm sorry I can't investigate this further right now, as there are some urgent things I need to work on, but in the meantime I wanted to give a heads-up; perhaps someone will be able to guess the reason for the leak. We use run-jetty to start our server. I know 0.2.5 works great and is very stable, while 0.3.7 leaks. I have narrowed it down to ring/jetty, i.e. just changing the version in project.clj from 0.3.7 back to 0.2.5 fixes the problem. It isn't our code, unless we're doing something stupid that ring 0.3.7 exposed.
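For reference, we start the server with run-jetty along these lines; the handler and port here are placeholders, not our actual application:

    (ns myapp.server
      (:require [ring.adapter.jetty :as jetty]))

    ;; Placeholder handler standing in for our real application.
    (defn app [request]
      {:status  200
       :headers {"Content-Type" "text/plain"}
       :body    "hello"})

    ;; run-jetty blocks the calling thread and serves `app` on the given port.
    (defn -main []
      (jetty/run-jetty app {:port 8080}))
|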
| Re: socket leak | James Reeves | 07/04/11 15:24 | On 7 April 2011 08:43, Jan Rychter <jryc...@gmail.com> wrote: What state were the sockets being left in? TIME_WAIT? I tried to replicate the error, and I noticed via netstat that a lot of sockets were being left in TIME_WAIT. Is that the problem you were seeing? - James |
| Re: socket leak | Jan Rychter | 08/04/11 02:57 | On Apr 8, 12:24 am, James Reeves <jree...@weavejester.com> wrote:
> What state were the sockets being left in? TIME_WAIT?

I checked now -- they are indeed in TIME_WAIT.

> http://jira.codehaus.org/browse/JETTY-999?focusedCommentId=191250&pag...

It might be, but I am not sure. It's true I see lots of sockets in TIME_WAIT. But I got the "too many open files" error on our testing server, with a very low connection load. We're talking perhaps one connection every few seconds, and we ran out of descriptors after about two days of this. The time scales do not really correspond to TIME_WAIT states: at one connection every few seconds and a TIME_WAIT of about a minute, only a few dozen sockets should ever be in TIME_WAIT at once, nowhere near the descriptor limit.

A quick check shows that the problem seems to be under control on Mac OS X: the app, when bombarded with requests, ends up with < 4000 sockets in TIME_WAIT. However, I don't think it is just the OS settings. Let's see:

ubuntu$ cat /proc/sys/net/ipv4/tcp_fin_timeout
60

so we have a default TIME_WAIT period of 60s under Linux, and

bongo:/Main/jwr>sysctl net.inet.tcp.msl
net.inet.tcp.msl: 15000

Mac OS X has a default TCP MSL of 15s (you can't configure the TIME_WAIT period directly on Mac OS X), so the TIME_WAIT period is at least 30s (2*MSL). I do not think the difference between 30s and 60s is *that* significant, especially given the descriptor limits:

ubuntu$ ulimit -n
1024

bongo:/Main/jwr>ulimit -n
256

So, I don't know -- it might be the same problem, or it might not. Would something change between ring 0.2.5 and ring 0.3.7 for the problem to suddenly appear? I am *really* sure we did not have this with 0.2.5; we've been using it in production for many months now. In case it helps, YourKit profiler shows these sockets as not closed, i.e. there is no closing stack trace, only the opening one.
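If anyone wants to repeat the check from the REPL, a helper along these lines will count them; it just shells out to netstat (the helper is illustrative, assuming netstat is on the PATH):

    ;; Minimal sketch: count sockets currently in TIME_WAIT by parsing
    ;; netstat output.
    (require '[clojure.java.shell :as shell]
             '[clojure.string :as str])

    (defn time-wait-count []
      (->> (shell/sh "netstat" "-an")
           :out
           str/split-lines
           (filter #(re-find #"TIME_WAIT" %))
           count))

--J. |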
| Re: socket leak | James Reeves | 08/04/11 04:48 | On 8 April 2011 05:57, Jan Rychter <jryc...@gmail.com> wrote: Version 0.2.5 used an earlier version of Jetty (6.1.14, rather than the 6.1.26 that 0.3.7 pulls in). When you downgraded to 0.2.5, did you just replace the Ring libraries, or were the Jetty dependencies downgraded as well? - James |
| Re: socket leak | Jan Rychter | 09/04/11 03:40 | On Apr 8, 1:48 pm, James Reeves <jree...@weavejester.com> wrote:
I only changed one line in project.clj, then cleaned and redownloaded all dependencies, so jetty and ring-servlet were downgraded as well. [Sorry for the broken formatting in my previous post -- Google Groups has a horrific web interface.]
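Concretely, it was just the version string in :dependencies, along these lines (I'm using the umbrella ring artifact for illustration; the exact artifact name in our project.clj may differ):

    ;; Illustrative project.clj fragment -- only the version string changes.
    ;; jetty and ring-servlet are transitive dependencies, so they are
    ;; downgraded along with it.
    :dependencies [[ring "0.2.5"]]   ; was [ring "0.3.7"]

--J. |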
| Re: socket leak | Jan Rychter | 26/09/11 02:36 | I'll resurrect an old thread, since the issue still exists. I recently found some time to track this down. To recap the story so far: my ring application started to predictably crash under load with "Too many open files" after several hours when I switched from Ring 0.2.5 to 0.3.7. I upgraded ring to 0.3.11 and confirmed the problem is still there. I then performed a binary search of jetty versions from 6.1.14 to 6.1.26, i.e. I just replaced the two jetty libs, leaving the rest as-is:

6.1.26 - fails
6.1.25 - OK
6.1.23 - OK
6.1.20 - OK
6.1.14 - OK

The likely culprit is http://jira.codehaus.org/browse/JETTY-547 (Jetty should rely on socket.shutdownOutput() to close sockets). The symptoms are reproducible after 1-3 hours on my Mac OS X system and after 8-12 hours on a Linux box. Investigating with YourKit shows that sockets are NOT being closed: some remain open until file descriptors are exhausted. Interestingly enough, it isn't all sockets that remain open, just some, seemingly in batches. This is why even on my Mac system (limited to 256 fds per process) it takes hours of stress testing to discover the problem. Netstat shows that no sockets linger in TIME_WAIT. I should probably raise this with the jetty people, but I thought I'd post here for those who have long-running applications (under heavier loads) using Ring. Just a heads-up -- you might encounter this problem. In fact, I don't understand why more people don't complain about it.
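If anyone wants to try reproducing this, a dumb hammering loop along these lines should be enough; the URL and request count are placeholders:

    ;; Minimal sketch of a stress loop: issue many sequential requests and
    ;; report failures. Against a leaking Jetty, the server eventually dies
    ;; with "Too many open files".
    (defn hammer [url n]
      (dotimes [i n]
        (try
          (slurp url)   ; slurp can read directly from a URL string
          (catch Exception e
            (println "request" i "failed:" (.getMessage e))))))

    (hammer "http://localhost:8080/" 100000)

--J. |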
| Re: socket leak | Jan Rychter | 27/09/11 08:24 | On Monday, September 26, 2011 11:36:00 AM UTC+2, Jan Rychter wrote:
> I'll resurrect an old thread, since the issue still exists. [...]

So I guess the question is -- will ring move back to jetty-6.1.25, or should I fork it and build my own version? The current ring with 6.1.26 is unusable in our production environments because of the socket problem in jetty. --J. |
| Re: socket leak | Constantine Vetoshev | 27/09/11 10:49 | Have you considered using the latest Ring, excluding its Jetty dependency, and adding your own? Leiningen supports this, e.g.:

:dependencies [[org.mortbay.jetty/jetty "6.1.25"]
               [org.mortbay.jetty/jetty-util "6.1.25"]
               [ring/ring-jetty-adapter "0.3.11"
                :exclusions [org.mortbay.jetty/jetty
                             org.mortbay.jetty/jetty-util]]]

Something similar should be possible with Maven also. |
| Re: socket leak | Jan Rychter | 27/09/11 11:00 | On Tuesday, September 27, 2011 7:49:33 PM UTC+2, Constantine Vetoshev wrote:
> Have you considered using the latest Ring, excluding its Jetty dependency, and adding your own?

Nice! Thanks for this helpful advice -- I did this for jetty and jetty-util, and it is much easier than forking ring. --J. |
| Re: socket leak | James Reeves | 27/09/11 14:38 | I think I probably will move Ring back to Jetty 6.1.25, but in the meantime you can use Constantine's workaround. - James |
| Re: socket leak | Jan Rychter | 28/09/11 01:21 | Constantine's solution works just fine for me; I just finished a nightly stress test with no problems found. I filed a bug with the jetty people: [#JETTY-1438] Sockets are not getting closed (likely introduced in #JETTY-547), at http://jira.codehaus.org/browse/JETTY-1438 -- we will see if anything happens there. I am very surprised that more people don't encounter this problem. thanks, --J. |