Hi experts! :-)
To my novice's eye it seems that the code that keeps the build queue and
the executors in sync is leaking TIME_WAIT connections on the server side.
Analysis
--------
### Setup
My Jenkins is running at port 8082 on a Windows machine. I have
installed it with the default installer and it's running as a Windows
service.
### refreshPart x 2
The Jenkins sidebar contains the following code:
    <table id="buildQueue" class="pane">...
    <script defer="defer">
      refreshPart('buildQueue',"/jenkins/ajaxBuildQueue");
    </script>
    <table id="executors" class="pane">...
    <script defer="defer">
      refreshPart('executors',"/jenkins/ajaxExecutors");
    </script>
where refreshPart is defined in hudson-behaviour.js and reloads the
buildQueue and executors panes every 5 seconds.
### Reloads
This means my Firefox will make two connections to the server every
5 seconds.
Looking at the open TCP ports on the server, I see that after I open a
job-list view in Firefox, the number of connections
{server:8082} <-> {mypc:####} in TIME_WAIT state goes up by roughly 2
every 5 seconds, settling at a permanent 52(!) connections in
TIME_WAIT state after a few minutes.
This means that every browser page that shows the Jenkins sidebar will
eventually take up about 50 ports on the server side.
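A quick sanity check that this steady state is plausible (a back-of-the-envelope sketch; the TIME_WAIT lifetime is inferred from the numbers above, not read from the registry):

    # Rough steady-state estimate for the TIME_WAIT sockets caused by the
    # sidebar polling. The refresh interval and the observed count are the
    # figures above; the TIME_WAIT lifetime is the unknown we solve for.

    POLL_INTERVAL_S = 5          # refreshPart fires every 5 seconds
    CONNS_PER_POLL = 2           # ajaxBuildQueue + ajaxExecutors
    OBSERVED_STEADY_STATE = 52   # TIME_WAIT sockets per open browser page

    conns_per_second = CONNS_PER_POLL / POLL_INTERVAL_S            # 0.4/s
    implied_time_wait_s = OBSERVED_STEADY_STATE / conns_per_second

    print(f"New server-side connections per second: {conns_per_second}")
    print(f"Implied TIME_WAIT lifetime: ~{implied_time_wait_s:.0f} s")
    # -> ~130 s, i.e. roughly the usual "a couple of minutes" (2*MSL) figure,
    #    so ~50 lingering sockets per open sidebar is about what to expect.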
#### TIME_WAIT
I monitor these TIME_WAIT connections with the TCPView tool from
Sysinternals, but I guess I could also read this via netstat.
As far as I understand, they are leftovers of the (very short)
connections made by the client; they linger on the server because it is
the server that performs the [Active
Close](http://www.serverframework.com/asynchronousevents/2011/01/time-wait-and-its-design-implications-for-protocols-and-scalable-servers.html)
after the client has been served.
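For the record, the same count can be scripted instead of eyeballed in TCPView; a minimal sketch (it assumes `netstat -an` output with the state printed as TIME_WAIT, which a localized Windows netstat may translate, and that port 8082 only carries Jenkins traffic):

    # Count TIME_WAIT sockets that involve the Jenkins port by parsing
    # "netstat -an" output. Assumes port 8082 is only used by Jenkins and
    # that the state column reads TIME_WAIT (a localized netstat may print
    # the state in the local language instead).
    import subprocess

    JENKINS_PORT = ":8082"

    def count_time_wait(port_suffix=JENKINS_PORT):
        out = subprocess.run(["netstat", "-an"],
                             capture_output=True, text=True).stdout
        return sum(1 for line in out.splitlines()
                   if "TIME_WAIT" in line and port_suffix in line)

    if __name__ == "__main__":
        print(f"TIME_WAIT sockets involving {JENKINS_PORT}: {count_time_wait()}")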
### 503 errors with Apache proxy
I now have an Apache instance running on this server, serving this
Jenkins instance at port 80 via local proxying:
{mypc} <-> {server(Apache):80} <-> {server(Jenkins):8082}
When I access Jenkins via the proxy, the "leaking" TIME_WAIT
connections are (only) between Apache and Jenkins:8082 on the
localhost.
It now appears that when the number of TIME_WAIT connections reaches
about 1000 (that would be 1000/50 == 20 open windows for a few
minutes), Apache cannot open a local port for a proxy connection
anymore and subsequently answers with 503 Service Temporarily Unavailable.
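For scale, here is the back-of-the-envelope I keep running through (a sketch only: 1025-5000 is the commonly quoted default dynamic-port range for this Windows generation, which I have not actually verified on the box, and the ~50-per-page figure is the one observed above):

    # How many proxied sidebar pages should it take to exhaust the local
    # ports Apache uses for its backend connections to Jenkins?
    # All numbers are the ones quoted in this thread; the port range is
    # the commonly cited Windows default, not a value measured on the box.

    EPHEMERAL_PORTS = 5000 - 1025 + 1   # ~3976 usable local ports
    TIME_WAIT_PER_PAGE = 50             # steady-state figure per open sidebar

    pages_to_exhaust = EPHEMERAL_PORTS / TIME_WAIT_PER_PAGE
    print(f"Local ports available:        {EPHEMERAL_PORTS}")
    print(f"Pages needed to pin them all: ~{pages_to_exhaust:.0f}")
    # -> ~80 pages, yet the 503s already show up at ~1000 TIME_WAIT sockets
    #    (about 20 pages), which is part of what puzzles me.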
### Timeout with direct connects on port:8082
When accessing server:8082/jenkins directly (from my dev machine) with
a lot of open Firefox pages, the Jenkins instance stops responding once
the number of TIME_WAIT connections reaches about 600-700.
The server only starts to respond again after a few minutes (Firefox
will wait for it, so apparently it isn't that long).
Question
--------
* Does my analysis make sense?
* What can I do about this problem?
Clearly, neither the 503 errors nor the minute-long delays are acceptable
if this can happen as soon as a bunch of devs have 2 or 3 windows open.
thanks a lot,
Martin
On 25.07.2011 14:43, Martin B. wrote:
> On 25.07.2011 10:37, Martin B. wrote:
>> On 22.07.2011 16:15, Martin B. wrote:
>>> Hi!
>>>
>>> I'm running Jenkins (on port 8082) behind an apache proxy (on 80)
>>>
>>> From time to time I get the following message (when accessing through Apache):
>>> 503 Service Temporarily Unavailable error
>>>
>>> The apache logs show:
>>> [Fri Jul 22 16:08:58 2011]
>>> [error] (OS 10048)Normalerweise darf jede Socketadresse (Protokoll,
>>> Netzwerkadresse oder Anschluss) nur jeweils einmal verwendet werden.
>>> [English: "Only one usage of each socket address (protocol/network
>>> address/port) is normally permitted."] :
>>> proxy: HTTP: attempt to connect to 127.0.0.1:8082 (localhost) failed
>>> [Fri Jul 22 16:08:58 2011]
>>> [error] ap_proxy_connect_backend disabling worker for (localhost)
>>> [Fri Jul 22 16:09:03 2011]
>>> [error] proxy: HTTP: disabled connection for (localhost)
>>> ...
>>>
>>>
>>> Does this mean I have some problem with the setup (something else on
>>> port 8082 ??) or is this something else?
>>>
>>
>> Hmmm ...
>>
>> -> http://stackoverflow.com/questions/163603/apache-sockets-not-closing
>> and
>> ->
>> https://wiki.jenkins-ci.org/display/JENKINS/Running+Jenkins+behind+Apache
>>
>> I'll try it with the proxy-nokeepalive option and see if that'll help.
>>
>
> Pfff .. it does appear proxy-nokeepalive actually worsens the situation.
>
> I'll have to dig around further it seems ...
>
> - Martin
>
It seems it is not directly related to the "open" TIME_WAIT connections
after all.
I just had this occur again (using the server via the Apache proxy), and
the number of TIME_WAIT connections on the server, which I am monitoring
at the moment, was only about 30.
Any input welcome. I'm really lost.
cheers,
Martin
I'm now trying what I found here:
http://forums.devshed.com/apache-development-15/os-10048-error-406218.html
+> After a recent upgrade from Windows Server
+> 2000 to Windows Server 2003, I'm getting
+> intermittent 502 errors ... using a reverse
+> proxy (ProxyPass and ProxyPassReverse) to
+> connect to a J2EE web container on the same machine.
+> ...
+> As it turns out, this was because we were running
+> out of local ports on the server. To fix, we
+> increased the max "local port" from
+> 5000 to 10000.
I have now upped my
[MaxUserPort](http://support.microsoft.com/kb/196271) setting on this
Win2003 box to 15000.
Although I am very sure I was *not* running out of local ports -- as far
as I could tell, we were nowhere near using up the available ~4000
(1024 - 5000) ports -- I guess it's worth a try.
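To see what the box actually has configured (before and after the change), here is a quick check of the two relevant TCP/IP registry values; a sketch using Python's winreg, where an absent value simply means Windows is using its built-in default:

    # Read MaxUserPort and TcpTimedWaitDelay from the registry (Windows only).
    # A missing value means Windows falls back to its built-in default.
    import winreg

    TCPIP_PARAMS = r"SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"

    def read_tcpip_value(name):
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, TCPIP_PARAMS) as key:
            try:
                value, _type = winreg.QueryValueEx(key, name)
                return value
            except FileNotFoundError:
                return None  # not set -> Windows default applies

    for name in ("MaxUserPort", "TcpTimedWaitDelay"):
        value = read_tcpip_value(name)
        print(f"{name}: {value if value is not None else '<not set, Windows default>'}")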
cheers,
Martin