Need to restart Jenkins once a week because of Too much open files

Cees Bos

May 17, 2011, 3:46:09 AM
to jenkins...@googlegroups.com
Hi all,

Once a week we have to restart the Jenkins master to work around a "Too many open files" error.
Having to restart is very annoying and should not be required IMHO. For us it is unacceptable to continue like this.

I logged an issue for that:
https://issues.jenkins-ci.org/browse/JENKINS-9609

How can we prevent this? Is this a configuration issue on our side? Or is this a bug in Jenkins or one of the plugins?
I reported it as a blocker, since this is not a workable situation, but so far (11 days later) I have not seen a single reaction.
Can anyone have a look and make some suggestions?

Regards,
Cees

Swindells, Thomas

May 17, 2011, 3:49:53 AM
to jenkins...@googlegroups.com

You seem to have an awful lot of socket handles there; it may be worth attaching a dump showing what and where they are connecting to. It may be that a plugin or something else isn't tidying up properly. I take it that your log files are normal?
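One way to produce such a dump, as a rough sketch rather than a definitive recipe: on Linux, `lsof -n -P -a -p PID -i` lists the network sockets of a single process, and a small filter can group them by remote address and TCP state. The function name `summarize_remote_endpoints` is made up for illustration, and the column positions assume lsof's default output layout.

```shell
#!/bin/sh
# Read `lsof -n -P -a -p PID -i` output on stdin and count TCP connections
# per remote address and connection state.
# Assumption: default lsof columns, where column 9 is NAME
# (local->remote) and column 10 is the state, e.g. (ESTABLISHED).
summarize_remote_endpoints() {
    awk '
        # Keep only rows whose NAME column has the form local->remote.
        $9 ~ /->/ {
            split($9, ends, "->")
            remote = ends[2]
            sub(/:[0-9]+$/, "", remote)   # drop the remote port
            state = ($10 == "") ? "(no state)" : $10
            count[remote " " state]++
        }
        END {
            for (key in count) print count[key], key
        }
    ' | sort -rn
}

# Typical use, where PID is the process id of the Jenkins master:
#   lsof -n -P -a -p "$PID" -i | summarize_remote_endpoints
```

A large count in CLOSE_WAIT against one host would suggest that connections to that host are not being closed on the Jenkins side.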

 

Thomas




**************************************************************************************
This message is confidential and intended only for the addressee. If you have received this message in error, please immediately notify the postm...@nds.com and delete it from your system as well as any copies. The content of e-mails as well as traffic data may be monitored by NDS for employment and security purposes. To protect the environment please do not print this e-mail unless necessary.

NDS Limited. Registered Office: One London Road, Staines, Middlesex, TW18 4EX, United Kingdom. A company registered in England and Wales. Registered no. 3080780. VAT no. GB 603 8808 40-00
**************************************************************************************

Cees Bos

May 17, 2011, 6:09:27 AM
to jenkins...@googlegroups.com
Hi Thomas,

Thanks for your reply.

The logfile on the filesystem is empty:
[root@srv-nl-crd03 ~]# ll -h /var/log/jenkins
total 6.1M
-rw-r--r-- 1 jenkins jenkins    0 May 14 04:02 jenkins.log
-rw-r--r-- 1 jenkins jenkins 1.4M Apr 20 04:03 jenkins.log-20110420.gz
-rw-r--r-- 1 jenkins jenkins 1.3M Apr 26 04:03 jenkins.log-20110426.gz
-rw-r--r-- 1 jenkins jenkins 1.5M May  3 04:05 jenkins.log-20110503.gz
-rw-r--r-- 1 jenkins jenkins 1.8M May  7 04:04 jenkins.log-20110507.gz
-rw-r--r-- 1 jenkins jenkins 308K May 14 04:02 jenkins.log-20110514.gz

It looks like the logfile is written to disk and zipped once in a while.

When I check the logfile from Jenkins via /jenkins/log/rss?level=SEVERE I see 3 errors:

<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Hudson log</title>
  <link type="text/html" href="http://buildmaster-nl/jenkins/" rel="alternate"/>
  <updated>2011-05-17T09:29:25Z</updated>
  <author><name>Jenkins Server</name></author>
  <id>urn:uuid:903deee0-7bfa-11db-9fe1-0800200c9a66</id>
  <entry>
    <title>I/O error in channel srv-nl-crd63</title>
    <link type="text/html" href="http://buildmaster-nl/jenkins/log" rel="alternate"/>
    <id>2795350</id>
    <published>2011-05-17T09:29:25Z</published>
    <updated>2011-05-17T09:29:25Z</updated>
    <content>May 17, 2011 11:29:25 AM hudson.remoting.Channel$ReaderThread run
SEVERE: I/O error in channel srv-nl-crd63
java.net.SocketException: Connection reset
	at java.net.SocketInputStream.read(SocketInputStream.java:185)
	at java.io.FilterInputStream.read(FilterInputStream.java:133)
	at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
	at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2265)
	at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2558)
	at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2568)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1314)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:368)
	at hudson.remoting.Channel$ReaderThread.run(Channel.java:992)
</content>
  </entry>
  <entry>
    <title>I/O error in channel CLI channel from /10.0.2.11</title>
    <link type="text/html" href="http://buildmaster-nl/jenkins/log" rel="alternate"/>
    <id>2760607</id>
    <published>2011-05-17T08:56:24Z</published>
    <updated>2011-05-17T08:56:24Z</updated>
    <content>May 17, 2011 10:56:24 AM hudson.remoting.Channel$ReaderThread run
SEVERE: I/O error in channel CLI channel from /10.0.2.11
java.net.SocketException: Connection reset
	at java.net.SocketInputStream.read(SocketInputStream.java:185)
	at java.io.FilterInputStream.read(FilterInputStream.java:133)
	at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
	at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2265)
	at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2558)
	at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2568)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1314)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:368)
	at hudson.remoting.Channel$ReaderThread.run(Channel.java:992)
</content>
  </entry>
  <entry>
    <title>I/O error in channel srv-nl-crd62</title>
    <link type="text/html" href="http://buildmaster-nl/jenkins/log" rel="alternate"/>
    <id>2625486</id>
    <published>2011-05-17T06:42:47Z</published>
    <updated>2011-05-17T06:42:47Z</updated>
    <content>May 17, 2011 8:42:47 AM hudson.remoting.Channel$ReaderThread run
SEVERE: I/O error in channel srv-nl-crd62
java.net.SocketException: Connection reset
	at java.net.SocketInputStream.read(SocketInputStream.java:185)
	at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
	at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2265)
	at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2558)
	at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2568)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1314)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:368)
	at hudson.remoting.Channel$ReaderThread.run(Channel.java:992)
</content>
  </entry>
</feed>

We have 56 slave nodes attached.

Regards,
Cees

dreamtime

May 18, 2011, 7:45:30 AM
to Jenkins Users
Sorry for my ignorance, but could you please explain where you get the
number of open files from?
Thank you!
We have to restart Jenkins several times a week because it gets really
slow and unresponsive, but we haven't had the time to investigate the
cause. Maybe it's related.

On 17 May, 12:09, Cees Bos <cbos...@gmail.com> wrote:
> > I logged an issue for that:
> > https://issues.jenkins-ci.org/browse/JENKINS-9609

Hilco Wijbenga

May 18, 2011, 12:34:17 PM
to jenkins...@googlegroups.com
On 18 May 2011 04:45, dreamtime <angela.j...@gmail.com> wrote:
> Sorry for my ignorance, but could you please explain where you get the
> number of open files from?

It's in the logs.

> We have to restart Jenkins several times a week because it gets really
> slow and unresponsive but we haven't had the time to investigate the
> cause. Maybe it's related.

For me it happens after reloading the configuration files (Reload
Configuration from Disk) a few times.

R. Tyler Croy

May 18, 2011, 12:41:02 PM
to jenkins...@googlegroups.com

On Wed, 18 May 2011, Hilco Wijbenga wrote:

> On 18 May 2011 04:45, dreamtime <angela.j...@gmail.com> wrote:
> > Sorry for my ignorance, but could you please explain where you get the
> > number of open files from?
>
> It's in the logs.

If you're running the master on Linux, you can also check in /proc/<processid>/fd
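A minimal sketch of that check, assuming a Linux master and that it is run as root or as the user that owns the Jenkins process (otherwise the fd directory is not readable). The function name `count_open_fds` is only illustrative.

```shell
#!/bin/sh
# Count the file descriptors a process currently holds (Linux only).
# Usage: count_open_fds PID
count_open_fds() {
    pid="$1"
    fd_dir="/proc/$pid/fd"
    if [ ! -d "$fd_dir" ]; then
        echo "no such process: $pid" >&2
        return 1
    fi
    # Each entry in /proc/<pid>/fd is one open descriptor.
    # Without permission to read the directory, ls fails and 0 is printed.
    ls "$fd_dir" | wc -l
}

# Example: descriptors held by the current shell.
count_open_fds "$$"
```

To use it on the master, pass the PID of the Jenkins process instead of `$$`.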


- R. Tyler Croy
--------------------------------------
Code: http://github.com/rtyler
Chatter: http://identi.ca/agentdero
http://twitter.com/agentdero

Les Mikesell

May 18, 2011, 1:01:11 PM
to jenkins...@googlegroups.com
On 5/18/2011 11:41 AM, R. Tyler Croy wrote:
>
> On Wed, 18 May 2011, Hilco Wijbenga wrote:
>
>> On 18 May 2011 04:45, dreamtime<angela.j...@gmail.com> wrote:
>>> Sorry for my ignorance, but could you please explain where you get the
>>> number of open files from?
>>
>> It's in the logs.
>
> If you're running the master on Linux, you can also check in /proc/<processid>/fd

Linux tends to have a low limit on open fd's per non-root user. Is
there some equivalent way to check the current number for a user id
instead of a single process?

--
Les Mikesell
lesmi...@gmail.com

R. Tyler Croy

May 18, 2011, 1:08:38 PM
to jenkins...@googlegroups.com

If you're logged in as that user, or running a script as that user, execute:
`ulimit -n`

The default for me is 1024 on this openSUSE/amd64 machine
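Note that `ulimit -n` reports the limit of the shell it is run in, which is not necessarily the limit the already running Jenkins process was started with. As a sketch, assuming a Linux kernel that exposes /proc/<pid>/limits, the limits of a running process can be read directly. `process_nofile_limits` is a hypothetical helper name.

```shell
#!/bin/sh
# Print the soft and hard open-file limits of a running process (Linux only).
# Usage: process_nofile_limits PID
process_nofile_limits() {
    limits_file="/proc/$1/limits"
    if [ ! -r "$limits_file" ]; then
        echo "cannot read $limits_file" >&2
        return 1
    fi
    # The relevant line looks like: "Max open files  1024  4096  files"
    awk '/^Max open files/ { print "soft=" $4, "hard=" $5 }' "$limits_file"
}

# Limits of the current shell, for comparison with the process view.
echo "shell soft limit: $(ulimit -S -n)"
echo "shell hard limit: $(ulimit -H -n)"
process_nofile_limits "$$"
```

Passing the Jenkins PID instead of `$$` shows the limit that actually applies to the master.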

Les Mikesell

May 18, 2011, 1:13:40 PM
to jenkins...@googlegroups.com
On 5/18/2011 12:08 PM, R. Tyler Croy wrote:
> If you're logged in as that user, or running a script as that user, execute:
> `ulimit -n`
>
> The default for me is 1024 on this openSUSE/amd64 machine

That's the current limit. How do you find the number of descriptors
currently open that count against that user's limit?

--
Les Mikesell
lesmi...@gmail.com

Mirko Friedenhagen

May 18, 2011, 3:28:07 PM
to jenkins...@googlegroups.com

The limit is per process. Find the PID of your Jenkins process (ps or
top might help) and execute:

lsof -p PID | wc -l

root@XXXX:/usr/share/jenkins# lsof -p 8744 | wc -l
715
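For comparison, a sketch that reads /proc/PID/fd directly and groups the descriptors by kind, which shows quickly whether sockets dominate. It assumes Linux and sufficient permissions, and `summarize_fd_types` is an illustrative name. Keep in mind that plain `lsof -p PID | wc -l` also counts the header line and entries such as the working directory and memory-mapped files, so it tends to overstate the number of descriptors.

```shell
#!/bin/sh
# Summarise the open descriptors of a process by kind (Linux only).
# Usage: summarize_fd_types PID
summarize_fd_types() {
    fd_dir="/proc/$1/fd"
    [ -d "$fd_dir" ] || { echo "no such process: $1" >&2; return 1; }
    for fd in "$fd_dir"/*; do
        # Each entry is a symlink such as "socket:[12345]", "pipe:[678]",
        # or a path (regular file, directory or device node).
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            socket:*)     echo socket ;;
            pipe:*)       echo pipe ;;
            anon_inode:*) echo anon_inode ;;
            /*)           echo file ;;
            *)            echo other ;;
        esac
    done | sort | uniq -c | sort -rn
}

# Example: descriptor kinds of the current shell.
summarize_fd_types "$$"
```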

For Ubuntu (or Debian) there is a variable in /etc/default/jenkins:

# OS LIMITS SETUP
# comment this out to observe /etc/security/limits.conf
# this is on by default because http://github.com/feniix/jenkins/commit/d13c08ea8f5a3fa730ba174305e6429b74853927
# reported that Ubuntu's PAM configuration doesn't include pam_limits.so, and as a result the # of file
# descriptors are forced to 1024 regardless of /etc/security/limits.conf
MAXOPENFILES=8192
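As a hedged sketch of how a start script could apply such a variable (the packaged init script may well do this differently): `ulimit -n` changes the limit for the current shell and for every process started from it afterwards, so it has to run before Jenkins is launched. `apply_max_open_files` and the commented java command line are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical start-script fragment: raise the open-file limit for this
# shell and its children if a value is configured.
# Raising the limit above the current hard limit normally requires root.
apply_max_open_files() {
    max="$1"
    [ -n "$max" ] || return 0   # nothing configured, keep the default
    if ! ulimit -n "$max" 2>/dev/null; then
        echo "could not set open-file limit to $max (hard limit: $(ulimit -H -n))" >&2
        return 1
    fi
    echo "open-file limit is now $(ulimit -n)"
}

# Example with the value from /etc/default/jenkins. The java command line
# is an illustrative placeholder, not the packaged one.
#   MAXOPENFILES=8192
#   apply_max_open_files "$MAXOPENFILES" && exec java -jar jenkins.war
```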

Regards
Mirko
--
http://illegalstateexception.blogspot.com/
https://github.com/mfriedenhagen/
https://bitbucket.org/mfriedenhagen/

Tim Pizey

May 18, 2011, 4:16:29 PM
to jenkins...@googlegroups.com

We had Hudson dying in the same fashion (too many open files) on
RedHat (RHEL4).

Our SCM was Subversion (1.4), and I think we also had a version mismatch
between the SVN server and client.
The file handles were being left open by SVN.

After upgrading to Jenkins/Ubuntu/Subversion 1.6 the problem went away.

Hope this helps
Tim


--
Tim Pizey - http://pizey.net/~timp
Centre for Genomics and Global Health - http://cggh.org

danny staple

May 18, 2011, 5:40:59 PM
to jenkins...@googlegroups.com
We have found that our master gets bogged down by the CheckURL validation done on the archive artifacts box: it spawns threads that index the workspace and stay alive, keeping files open and tying up the CPU. I used a Greasemonkey script in Firefox to turn it off for our users. This is only really a problem if you have many files in the workspace; we have a build with something in the region of 230k files when not cleaned up. Pressing the configure button in that state causes the CheckURL to fire, even if I never change the content of that box. There are file globs in it, which makes it worse.

I've popped both the greasemonkey script, plus a shell script to scrape and find threads that were left by this.


Thanks,
Danny
--
Danny Staple

Director, ODM Solutions Ltd
w: http://www.odmsolutions.co.uk
Blog: http://orionrobots.co.uk/blog1-Danny-Staple


Cees Bos

May 24, 2011, 5:38:53 AM
to jenkins...@googlegroups.com
Hi all,

I have created a heap dump. In the heap dump I see 1200+ java.net.SocksSocketImpl instances.
Before taking the heap dump I ran GC several times.

I have stored the heap dump in a GitHub repo:

Based on the heap dump it should be possible to identify where the open sockets are coming from, right?

Regards,
Cees

Cees Bos

May 26, 2011, 3:57:55 PM
to jenkins...@googlegroups.com, jenkin...@googlegroups.com
Hi all,

Is more information needed to analyze this issue?
I have to restart Jenkins yet again now to prevent the server from blocking all activity.

Regards,
Cees

David Harkness

May 27, 2011, 2:26:39 PM
to jenkins...@googlegroups.com
On Thu, May 26, 2011 at 12:57 PM, Cees Bos <cbos.ec@gmail.com> wrote:
> Is there more information needed to analyze this issue?
> I have to restart the Jenkins now again to prevent that server from blocking all activities.

Is this the reason that after a while neither I nor Jenkins can access SVN from the same machine? Any operation against SVN times out after a long delay, but iostat doesn't show anything awry AFAICT.

David
