Open file handles becoming an issue


Sean Last

Jan 7, 2015, 2:05:33 PM
to jenkins...@googlegroups.com
I run a fairly large jenkins install (300 jobs and counting), but I've begun to have serious issues with it failing every couple of weeks because there are too many open files.

I have not yet increased the open files available to the jenkins user because I want to see if there is something leaking first.

The last time this happened, I installed the "file handles" plugin.  The Jenkins application only recognizes 44 open file handles (see jenkins_file-handles.txt).

However, the jenkins user has 1700 open files, most of them being regular files (see open_files.txt).
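For reference, a quick way to get a rough per-user count like that (assuming the service runs as the 'jenkins' user) is something along these lines:

    # total descriptors held by the jenkins user (includes lsof's header line, so off by one)
    lsof -u jenkins | wc -l

    # the same total broken down by descriptor type (REG, IPv4, FIFO, unix, ...)
    lsof -u jenkins | awk 'NR > 1 {print $5}' | sort | uniq -c | sort -rn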

I'm not sure how to go about troubleshooting this from here.  Any help would be vastly appreciated.


open_files.txt
jenkins_file-handles.txt

Les Mikesell

Jan 7, 2015, 2:29:37 PM
to jenkinsci-users
I'd guess that the plugin just shows things java has open at the
application level, not the internal shared libs etc. In any case you
can tune up the system-wide and per-user limits imposed by the OS to
avoid the problem, but perhaps a better solution to scaling issues
would be to add slaves to do the build work if you haven't already.
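For example, something like this (a sketch only - the numbers are
illustrative, and pam_limits only applies to PAM sessions, so a daemon
started from an init script may need the limit set there instead):

    # per-user limit for the jenkins account:
    printf 'jenkins soft nofile 8192\njenkins hard nofile 16384\n' \
        | sudo tee /etc/security/limits.d/jenkins.conf

    # system-wide ceiling on open files:
    sysctl fs.file-max                              # show the current value
    echo 'fs.file-max = 200000' | sudo tee /etc/sysctl.d/60-file-max.conf
    sudo sysctl -p /etc/sysctl.d/60-file-max.conf   # apply without a reboot

(The Debian/Ubuntu jenkins package also has a MAXOPENFILES option in
/etc/default/jenkins for exactly this, if I remember right.)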

--
Les Mikesell
lesmi...@gmail.com

Sean Last

Jan 7, 2015, 2:55:11 PM
to jenkins...@googlegroups.com
Yeah, we've got plenty of slaves, and in fact, all the builds MUST happen on slaves, not on the master.  So something must be happening with the master application that is leaving files open.

Sean Last

Jan 7, 2015, 2:55:58 PM
to jenkins...@googlegroups.com
And I know I could just up the open files limit for the jenkins user, but I'd really like to know why this is happening so it doesn't just keep growing until it's full regardless of where I put the limit.

Les Mikesell

Jan 7, 2015, 3:06:39 PM
to jenkinsci-users
On Wed, Jan 7, 2015 at 1:55 PM, Sean Last <qkt...@gmail.com> wrote:
> And I know I could just up the open files limit for the jenkins user, but
> I'd really like to know why this is happening so it doesn't just keep
> growing until it's full regardless of where I put the limit.

Off the top of my head, a bug in the JVM you are using sounds likely.
Have you tried different versions or checked its issues? And does a
jenkins restart drop the number significantly compared to after
running a long time?

--
Les Mikesell
lesmi...@gmail.com

Sean Last

Jan 7, 2015, 3:21:51 PM
to jenkins...@googlegroups.com
Yes, restarting jenkins completely clears the open files for the jenkins user, even though the jenkins application is unaware of the open files.

James Nord

Jan 7, 2015, 5:00:55 PM
to jenkins...@googlegroups.com, Sean Last
The default max open file limit in most Linux installs (1024) is woefully inadequate for Jenkins.

Java itself will have many files open for libs and jars; Jenkins will then have more for jobs, users and slaves.

I also recall that the lib used by the fd plugin doesn't count all file descriptors, and I think I submitted a patch.  It certainly will only see descriptors opened after it is hooked up, which is after Java itself has already got many handles, which can give you a significant difference.

I would up the limit and then run a periodic check on the file handles to make sure there are no leaks over time.
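Something like this in cron would do for the periodic check (the
username and log path are just placeholders):

    # /etc/cron.d/jenkins-fd-count - log a timestamped descriptor count every hour
    0 * * * *  root  echo "$(date -Is) $(lsof -u jenkins 2>/dev/null | wc -l)" >> /var/log/jenkins-fd-count.log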
--
Sent from my Android phone with K-9 Mail. Please excuse my brevity.

Sean Last

Jan 7, 2015, 5:05:53 PM
to jenkins...@googlegroups.com, qkt...@gmail.com
Ok, I'll up the limit, but is there any metric I can use for what's reasonable versus what's worrisome in an average case/per job?

James Nord

Jan 8, 2015, 4:00:36 AM
to jenkins...@googlegroups.com
It's not really per job - connections to the HTTP(S) interface also use file descriptors, so you need to factor in how many clients you will have connecting and whether they use persistent connections, plus the slaves and type of slaves, how many plugins you have installed, which version of the JDK you have and how it's tuned (storing the compiled machine code).  The upper burst limit may also depend on what your job does and what type it is - e.g. archiving files will create some FDs whilst the file is being copied across (they should be closed afterwards).

I used 10k for several thousand jobs with a few hundred users.  I was monitoring it for a while (several years ago) and it did stabilize - I can't recall the exact number.

That's not to say there isn't a leak somewhere - but if you up the limit and track it with 'lsof' over a period of days/weeks I think you may see that it is relatively stable; if not, you should at least see a common trend of something that is constantly increasing.  (A combination of awk, sort and uniq should be able to show any upward trends - if you have an executor on the same OS as the master you could also do this through Jenkins and use the plot plugin to visualize it :-)
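e.g. something along these lines, run now and again in a few days, should show whether any particular set of file names keeps climbing ('jenkins' here is just whatever user the master runs as):

    # group the current descriptors by file name and show the 20 most common
    lsof -u jenkins -F n | awk '/^n/ {print substr($0, 2)}' | sort | uniq -c | sort -rn | head -20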

/James

Sean Last

Jan 8, 2015, 10:56:37 AM
to jenkins...@googlegroups.com
Awesome, thanks so much.  That's very good information.  I really appreciate the response!  Thanks everyone!