Jenkins using a lot more resources after upgrade


Ugo Bellavance

May 11, 2016, 11:43:44 AM
to Jenkins Users
Hi,

After some testing we upgraded Jenkins in production, from 1.617-1.1 to 1.656-1.1 on April 20. However, since the upgrade we're running into problems that we never saw before and it's now kind of out of control:

  • We had to raise the nofile parameter in limits.conf for the jenkins user because we were getting "Too many open files" errors. We raised it to 8192 soft, 10240 hard on April 28th. By looking into /proc/<pid>/fd/ I found out that most of these files were inet sockets. We got these errors again on May 8th and 9th. I haven't changed anything yet for that issue.
  • We had to raise the Java heap size. It was at -Xmx4096m; we raised it to -Xmx7168m, then to -Xmx10752m, and adjusted the vRAM allocation for this VM accordingly, but it doesn't solve the problem: we're still getting "java.lang.OutOfMemoryError: Java heap space" errors.  When that starts happening, we have to restart Jenkins because the web interface stops responding (which makes it difficult to troubleshoot because we can't really tell what is running at that moment).
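The descriptor check mentioned in the first item (looking at the links under /proc/<pid>/fd) can be sketched roughly as below. This is a minimal sketch, not the commands actually used in this thread: the function names `classify_fd_targets` and `read_fd_targets` are hypothetical, and it assumes a Linux host where the link targets look like `socket:[12345]`, `pipe:[678]`, `anon_inode:[eventpoll]`, or a plain file path.

```python
import os
from collections import Counter


def classify_fd_targets(targets):
    """Count descriptor link targets by kind (socket, pipe, anon_inode, file)."""
    counts = Counter()
    for target in targets:
        if target.startswith("socket:"):
            counts["socket"] += 1
        elif target.startswith("pipe:"):
            counts["pipe"] += 1
        elif target.startswith("anon_inode:"):
            counts["anon_inode"] += 1
        else:
            # Anything else is treated as a regular path (file, device, ...).
            counts["file"] += 1
    return counts


def read_fd_targets(pid):
    """Read the symlink targets under /proc/<pid>/fd (Linux only)."""
    fd_dir = "/proc/%d/fd" % pid
    targets = []
    for name in os.listdir(fd_dir):
        try:
            targets.append(os.readlink(os.path.join(fd_dir, name)))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return targets
```

Calling `classify_fd_targets(read_fd_targets(pid))` with the Jenkins PID (as the jenkins user or root) gives a rough breakdown. The sketch does not separate inet sockets from Unix-domain sockets; that would need extra data that is not read here.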
Note that we do run Selenium testing through CI, but nothing has changed drastically after the upgrade. RHEL updates were applied a few days before (Apr 6th), and they included a minor update to Firefox (38.0 => 38.7). We're using the OpenJDK JRE, which was updated to 1.7.0.99 (from 1.7.0.79) on April 6th and to 1.7.0.101 on May 2nd.

Has anyone experienced something similar?  What are my best options? Our workaround now is to restart jenkins manually when needed.

  • Would it be possible to rollback to 1.617 without breaking anything?
  • Try the LTS version (that would mean a downgrade - would it break stuff?)
  • Jump to version 2 (2.2-1.1 available)
  • Inspect the JVM's memory
  • Any other idea is welcome
Thanks in advance,

Ugo

John Mellor

May 11, 2016, 12:32:59 PM
to jenkins...@googlegroups.com

Yeah, I’m seeing that too, and I am also running Jenkins 1.656 version.  I’m only running about 400 jobs in Jenkins with 11 slaves, so it’s definitely not the busiest build environment out there.  I’ve currently bumped the heap up to 5GB, and still do not have anywhere near enough to run the backup plugin for example.  I’m thinking of doubling it to see if that helps, but your experience seems to say that this is also not going to be enough.

 

This problem is also scaring me about whether unreasonable resource utilization has been corrected in a near-future upgrade to Jenkins 2.2…

--
You received this message because you are subscribed to the Google Groups "Jenkins Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-use...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/jenkinsci-users/4e6f40ca-cbf7-41b4-874e-18df1c04f9ce%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Ugo Bellavance

May 11, 2016, 1:18:33 PM
to Jenkins Users, John....@esentire.com


On Wednesday, May 11, 2016 at 12:32:59 PM UTC-4, John Mellor wrote:

Yeah, I’m seeing that too, and I am also running Jenkins 1.656 version.  I’m only running about 400 jobs in Jenkins with 11 slaves, so it’s definitely not the busiest build environment out there.  I’ve currently bumped the heap up to 5GB, and still do not have anywhere near enough to run the backup plugin for example.  I’m thinking of doubling it to see if that helps, but your experience seems to say that this is also not going to be enough.


I could try increasing again, but it doesn't make much sense.
 

 

This problem is also scaring me about whether unreasonable resource utilization has been corrected in a near-future upgrade to Jenkins 2.2…


It's partly comforting, but also scary to see that we're not alone with this problem.  Yes, it would be interesting to know if it has been fixed in version 2, and it would also be nice to know when this problem was introduced and if we can downgrade to a known working version.  Should we open a bug in Jenkins's JIRA? Have you seen https://issues.jenkins-ci.org/browse/JENKINS-34573? Have you generated a heap dump? I'm not too familiar with that (yet).

Ugo

Stephen Connolly

May 12, 2016, 4:31:37 AM
to jenkins...@googlegroups.com, John....@esentire.com
I am suspecting JENKINS-34213 may be your issue.


Ugo Bellavance

May 12, 2016, 10:49:02 AM
to Jenkins Users, John....@esentire.com


On Thursday, May 12, 2016 at 4:31:37 AM UTC-4, Stephen Connolly wrote:
I am suspecting JENKINS-34213 may be your issue.

I have new information.  We installed the monitoring plugin and I started using VisualVM.  I can see that, while I use -Xmx10752m, it only uses a little less than 8 GB.  Is there a problem with my configuration? I also got a heap dump.

JENKINS_JAVA_OPTIONS="-Djava.awt.headless=true -Dcom.sun.management.jmxremote -Xmx10752m"

(in /etc/sysconfig/jenkins on RHEL6).
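As a side note on reading that setting: -Xmx sets an upper bound on the heap, so a usage figure below it does not by itself point to a misconfiguration. The sketch below only converts the configured flag into bytes for comparison with what a monitoring tool reports. The helper name `heap_flag_bytes` is hypothetical, and it handles only the plain `k`, `m`, and `g` suffixes.

```python
import re

# Multipliers for the size suffixes used with -Xmx / -Xms (case-insensitive).
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def heap_flag_bytes(java_options, flag="-Xmx"):
    """Return the size in bytes of the first `flag` in an options string, or None."""
    pattern = re.escape(flag) + r"(\d+)([kKmMgG]?)(?=\s|$)"
    match = re.search(pattern, java_options)
    if match is None:
        return None
    number, unit = match.groups()
    return int(number) * _UNITS[unit.lower()]


options = "-Djava.awt.headless=true -Dcom.sun.management.jmxremote -Xmx10752m"
max_heap = heap_flag_bytes(options)
print(max_heap / 1024 ** 3)  # configured ceiling in GiB
```

For the options string quoted above, the ceiling works out to 10.5 GiB.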



Ugo Bellavance

May 12, 2016, 10:59:30 AM
to Jenkins Users, John....@esentire.com

Raymond Accary

May 12, 2016, 2:01:41 PM
to Jenkins Users, John....@esentire.com
Hi,
If it helps, you might avoid the crash by installing the monitoring plugin, and triggering garbage collection once the memory is approaching the maximum allocated heap size. This is a workaround until someone is able to diagnose the root cause. I have an open issue : https://issues.jenkins-ci.org/browse/JENKINS-34573 but thought I'd run the suggestion.

Ugo Bellavance

May 13, 2016, 8:50:56 AM
to Jenkins Users, John....@esentire.com


On Thursday, May 12, 2016 at 2:01:41 PM UTC-4, Raymond Accary wrote:
Hi,
If it helps, you might avoid the crash by installing the monitoring plugin, and triggering garbage collection once the memory is approaching the maximum allocated heap size. This is a workaround until someone is able to diagnose the root cause. I have an open issue : https://issues.jenkins-ci.org/browse/JENKINS-34573 but thought I'd run the suggestion.


Has anyone monitored the JVM with VisualVM? It looks like a specific memory pool may be filling up: Old Gen.  I'm trying now with:

-Djava.awt.headless=true -Dcom.sun.management.jmxremote -Xms10752m -Xmx10752m -XX:NewRatio=10

We'll see in a few days.  In the last few days, the problem has occurred each morning at around 4:45.  I'm not sure if the same jobs run on the weekends as well.
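For reference, the effect of -XX:NewRatio=10 on a 10752 MiB heap can be estimated with a little arithmetic. NewRatio is the ratio of the old generation to the young generation, so the young generation gets 1/(NewRatio + 1) of the heap. The sketch below is only that calculation; the function name `generation_sizes_mib` is hypothetical, and the sizes a real JVM ends up with may differ depending on the collector and other flags.

```python
def generation_sizes_mib(heap_mib, new_ratio):
    """Split a heap into (young, old) sizes in MiB for a given -XX:NewRatio.

    NewRatio is the old:young ratio, so young gets 1 / (new_ratio + 1) of the heap.
    """
    young = heap_mib / (new_ratio + 1)
    old = heap_mib - young
    return young, old


young, old = generation_sizes_mib(10752, 10)
print("young: about %.0f MiB, old: about %.0f MiB" % (young, old))
```

With these settings the old generation gets roughly 9775 MiB and the young generation roughly 977 MiB. If something is leaking, that gives Old Gen more room before it fills, but it would not be expected to stop the growth.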

Ugo Bellavance

May 17, 2016, 8:12:16 AM
to Jenkins Users
From what I can see, it will always fill the memory.  Probably a leak, but I don't know how to be 100% sure about it. When I try to load a heap dump in VisualVM, it shows nothing except "Not supported for this JVM".

Stephen Connolly

May 17, 2016, 8:56:34 AM
to jenkins...@googlegroups.com
I will repeat:

I am suspecting JENKINS-34213 may be your issue.


Ugo Bellavance

May 17, 2016, 9:14:08 AM
to Jenkins Users
Thanks. I can see that this issue is fixed, but I don't know how to update my Jenkins install so that I have the fix.  I'm using 1.656.

Ugo

Stephen Connolly

May 17, 2016, 9:28:42 AM
to jenkins...@googlegroups.com
It is fixed, but in the latest version of remoting... which is not something you can upgrade without either building a custom build of Jenkins or upgrading Jenkins.

You could crack open your jenkins.war, replace the remoting.jar with the fixed version, seal it back up again, and see if that fixes your issue... probably the quickest way to confirm if my theory as to the source of your leak is correct, but you would need to know what you are doing.
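A WAR file is a zip archive, so the "crack open, replace, seal" step can be sketched as below. This is a minimal sketch under assumptions, not a tested procedure for Jenkins: the function name `replace_war_entry` is hypothetical, and the entry name of the remoting jar inside the archive (likely something under WEB-INF/lib/) is an assumption that should be checked by listing the archive first. It writes a new file instead of modifying the original, so the untouched jenkins.war stays available as a backup.

```python
import zipfile


def replace_war_entry(src_war, dst_war, entry_name, new_bytes):
    """Copy src_war to dst_war (a different path), swapping one entry's content.

    Raises KeyError if entry_name is not present in the source archive.
    """
    with zipfile.ZipFile(src_war, "r") as src:
        src.getinfo(entry_name)  # raises KeyError when the entry is missing
        with zipfile.ZipFile(dst_war, "w", zipfile.ZIP_DEFLATED) as dst:
            for item in src.infolist():
                if item.filename == entry_name:
                    data = new_bytes
                else:
                    data = src.read(item.filename)
                # Reusing the ZipInfo keeps each entry's name, timestamp,
                # and compression method.
                dst.writestr(item, data)
```

The entry names can be listed with `zipfile.ZipFile("jenkins.war").namelist()` to find the exact name of the remoting jar before calling the function.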

Stephen Connolly

May 17, 2016, 9:29:16 AM
to jenkins...@googlegroups.com
Jenkins 2.5 has the fix IIUC

Ugo Bellavance

May 17, 2016, 2:01:41 PM
to Jenkins Users
Ok, I did analyse the heap dump with Eclipse's Memory Analyzer Tool and here's what I found:

One instance of "java.util.concurrent.ScheduledThreadPoolExecutor" loaded by "<system class loader>" occupies 6,801,517,856 (89.16%) bytes. The instance is referenced by org.tmatesoft.svn.core.wc.DefaultSVNRepositoryPool @ 0x561b36858 , loaded by "hudson.ClassicPluginStrategy$AntClassLoader2 @ 0x56057c428". The memory is accumulated in one instance of "java.util.concurrent.RunnableScheduledFuture[]" loaded by "<system class loader>".

Does it match JENKINS-34213?
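To put the Memory Analyzer figures in more familiar units, the total heap captured in the dump can be back-calculated from the retained size and its percentage. The sketch below is only that arithmetic, using the two numbers quoted in the report; the function name `implied_heap_bytes` is hypothetical.

```python
GIB = 1024 ** 3

retained_bytes = 6801517856  # retained by the ScheduledThreadPoolExecutor instance
retained_share = 0.8916      # 89.16% of the heap in the dump


def implied_heap_bytes(retained, share):
    """Total heap size implied by one object's retained size and its share."""
    return retained / share


total = implied_heap_bytes(retained_bytes, retained_share)
print("retained: %.2f GiB" % (retained_bytes / GIB))
print("heap in dump: %.2f GiB" % (total / GIB))
```

That works out to roughly 6.3 GiB retained out of about 7.1 GiB in the dump, which means a single scheduled-task queue referenced by the SVN repository pool accounts for almost all of the used heap.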

Stephen Connolly

May 18, 2016, 8:56:37 AM
to jenkins...@googlegroups.com
Hard to say for certain as that could still be references maintained by the unexported objects that are pending unexport. If you can get your instance into that state and then have it sit idle for a couple of hours... if it is JENKINS-34573 then you should see the retained memory reduce as one reference is cleared every minute. Alternatively disconnect and reconnect all your slaves and see if that frees the memory up - JENKINS-34573 is only an issue for long running busy slaves

Ugo Bellavance

May 18, 2016, 9:30:02 AM
to jenkins...@googlegroups.com
Hi Stephen,

We don't use slaves. We only have one master.

Should I create an issue in JIRA for my specific issue?

(Is there a reason why you took the conversation out of the group?)

Thanks,

--
Ugo Bellavance (ug...@lubik.ca)

Stephen Connolly

May 18, 2016, 10:40:21 AM
to jenkins...@googlegroups.com
If you have no slaves and only run builds on the master then we can definitively state that JENKINS-34573 is not the root cause. If you have slaves (even if idle or unused) then we cannot.

If you have a definitive answer to the above question then you should probably create a JIRA

Ugo Bellavance

May 18, 2016, 10:45:44 AM
to Jenkins Users
We only have 1 Jenkins server so I'll create a JIRA.

Thanks a lot for your help :)!

Ugo Bellavance

May 18, 2016, 1:26:24 PM
to Jenkins Users

Ugo Bellavance

May 24, 2016, 8:23:51 AM
to Jenkins Users
Any idea how long it should take before I get at least an acknowledgement on my JIRA?

Stephen Connolly

May 24, 2016, 9:53:51 AM
to jenkins...@googlegroups.com
This is a community project. If you need commercial support with corresponding SLAs then you might consider obtaining a support contract from a commercial supplier. I know my employers (CloudBees) are in this realm and I am aware that there are others.

Otherwise, unless you can find somebody who is interested in scratching this itch and has time to scratch it, you are just going to have to wait or try and scratch it yourself.
