We updated to 1.624 last Saturday. Since then, we have been running into problems where the web interface suddenly gets slow and eventually stops responding, though jobs are still processing.
I installed the JavaMelody plugin after the second incident. It shows a sudden spike in memory from ~2 GB to near the heap maximum on the master.
In the memory histogram view, de.esailors.jenkins.teststability.StabilityTestData$Result is taking 5 GB (74%) of the heap, with 216M instances. java.lang.Object[] is taking 1.2 GB, with 325K instances.
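For anyone who wants to cross-check these numbers outside JavaMelody, here is a minimal sketch in Python. It assumes the histogram was captured as text in the `jmap -histo <pid>` layout (rank, instance count, bytes, class name); the function name `top_classes` is made up for illustration. Note that the share it computes is relative to the bytes listed in the histogram, which is not necessarily the same as the heap percentage JavaMelody reports, and that `jmap` prints `java.lang.Object[]` as `[Ljava.lang.Object;`.

```python
import re

# One data row of `jmap -histo` output looks like:
#    1:     <instances>     <bytes>  <class name>
# Header, separator and "Total" lines do not match and are skipped.
ROW = re.compile(r"^\s*\d+:\s+(\d+)\s+(\d+)\s+(\S+)")


def top_classes(histo_text, limit=5):
    """Return (class_name, instances, bytes, share) for the largest classes.

    `share` is the class's fraction of all bytes listed in the histogram.
    """
    rows = []
    for line in histo_text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append((m.group(3), int(m.group(1)), int(m.group(2))))
    total = sum(size for _, _, size in rows)
    rows.sort(key=lambda r: r[2], reverse=True)
    return [(n, i, s, (s / total if total else 0.0)) for n, i, s in rows[:limit]]
```

Running it on histograms taken before and after someone clicks re-run would show whether StabilityTestData$Result is the class that grows.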
It also shows outstanding requests for
$stapler/bound/a3eefa4e-d1b7-4112-a60d-f58bd64f4bb1/rerunBuild ajax POST /
Handling POST /$stapler/bound/f3fef2df-4d0f-4207-af7d-d296c4659d28/rerunBuild from [x.x.x.x]: RequestHandlerThread[#10]
that have been running for a very long time. This correlates with someone clicking the re-run button in a build pipeline view.
There's nothing obvious in the logs around the time of each incident, but each does have something like
WARNING: Failed to load [path to jenkins home/somejob]/builds/576/junitResult.xml
java.io.FileNotFoundException: [path to jenkins home/somejob]builds/576/junitResult.xml (No such file or directory)
If I look in the filesystem, that file is there.
There are also some of these, which I thought might be due to problems in getPreviousResult() brought on by having to kill the server and restart it while jobs were in flight and the UI was hung.
Aug 20, 2015 1:48:21 PM org.eclipse.jetty.util.log.JavaUtilLog warn
WARNING: Error while serving http://[server name]/view/[view name]/job/[job name]/jacoco/graph
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[ big stack trace that ends with ]
Caused by: java.lang.NullPointerException
at hudson.model.AbstractBuild.getPreviousBuild(AbstractBuild.java:199)
at hudson.model.AbstractBuild.getPreviousBuild(AbstractBuild.java:107)
at hudson.plugins.jacoco.JacocoBuildAction.getPreviousResult(JacocoBuildAction.java:249)
at hudson.plugins.jacoco.JacocoBuildAction.getPreviousResult(JacocoBuildAction.java:240)
at hudson.plugins.jacoco.JacocoBuildAction.getPreviousResult(JacocoBuildAction.java:36)
at hudson.plugins.jacoco.model.CoverageObject$1.createDataSet(CoverageObject.java:379)
at hudson.plugins.jacoco.model.CoverageObject$GraphImpl.createGraph(CoverageObject.java:418)
at hudson.util.Graph.render(Graph.java:87)
at hudson.util.Graph.doPng(Graph.java:98)
at hudson.plugins.jacoco.model.CoverageObject.doGraph(CoverageObject.java:373)
at hudson.plugins.jacoco.JacocoProjectAction.doGraph(JacocoProjectAction.java:53)
and similarly
Aug 21, 2015 10:30:03 AM hudson.ExpressionFactory2$JexlExpression evaluate
...
Caused by: java.lang.IllegalStateException: hudson.tasks.junit.TestResultAction@1b176aa2 was attached to both [jobname] #574 and [jobname] #576
at hudson.tasks.test.AbstractTestResultAction.getPreviousResult(AbstractTestResultAction.java:229)
WARNING: Caught exception evaluating: it.failureDiffString in /view/[view name]/job/[job name]. Reason: java.lang.reflect.InvocationTargetException
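To see how many distinct build pairs are affected by the "was attached to both" error, the log can be scanned with a small script. This is only a sketch in Python, assuming the message wording shown above; the function name `double_attached` is made up for illustration.

```python
import re

# Matches e.g. "<action> was attached to both <job> #574 and <job> #576"
ATTACHED = re.compile(r"(\S+) was attached to both (.+?) #(\d+) and (.+?) #(\d+)")


def double_attached(log_text):
    """Return sorted unique (action, job_a, build_a, job_b, build_b) tuples."""
    found = set()
    for m in ATTACHED.finditer(log_text):
        action, job_a, build_a, job_b, build_b = m.groups()
        found.add((action, job_a, int(build_a), job_b, int(build_b)))
    return sorted(found)
```

If the same pairs keep recurring, that would suggest a few damaged build records rather than a general problem.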
We upgraded to 1.625 and increased the heap, but we are still hitting this.
We have 224 jobs and 50 workers.
Ideas welcome.
I can try disabling the job stability plugin, but it's very useful.