[JIRA] (JENKINS-41945) Eventual NPE while running many concurrent declarative pipelines

kshultz@cloudbees.com (JIRA)

unread,

Feb 10, 2017, 5:12:04 PM2/10/17

to jenkinsc...@googlegroups.com

Karl Shultz created an issue

Jenkins /

JENKINS-41945

Eventual NPE while running many concurrent declarative pipelines

Issue Type:	Bug
Assignee:	Andrew Bayer
Attachments:	jenkins.log
Components:	pipeline-model-definition-plugin
Created:	2017/Feb/10 10:11 PM
Environment:	openjdk version "1.8.0_45-internal" Ubuntu 15.04
Priority:	Major
Reporter:	Karl Shultz

After an hour or so of running 101 concurrent "hello world" pipeline scripts, the Jenkins server becomes unresponsive. The first signs of trouble in the log around here - I'll attach the entire log as well. The attached one probably has a bunch of noise in it from some earlier work.

The pipeline script itself looks exactly like this:

 
                                                                pipeline {
    
    agent any
    
    stages {
            stage("TestingTesting") {
            steps {
                sh 'echo Hello world'
            }
        }
    }
}
 
                                                            

 
                                                                INFO: DeclarativePipelines/inlineThing34 #32 completed: SUCCESS
Feb 10, 2017 8:27:44 PM org.jenkinsci.plugins.workflow.cps.CpsFlowExecution cleanUpHeap
WARNING: failed to clean up memory from Owner[DeclarativePipelines/inlineThing37/32:DeclarativePipelines/inlineThing37 #32]
java.lang.NullPointerException
	at sun.reflect.UnsafeFieldAccessorImpl.ensureObj(UnsafeFieldAccessorImpl.java:57)
	at sun.reflect.UnsafeQualifiedObjectFieldAccessorImpl.get(UnsafeQualifiedObjectFieldAccessorImpl.java:38)
	at java.lang.reflect.Field.get(Field.java:393)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.cleanUpGlobalClassSet(CpsFlowExecution.java:1034)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.cleanUpLoader(CpsFlowExecution.java:966)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.cleanUpLoader(CpsFlowExecution.java:968)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.cleanUpLoader(CpsFlowExecution.java:968)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.cleanUpLoader(CpsFlowExecution.java:968)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.cleanUpLoader(CpsFlowExecution.java:957)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.cleanUpLoader(CpsFlowExecution.java:957)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.cleanUpHeap(CpsFlowExecution.java:941)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:363)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$100(CpsThreadGroup.java:80)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:240)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:228)
	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:63)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

 
                                                            

Add Comment

This message was sent by Atlassian JIRA (v7.1.7#71011-sha1:2526d7c)

andrew.bayer@gmail.com (JIRA)

unread,

Feb 10, 2017, 5:14:01 PM2/10/17

to jenkinsc...@googlegroups.com

Andrew Bayer assigned an issue to Jesse Glick

Jenkins /

JENKINS-41945

Eventual NPE while running many concurrent declarative pipelines

Change By:	Andrew Bayer
Assignee:	Andrew Bayer Jesse Glick

Add Comment

andrew.bayer@gmail.com (JIRA)

unread,

Feb 10, 2017, 5:14:01 PM2/10/17

to jenkinsc...@googlegroups.com

Andrew Bayer updated an issue

Jenkins /

JENKINS-41945

Eventual NPE while running many concurrent declarative pipelines

Change By:	Andrew Bayer
Component/s:	workflow-cps-plugin

Add Comment

andrew.bayer@gmail.com (JIRA)

unread,

Feb 10, 2017, 5:16:03 PM2/10/17

to jenkinsc...@googlegroups.com

Andrew Bayer commented on

JENKINS-41945

Re: Eventual NPE while running many concurrent declarative pipelines

FYI - this line is the problem line in workflow-cps 2.26, which is what's installed. The NPE seems to be that classInfo is null.

Add Comment

andrew.bayer@gmail.com (JIRA)

unread,

Feb 10, 2017, 5:17:01 PM2/10/17

to jenkinsc...@googlegroups.com

Andrew Bayer updated an issue

Jenkins /

JENKINS-41945

Eventual NPE while running many concurrent declarative pipelines

Change By:	Andrew Bayer
Environment:

openjdk version "1.8.0_45-internal"
Ubuntu 15.04

workflow-cps 2.26
pipeline-model-definition 1.0.1
core 2.32.2

Add Comment

kshultz@cloudbees.com (JIRA)

unread,

Feb 10, 2017, 10:58:01 PM2/10/17

to jenkinsc...@googlegroups.com

Karl Shultz commented on

JENKINS-41945

Re: Eventual NPE while running many concurrent declarative pipelines

I ran the following standard pipeline script in the same way (100 concurrently, every minute):

 
                                                                node { 
 stage("TestingTesting") {
   sh 'echo Hello world'
  }
}
 
                                                            

These standard pipelines feel like they run considerably faster than the declarative ones from before. So it could be that I'll need to run way more than 100 of them to reproduce the issue with standard pipeline.

This time, I've got something running to tell me about system utilization:

 
                                                                vmstat 15 | awk '{system("X=$(date +\"%x %T\"); echo $X\\\c");print $0}' > vmstat.15.standard & 
                                                            

(Thanks Google!) I'll do likewise the next time I go to recreate this with declarative pipelines.

Add Comment

kshultz@cloudbees.com (JIRA)

unread,

Feb 10, 2017, 11:17:02 PM2/10/17

to jenkinsc...@googlegroups.com

Karl Shultz updated an issue

Jenkins /

JENKINS-41945

Eventual NPE while running many concurrent declarative pipelines

Change By:	Karl Shultz
Attachment:	jenkins--failure.log

Add Comment

kshultz@cloudbees.com (JIRA)

unread,

Feb 10, 2017, 11:17:03 PM2/10/17

to jenkinsc...@googlegroups.com

Karl Shultz commented on

JENKINS-41945

Re: Eventual NPE while running many concurrent declarative pipelines

jenkins--failure.log I let the standard pipelines run 20 minutes or so. There wasn't any sign of degraded performance to speak of. Here's the beginning and end of the vmstat output:

 
                                                                procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
02/11/2017 01:47:41c
 2  0      0 12885760 189508 831916    0    0    15   268  273  144 75  0 24  1  0

(snip)

02/11/2017 02:06:41c
 0  0      0 12805028 191464 832472    0    0     0   432  175  204  0  0 99  1  0

That amount of decrease in free memory is pretty small. The UI remains responsive throughout. To contrast, here's a similar timespan, with the declarative jobs running. Bu the time I stopped vmstat recording, the UI had stopped responding.

 
                                                                procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
02/11/2017 02:25:33c
 1  0      0 13255960 195416 915500    0    0    13   251  251  144 65  0 33  1  0

(snip)

02/11/2017 02:45:33c
 4  0      0 11748496 199188 981928    0    0     0  1471 1290  735 93  0  7  0  0

The declarative jobs are consuming more resource, but I think that's expected.

The log, which I'll attach as jenkins--failure.log, is interesting. The jobs run for a while. Then I start to see these:

 
                                                                Feb 11, 2017 2:27:34 AM org.jenkinsci.plugins.workflow.job.WorkflowRun finish
INFO: DeclarativePipelines/inlineThing72 #80 completed: SUCCESS
Feb 11, 2017 2:27:35 AM org.jenkinsci.plugins.workflow.job.WorkflowRun finish
INFO: DeclarativePipelines/inlineThing70 #80 completed: SUCCESS
Feb 11, 2017 2:27:39 AM org.jenkinsci.plugins.workflow.cps.CpsFlowExecution cleanUpHeap
WARNING: failed to clean up memory from Owner[DeclarativePipelines/inlineThing63/81:DeclarativePipelines/inlineThing63 #81]
java.lang.NullPointerException
	at sun.reflect.UnsafeFieldAccessorImpl.ensureObj(UnsafeFieldAccessorImpl.java:57)
	at sun.reflect.UnsafeQualifiedObjectFieldAccessorImpl.get(UnsafeQualifiedObjectFieldAccessorImpl.java:38)
	at java.lang.reflect.Field.get(Field.java:393)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.cleanUpGlobalClassSet(CpsFlowExecution.java:1034)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.cleanUpLoader(CpsFlowExecution.java:966)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.cleanUpLoader(CpsFlowExecution.java:968)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.cleanUpLoader(CpsFlowExecution.java:968)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.cleanUpLoader(CpsFlowExecution.java:957)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.cleanUpLoader(CpsFlowExecution.java:957)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.cleanUpHeap(CpsFlowExecution.java:941)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:363)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$100(CpsThreadGroup.java:80)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:240)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:228)
	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:63)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)


 
                                                            

Things start to look different shortly after this. There are more NPEs, then a bunch of successful builds. Around line 49101 in jenkins--failure.log, which I'll attach, I see what looks to me like a new error:

 
                                                                Feb 11, 2017 2:28:23 AM org.jenkinsci.plugins.workflow.cps.CpsThreadGroup saveProgramIfPossible
WARNING: program state save failed
java.io.IOException: Failed to persist /var/lib/jenkins/jobs/DeclarativePipelines/jobs/inlineThing96/builds/85/program.dat
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.saveProgram(CpsThreadGroup.java:461)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.saveProgram(CpsThreadGroup.java:427)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.saveProgramIfPossible(CpsThreadGroup.java:415)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:360)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$100(CpsThreadGroup.java:80)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:240)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:228)
	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:63)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Failed to serialize org.jenkinsci.plugins.pipeline.modeldefinition.agent.DeclarativeAgent#context for class org.jenkinsci.plugins.pipeline.modeldefinition.agent.impl.Any
	at hudson.util.RobustReflectionConverter$2.writeField(RobustReflectionConverter.java:256)
	at hudson.util.RobustReflectionConverter$2.visit(RobustReflectionConverter.java:224)
 
                                                            

Rather than speculate any further on my own I've attached the log. I put a "comment" in at line 49102, just to help me find things.

Add Comment

kshultz@cloudbees.com (JIRA)

unread,

Feb 10, 2017, 11:40:01 PM2/10/17

to jenkinsc...@googlegroups.com

Karl Shultz commented on

JENKINS-41945

Re: Eventual NPE while running many concurrent declarative pipelines

I'll let the standard ones run overnight, and see how things look tomorrow.

Add Comment

kshultz@cloudbees.com (JIRA)

unread,

Feb 11, 2017, 8:17:02 AM2/11/17

to jenkinsc...@googlegroups.com

Karl Shultz updated an issue

Jenkins /

JENKINS-41945

Eventual NPE while running many concurrent declarative pipelines

Change By:	Karl Shultz
Attachment:	screenshot-1.png

Add Comment

kshultz@cloudbees.com (JIRA)

unread,

Feb 11, 2017, 8:21:01 AM2/11/17

to jenkinsc...@googlegroups.com

Karl Shultz commented on

JENKINS-41945

Re: Eventual NPE while running many concurrent declarative pipelines

The standard pipelines ran overnight, and are still running this morning.

Add Comment

kshultz@cloudbees.com (JIRA)

unread,

Feb 12, 2017, 9:30:01 PM2/12/17

to jenkinsc...@googlegroups.com

Karl Shultz updated an issue

Jenkins /

JENKINS-41945

Eventual NPE while running many concurrent declarative pipelines

Change By:	Karl Shultz
Attachment:	logs.tar.gz
Attachment:	hs_err_pid27051.log

Add Comment

kshultz@cloudbees.com (JIRA)

unread,

Feb 12, 2017, 9:31:01 PM2/12/17

to jenkinsc...@googlegroups.com

Karl Shultz commented on

JENKINS-41945

Re: Eventual NPE while running many concurrent declarative pipelines

Ran a test with 60 declarative pipelines running every minute, instead of 100. Still hit an out of memory condition. I'll attach a tarball of the logs. Inside that, jenkins.log.1 has a lot more historical data in it. jenkins.log is sort of its last gasp. Jenkins also generated an error report file. I'll attach that as well. logs.tar.gz hs_err_pid27051.log

Add Comment

jglick@cloudbees.com (JIRA)

unread,

Feb 13, 2017, 10:49:02 AM2/13/17

to jenkinsc...@googlegroups.com

Jesse Glick commented on

JENKINS-41945

Re: Eventual NPE while running many concurrent declarative pipelines

These standard pipelines feel like they run considerably faster than the declarative ones from before.

Not surprising; I think to compare apples to apples you would need to make the standard Pipeline scripts much more complicated.

Add Comment

jglick@cloudbees.com (JIRA)

unread,

Feb 13, 2017, 12:45:01 PM2/13/17

to jenkinsc...@googlegroups.com

Jesse Glick started work on

JENKINS-41945

Change By:	Jesse Glick
Status:	Open In Progress

Add Comment

jglick@cloudbees.com (JIRA)

unread,

Feb 13, 2017, 1:04:02 PM2/13/17

to jenkinsc...@googlegroups.com

Jesse Glick updated an issue

Jenkins /

JENKINS-41945

Eventual NPE while running many concurrent declarative pipelines

Change By:	Jesse Glick
Component/s:	pipeline-model-definition-plugin

Add Comment

jglick@cloudbees.com (JIRA)

unread,

Feb 13, 2017, 1:04:02 PM2/13/17

to jenkinsc...@googlegroups.com

Jesse Glick updated

JENKINS-41945

Jenkins /

JENKINS-41945

Eventual NPE while running many concurrent declarative pipelines

Change By:	Jesse Glick
Status:	In Progress Review

Add Comment

scm_issue_link@java.net (JIRA)

unread,

Feb 13, 2017, 5:10:03 PM2/13/17

to jenkinsc...@googlegroups.com

SCM/JIRA link daemon commented on

JENKINS-41945

Re: Eventual NPE while running many concurrent declarative pipelines

Code changed in jenkins
User: Jesse Glick
Path:
src/main/java/org/jenkinsci/plugins/workflow/cps/CpsFlowExecution.java
http://jenkins-ci.org/commit/workflow-cps-plugin/1155b889193713f7fedaef8de8688128136d4fe4
Log:
[FIXED JENKINS-41945] Ignore null elements in ClassInfo.globalClassSet.items.