[JIRA] (JENKINS-41945) Eventual NPE while running many concurrent declarative pipelines

4 views
Skip to first unread message

kshultz@cloudbees.com (JIRA)

unread,
Feb 10, 2017, 5:12:04 PM2/10/17
to jenkinsc...@googlegroups.com
Karl Shultz created an issue
 
Jenkins / Bug JENKINS-41945
Eventual NPE while running many concurrent declarative pipelines
Issue Type: Bug Bug
Assignee: Andrew Bayer
Attachments: jenkins.log
Components: pipeline-model-definition-plugin
Created: 2017/Feb/10 10:11 PM
Environment: openjdk version "1.8.0_45-internal"
Ubuntu 15.04
Priority: Major Major
Reporter: Karl Shultz

After an hour or so of running 101 concurrent "hello world" pipeline scripts, the Jenkins server becomes unresponsive. The first signs of trouble in the log around here - I'll attach the entire log as well. The attached one probably has a bunch of noise in it from some earlier work.

The pipeline script itself looks exactly like this:

pipeline {
    
    agent any
    
    stages {
            stage("TestingTesting") {
            steps {
                sh 'echo Hello world'
            }
        }
    }
}
INFO: DeclarativePipelines/inlineThing34 #32 completed: SUCCESS
Feb 10, 2017 8:27:44 PM org.jenkinsci.plugins.workflow.cps.CpsFlowExecution cleanUpHeap
WARNING: failed to clean up memory from Owner[DeclarativePipelines/inlineThing37/32:DeclarativePipelines/inlineThing37 #32]
java.lang.NullPointerException
	at sun.reflect.UnsafeFieldAccessorImpl.ensureObj(UnsafeFieldAccessorImpl.java:57)
	at sun.reflect.UnsafeQualifiedObjectFieldAccessorImpl.get(UnsafeQualifiedObjectFieldAccessorImpl.java:38)
	at java.lang.reflect.Field.get(Field.java:393)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.cleanUpGlobalClassSet(CpsFlowExecution.java:1034)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.cleanUpLoader(CpsFlowExecution.java:966)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.cleanUpLoader(CpsFlowExecution.java:968)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.cleanUpLoader(CpsFlowExecution.java:968)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.cleanUpLoader(CpsFlowExecution.java:968)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.cleanUpLoader(CpsFlowExecution.java:957)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.cleanUpLoader(CpsFlowExecution.java:957)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.cleanUpHeap(CpsFlowExecution.java:941)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:363)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$100(CpsThreadGroup.java:80)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:240)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:228)
	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:63)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

Add Comment Add Comment
 
This message was sent by Atlassian JIRA (v7.1.7#71011-sha1:2526d7c)
Atlassian logo

andrew.bayer@gmail.com (JIRA)

unread,
Feb 10, 2017, 5:14:01 PM2/10/17
to jenkinsc...@googlegroups.com

andrew.bayer@gmail.com (JIRA)

unread,
Feb 10, 2017, 5:14:01 PM2/10/17
to jenkinsc...@googlegroups.com

andrew.bayer@gmail.com (JIRA)

unread,
Feb 10, 2017, 5:16:03 PM2/10/17
to jenkinsc...@googlegroups.com
Andrew Bayer commented on Bug JENKINS-41945
 
Re: Eventual NPE while running many concurrent declarative pipelines

FYI - this line is the problem line in workflow-cps 2.26, which is what's installed. The NPE seems to be that classInfo is null.

andrew.bayer@gmail.com (JIRA)

unread,
Feb 10, 2017, 5:17:01 PM2/10/17
to jenkinsc...@googlegroups.com
Andrew Bayer updated an issue
Change By: Andrew Bayer
Environment:
openjdk version "1.8.0_45-internal"
Ubuntu 15.04

workflow-cps 2.26
pipeline-model-definition 1.0.1
core 2.32.2

kshultz@cloudbees.com (JIRA)

unread,
Feb 10, 2017, 10:58:01 PM2/10/17
to jenkinsc...@googlegroups.com
Karl Shultz commented on Bug JENKINS-41945
 
Re: Eventual NPE while running many concurrent declarative pipelines

I ran the following standard pipeline script in the same way (100 concurrently, every minute):

node { 
 stage("TestingTesting") {
   sh 'echo Hello world'
  }
}

These standard pipelines feel like they run considerably faster than the declarative ones from before. So it could be that I'll need to run way more than 100 of them to reproduce the issue with standard pipeline.

This time, I've got something running to tell me about system utilization:

vmstat 15 | awk '{system("X=$(date +\"%x %T\"); echo $X\\\c");print $0}' > vmstat.15.standard &

(Thanks Google!) I'll do likewise the next time I go to recreate this with declarative pipelines.

kshultz@cloudbees.com (JIRA)

unread,
Feb 10, 2017, 11:17:02 PM2/10/17
to jenkinsc...@googlegroups.com

kshultz@cloudbees.com (JIRA)

unread,
Feb 10, 2017, 11:17:03 PM2/10/17
to jenkinsc...@googlegroups.com
 
Re: Eventual NPE while running many concurrent declarative pipelines

jenkins--failure.log I let the standard pipelines run 20 minutes or so. There wasn't any sign of degraded performance to speak of. Here's the beginning and end of the vmstat output:

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
02/11/2017 01:47:41c
 2  0      0 12885760 189508 831916    0    0    15   268  273  144 75  0 24  1  0

(snip)

02/11/2017 02:06:41c
 0  0      0 12805028 191464 832472    0    0     0   432  175  204  0  0 99  1  0

That amount of decrease in free memory is pretty small. The UI remains responsive throughout. To contrast, here's a similar timespan, with the declarative jobs running. Bu the time I stopped vmstat recording, the UI had stopped responding.

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
02/11/2017 02:25:33c
 1  0      0 13255960 195416 915500    0    0    13   251  251  144 65  0 33  1  0

(snip)

02/11/2017 02:45:33c
 4  0      0 11748496 199188 981928    0    0     0  1471 1290  735 93  0  7  0  0

The declarative jobs are consuming more resource, but I think that's expected.

The log, which I'll attach as jenkins--failure.log, is interesting. The jobs run for a while. Then I start to see these:

Feb 11, 2017 2:27:34 AM org.jenkinsci.plugins.workflow.job.WorkflowRun finish
INFO: DeclarativePipelines/inlineThing72 #80 completed: SUCCESS
Feb 11, 2017 2:27:35 AM org.jenkinsci.plugins.workflow.job.WorkflowRun finish
INFO: DeclarativePipelines/inlineThing70 #80 completed: SUCCESS
Feb 11, 2017 2:27:39 AM org.jenkinsci.plugins.workflow.cps.CpsFlowExecution cleanUpHeap
WARNING: failed to clean up memory from Owner[DeclarativePipelines/inlineThing63/81:DeclarativePipelines/inlineThing63 #81]
java.lang.NullPointerException
	at sun.reflect.UnsafeFieldAccessorImpl.ensureObj(UnsafeFieldAccessorImpl.java:57)
	at sun.reflect.UnsafeQualifiedObjectFieldAccessorImpl.get(UnsafeQualifiedObjectFieldAccessorImpl.java:38)
	at java.lang.reflect.Field.get(Field.java:393)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.cleanUpGlobalClassSet(CpsFlowExecution.java:1034)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.cleanUpLoader(CpsFlowExecution.java:966)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.cleanUpLoader(CpsFlowExecution.java:968)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.cleanUpLoader(CpsFlowExecution.java:968)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.cleanUpLoader(CpsFlowExecution.java:957)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.cleanUpLoader(CpsFlowExecution.java:957)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.cleanUpHeap(CpsFlowExecution.java:941)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:363)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$100(CpsThreadGroup.java:80)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:240)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:228)
	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:63)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)


Things start to look different shortly after this. There are more NPEs, then a bunch of successful builds. Around line 49101 in jenkins--failure.log, which I'll attach, I see what looks to me like a new error:

Feb 11, 2017 2:28:23 AM org.jenkinsci.plugins.workflow.cps.CpsThreadGroup saveProgramIfPossible
WARNING: program state save failed
java.io.IOException: Failed to persist /var/lib/jenkins/jobs/DeclarativePipelines/jobs/inlineThing96/builds/85/program.dat
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.saveProgram(CpsThreadGroup.java:461)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.saveProgram(CpsThreadGroup.java:427)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.saveProgramIfPossible(CpsThreadGroup.java:415)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:360)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$100(CpsThreadGroup.java:80)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:240)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:228)
	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:63)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Failed to serialize org.jenkinsci.plugins.pipeline.modeldefinition.agent.DeclarativeAgent#context for class org.jenkinsci.plugins.pipeline.modeldefinition.agent.impl.Any
	at hudson.util.RobustReflectionConverter$2.writeField(RobustReflectionConverter.java:256)
	at hudson.util.RobustReflectionConverter$2.visit(RobustReflectionConverter.java:224)

Rather than speculate any further on my own I've attached the log. I put a "comment" in at line 49102, just to help me find things.

kshultz@cloudbees.com (JIRA)

unread,
Feb 10, 2017, 11:40:01 PM2/10/17
to jenkinsc...@googlegroups.com

I'll let the standard ones run overnight, and see how things look tomorrow.

kshultz@cloudbees.com (JIRA)

unread,
Feb 11, 2017, 8:17:02 AM2/11/17
to jenkinsc...@googlegroups.com

kshultz@cloudbees.com (JIRA)

unread,
Feb 11, 2017, 8:21:01 AM2/11/17
to jenkinsc...@googlegroups.com
 
Re: Eventual NPE while running many concurrent declarative pipelines

The standard pipelines ran overnight, and are still running this morning.

kshultz@cloudbees.com (JIRA)

unread,
Feb 12, 2017, 9:30:01 PM2/12/17
to jenkinsc...@googlegroups.com
Karl Shultz updated an issue
Change By: Karl Shultz
Attachment: logs.tar.gz
Attachment: hs_err_pid27051.log

kshultz@cloudbees.com (JIRA)

unread,
Feb 12, 2017, 9:31:01 PM2/12/17
to jenkinsc...@googlegroups.com
 
Re: Eventual NPE while running many concurrent declarative pipelines

Ran a test with 60 declarative pipelines running every minute, instead of 100. Still hit an out of memory condition. I'll attach a tarball of the logs. Inside that, jenkins.log.1 has a lot more historical data in it. jenkins.log is sort of its last gasp. Jenkins also generated an error report file. I'll attach that as well. logs.tar.gz hs_err_pid27051.log

jglick@cloudbees.com (JIRA)

unread,
Feb 13, 2017, 10:49:02 AM2/13/17
to jenkinsc...@googlegroups.com

These standard pipelines feel like they run considerably faster than the declarative ones from before.

Not surprising; I think to compare apples to apples you would need to make the standard Pipeline scripts much more complicated.

jglick@cloudbees.com (JIRA)

unread,
Feb 13, 2017, 12:45:01 PM2/13/17
to jenkinsc...@googlegroups.com
Jesse Glick started work on Bug JENKINS-41945
 
Change By: Jesse Glick
Status: Open In Progress

jglick@cloudbees.com (JIRA)

unread,
Feb 13, 2017, 1:04:02 PM2/13/17
to jenkinsc...@googlegroups.com
Jesse Glick updated an issue
 
Change By: Jesse Glick
Component/s: pipeline-model-definition-plugin

jglick@cloudbees.com (JIRA)

unread,
Feb 13, 2017, 1:04:02 PM2/13/17
to jenkinsc...@googlegroups.com

scm_issue_link@java.net (JIRA)

unread,
Feb 13, 2017, 5:10:03 PM2/13/17
to jenkinsc...@googlegroups.com
SCM/JIRA link daemon commented on Bug JENKINS-41945
 
Re: Eventual NPE while running many concurrent declarative pipelines

Code changed in jenkins
User: Jesse Glick
Path:
src/main/java/org/jenkinsci/plugins/workflow/cps/CpsFlowExecution.java
http://jenkins-ci.org/commit/workflow-cps-plugin/1155b889193713f7fedaef8de8688128136d4fe4
Log:
[FIXED JENKINS-41945] Ignore null elements in ClassInfo.globalClassSet.items.

jglick@cloudbees.com (JIRA)

unread,
Feb 13, 2017, 5:12:02 PM2/13/17
to jenkinsc...@googlegroups.com
Reply all
Reply to author
Forward
0 new messages