[JIRA] (JENKINS-50474) ArrayIndexOutOfBoundsException cause job "hang"

4 views
Skip to first unread message

piotrzsl@gmail.com (JIRA)

unread,
Mar 29, 2018, 4:03:02 AM3/29/18
to jenkinsc...@googlegroups.com
Piotr Zegar created an issue
 
Jenkins / Bug JENKINS-50474
ArrayIndexOutOfBoundsException cause job "hang"
Issue Type: Bug Bug
Assignee: Unassigned
Components: workflow-cps-plugin
Created: 2018-03-29 08:02
Environment: Jenkins ver. 2.113
Red-hat 6
Priority: Minor Minor
Reporter: Piotr Zegar

Since some time I'm getting jobs hand when executing node("master"), today I found this stack-trace in node description.

Any ideas ?

<wf.a.ErrorAction plugin="workfl...@2.26">
<error class="java.lang.ArrayIndexOutOfBoundsException">
<detailMessage>0</detailMessage>
<stackTrace>
<trace>org.jenkinsci.plugins.workflow.cps.DSL$ThreadTaskImpl.invokeBody(DSL.java:588)</trace>
<trace>org.jenkinsci.plugins.workflow.cps.DSL$ThreadTaskImpl.eval(DSL.java:559)</trace>
<trace>org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:184)</trace>
<trace>org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:331)</trace>
<trace>org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$200(CpsThreadGroup.java:82)</trace>
<trace>org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:243)</trace>
<trace>org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:231)</trace>
<trace>org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:64)</trace>
<trace>java.util.concurrent.FutureTask.run(FutureTask.java:266)</trace>
<trace>hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:131)</trace>
<trace>jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)</trace>
<trace>java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)</trace>
<trace>java.util.concurrent.FutureTask.run(FutureTask.java:266)</trace>
<trace>java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)</trace>
<trace>java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)</trace>
<trace>java.lang.Thread.run(Thread.java:748)</trace>
</stackTrace>
<suppressedExceptions class="java.util.Collections$UnmodifiableRandomAccessList" resolves-to="java.util.Collections$UnmodifiableList">
<c class="list"/>
<list reference="../c"/>
</suppressedExceptions>
</error>
</wf.a.ErrorAction>
Add Comment Add Comment
 
This message was sent by Atlassian JIRA (v7.3.0#73011-sha1:3c73d0e)
Atlassian logo

andrew.bayer@gmail.com (JIRA)

unread,
Mar 29, 2018, 12:31:02 PM3/29/18
to jenkinsc...@googlegroups.com
Andrew Bayer commented on Bug JENKINS-50474
 
Re: ArrayIndexOutOfBoundsException cause job "hang"

Can you provide the Jenkinsfile that you're running, or a minimal reproduction case?

Sam Van Oort - this looks to be FlowHead wackiness. Before I try to figure it out, anything jump out at you?

svanoort@cloudbees.com (JIRA)

unread,
Mar 29, 2018, 4:10:02 PM3/29/18
to jenkinsc...@googlegroups.com

Andrew Bayer I don't see anything looking like it's FlowHead wackiness here, but lets get some more info:

Piotr Zegar Do you have Jenkins logs for this one? Anything like "failed to resume" with the job name? And does the build log show what durability mode was used here?

andrew.bayer@gmail.com (JIRA)

unread,
Mar 29, 2018, 5:15:02 PM3/29/18
to jenkinsc...@googlegroups.com

Sorry Sam Van Oort - I meant the exception is here in DSL, which is FlowHead-related. Sorry for not being more clear/detailed - that's what I get for commenting from my phone. =)

piotrzsl@gmail.com (JIRA)

unread,
Mar 29, 2018, 5:26:02 PM3/29/18
to jenkinsc...@googlegroups.com

piotrzsl@gmail.com (JIRA)

unread,
Apr 2, 2018, 3:48:02 PM4/2/18
to jenkinsc...@googlegroups.com

Few things,

I started to see these exceptions after I added usage of this function before every state:

 

def isNodeActive()
{
try
{
def nodes = jenkins.model.Jenkins.getInstance().getNodes();
def nodeName = "${NODE_NAME}";
for(Slave node in nodes)
{
if (node.getNodeName() != nodeName)
{
continue;
}

if (node.getComputer().isAcceptingTasks())
{
return true;
}
}
}
catch (Exception e) {
return false;
}
return false;
}

I running most of tests in single jobs mainly during night on 40+ servers.

 

And wanted to stop starting new tests when servers run offline due to schedule.

I also have separate jobs that run this code, to kill these hanging jobs (that hang on node allocation).

import hudson.model.FreeStyleBuild
import hudson.model.Job
import hudson.model.Result
import hudson.model.Run
import java.util.Calendar
import jenkins.model.Jenkins
import org.jenkinsci.plugins.workflow.job.WorkflowRun
import org.jenkinsci.plugins.workflow.support.steps.StageStepExecution

//24 hours in a day, 3600 seconds in 1 hour, 1000 milliseconds in 1 second
long time_in_millis = 24*3600*1000
Calendar rightNow = Calendar.getInstance()

for(Job job in Jenkins.instance.getAllItems(Job.class))
{
if (!job.isBuilding())
{
continue;
}

for(Run item in job.builds)
{
if (!item.isBuilding())
{
continue;
}

if ((rightNow.getTimeInMillis() - item.getStartTimeInMillis()) < time_in_millis)
{
continue;
}

if(item in WorkflowRun) {
WorkflowRun run = (WorkflowRun) item
//hard kill
run.doKill()
//release pipeline concurrency locks
StageStepExecution.exit(run)
println "Killed ${run}"
} else if(item in FreeStyleBuild) {
FreeStyleBuild run = (FreeStyleBuild) item
run.executor.interrupt(Result.ABORTED)
println "Killed ${run}"
} else {
println "WARNING: Don't know how to handle ${item.class}"
}
}
}

But above function shouldn't be executed in this case:

 

[Pipeline] node
Running on Jenkins in /var/lib/jenkins/workspace/checkCommit@46
[Pipeline] End of Pipeline
java.lang.ArrayIndexOutOfBoundsException
Finished: FAILURE

Above "node" is allocated in try - finally block, but looks like finally block where not executed.

And then job hangs (still taking place in node - could be above script issue).

dnusbaum@cloudbees.com (JIRA)

unread,
May 9, 2019, 5:24:02 PM5/9/19
to jenkinsc...@googlegroups.com

FWIW I ran into this trying to reproduce a totally different issue. The following (contrived) test in workflow-cps reproduces it somewhat consistently for me. Changing my nonBlocking step for a simpler block-scope step like timeout made the problem go away

@Issue("JENKINS-56890") // The otherwise unrelated issue I was investigating.
@Test public void generalNonBlockingStepExecutionCpsThreadSafety() {
    rr.then(r -> {
        WorkflowJob p = r.jenkins.createProject(WorkflowJob.class, "demo");
        p.setDefinition(new CpsFlowDefinition(
                "Map branches = [failFast: false]\n" +
                "for (int i = 0; i < 250; i++) {\n" + // Increasing this value seems to make JENKINS-50474 more likely. 
                "  branches[\"branch${i}\"] = {\n" +
                "    nonBlocking {\n" +
                "      sleep(time: Math.random() * 100, unit: 'MILLISECONDS')\n" +
                "    }\n" +
                "  }\n" +
                "}\n" +
                "parallel(branches)\n", false));
        WorkflowRun b = r.buildAndAssertSuccess(p);
    });
}

/**
 * Step that uses {@link StepContext} in GeneralNonBlockingStepExecution.TailCall while not on the CPS VM thread
 * to demonstrate a concurrency issue in {@link CpsStepContext}.
 */
public static class NonBlockingStep extends Step {
    @DataBoundConstructor
    public NonBlockingStep() { }
    @Override
    public StepExecution start(StepContext context) throws Exception {
        return new Execution(context);
    }
    private static class Execution extends GeneralNonBlockingStepExecution {
        public Execution(StepContext context) {
            super(context);
        }
        @Override
        public boolean start() throws Exception {
            run(() -> {
                getContext().newBodyInvoker().withCallback(new Callback()).start();
            });
            return false;
        }
        private final class Callback extends TailCall {
            @Override
            protected void finished(StepContext context) throws Exception {
                Run<?, ?> run = context.get(Run.class);
                TaskListener listener = context.get(TaskListener.class);
                listener.getLogger().println("Finished executing nonBlockingStep in " + run);
            }
        }
    }
    @TestExtension("generalNonBlockingStepExecutionThreadSafety")
    public static class DescriptorImpl extends StepDescriptor {
        @Override
        public String getFunctionName() {
            return "nonBlocking";
        }
        @Override
        public boolean takesImplicitBlockArgument() {
            return true;
        }
        @Override
        public Set<? extends Class<?>> getRequiredContext() {
            return ImmutableSet.of(Run.class, TaskListener.class);
        }
    }
}

In addition to the exception in the issue description I have also seen this one:

java.util.ConcurrentModificationException
    at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909)
    at java.util.ArrayList$Itr.next(ArrayList.java:859)
    at org.jenkinsci.plugins.workflow.cps.DSL$ThreadTaskImpl.invokeBody(DSL.java:623)
    at org.jenkinsci.plugins.workflow.cps.DSL$ThreadTaskImpl.eval(DSL.java:593)
    at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:184)
    at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:348)
    at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$200(CpsThreadGroup.java:93)
    at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:260)
    at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:248)
    at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:64)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:131)
    at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
    at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

I've also seen the test just hang indefinitely and time out, and as far as I can tell from the thread dump nothing is running so I'm not sure what is happening in that case.

This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)

dnusbaum@cloudbees.com (JIRA)

unread,
May 9, 2019, 5:25:02 PM5/9/19
to jenkinsc...@googlegroups.com
Devin Nusbaum edited a comment on Bug JENKINS-50474
FWIW I ran into this trying to reproduce a totally different issue. The following (contrived) test in workflow-cps reproduces it somewhat consistently for me. Changing my {{nonBlocking}} step for a simpler block-scope step like {{timeout}} made the problem go away , so perhaps this only happens if the step execution is some kind of non-blocking execution that runs on a separate thread.

{code}
{code}


In addition to the exception in the issue description I have also seen this one:

{noformat}

java.util.ConcurrentModificationException
    at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909)
    at java.util.ArrayList$Itr.next(ArrayList.java:859)
    at org.jenkinsci.plugins.workflow.cps.DSL$ThreadTaskImpl.invokeBody(DSL.java:623)
    at org.jenkinsci.plugins.workflow.cps.DSL$ThreadTaskImpl.eval(DSL.java:593)
    at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:184)
    at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:348)
    at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$200(CpsThreadGroup.java:93)
    at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:260)
    at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:248)
    at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:64)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:131)
    at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
    at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
{noformat}


I've also seen the test just hang indefinitely and time out, and as far as I can tell from the thread dump nothing is running so I'm not sure what is happening in that case.

dnusbaum@cloudbees.com (JIRA)

unread,
May 9, 2019, 5:25:02 PM5/9/19
to jenkinsc...@googlegroups.com

e.moshaya@gmail.com (JIRA)

unread,
Sep 4, 2019, 6:11:02 AM9/4/19
to jenkinsc...@googlegroups.com

I'm also having the same issue with hanging jobs. See error below:

 

 [Pipeline] End of Pipeline
 java.util.ConcurrentModificationException
 	at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909)
 	at java.util.ArrayList$Itr.next(ArrayList.java:859)
 	at org.jenkinsci.plugins.workflow.cps.DSL$ThreadTaskImpl.invokeBody(DSL.java:645)
 	at org.jenkinsci.plugins.workflow.cps.DSL$ThreadTaskImpl.eval(DSL.java:615)
 	at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:196)
 	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:370)
 	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$200(CpsThreadGroup.java:93)
 	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:282)
 	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:270)
 	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:66)
 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 	at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:131)
 	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
 	at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 	at java.lang.Thread.run(Thread.java:748)
 Finished: FAILURE
Reply all
Reply to author
Forward
0 new messages