[JIRA] (JENKINS-59073) Random InterruptedException in pipeline builds

chaddo@online.de (JIRA)

Aug 25, 2019, 7:35:02 AM
to jenkinsc...@googlegroups.com
Chad Williams created an issue
 
Jenkins / Bug JENKINS-59073
Random InterruptedException in pipeline builds
Issue Type: Bug
Assignee: Unassigned
Components: core, pipeline, swarm-plugin
Created: 2019-08-25 11:34
Environment: Jenkins version 2.190
Java: OpenJDK 1.8.0_222
Labels: pipeline exception slave Shellscript shell shell-command shared-groovy-libraries shared-libraries swarm
Priority: Critical
Reporter: Chad Williams

Within pipeline builds, shell steps randomly fail with a nonspecific java.lang.InterruptedException; a full stack trace is listed below.

Unfortunately, this happens often enough to be a major issue within our development process, since negative build results cannot be trusted and multi-hour builds might have to be retriggered multiple times.

Since we cannot reliably trigger the issue, I cannot provide a minimal example for reproduction. This is especially painful since all debugging has to happen in production.

Background information:

  • Our slaves are started dynamically using the swarm plugin
  • The orchestration of these slaves is handled by a shared library; the respective step is available on GitHub (https://github.com/electronicvisions/jenlib/blob/master/vars/onSlurmResource.groovy)
  • We've only seen the exception occur on shell steps; other steps do not seem to throw (although not many were tested)
  • Only the first shell step might throw; if it succeeds, the others will be fine
  • The master can catch the exception and continue with error handling

Complete stack trace:

java.lang.InterruptedException
	at java.lang.Object.wait(Native Method)
	at hudson.remoting.Request.call(Request.java:177)
	at hudson.remoting.Channel.call(Channel.java:956)
	at hudson.Launcher$RemoteLauncher.launch(Launcher.java:1060)
	at hudson.Launcher$ProcStarter.start(Launcher.java:455)
	at org.jenkinsci.plugins.durabletask.BourneShellScript.launchWithCookie(BourneShellScript.java:194)
	at org.jenkinsci.plugins.durabletask.FileMonitoringTask.launch(FileMonitoringTask.java:99)
	at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.start(DurableTaskStep.java:317)
	at org.jenkinsci.plugins.workflow.cps.DSL.invokeStep(DSL.java:286)
	at org.jenkinsci.plugins.workflow.cps.DSL.invokeMethod(DSL.java:179)
	at org.jenkinsci.plugins.workflow.cps.CpsScript.invokeMethod(CpsScript.java:122)
	at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:48)
	at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:113)
	at com.cloudbees.groovy.cps.sandbox.DefaultInvoker.methodCall(DefaultInvoker.java:20)
Caused: java.io.InterruptedIOException
	at hudson.Launcher$RemoteLauncher.launch(Launcher.java:1062)
	at hudson.Launcher$ProcStarter.start(Launcher.java:455)
	at org.jenkinsci.plugins.durabletask.BourneShellScript.launchWithCookie(BourneShellScript.java:194)
	at org.jenkinsci.plugins.durabletask.FileMonitoringTask.launch(FileMonitoringTask.java:99)
	at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.start(DurableTaskStep.java:317)
	at org.jenkinsci.plugins.workflow.cps.DSL.invokeStep(DSL.java:286)
	at org.jenkinsci.plugins.workflow.cps.DSL.invokeMethod(DSL.java:179)
	at org.jenkinsci.plugins.workflow.cps.CpsScript.invokeMethod(CpsScript.java:122)
	at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:48)
	at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:113)
	at com.cloudbees.groovy.cps.sandbox.DefaultInvoker.methodCall(DefaultInvoker.java:20)
	at jesh.call(jesh.groovy:28)
	at withModules.call(withModules.groovy:45)
	at WorkflowScript.run(WorkflowScript:57)
	at onSlurmResource.call(onSlurmResource.groovy:46)
	at runOnSlave.call(runOnSlave.groovy:37)
	at ___cps.transform___(Native Method)
	at com.cloudbees.groovy.cps.impl.ContinuationGroup.methodCall(ContinuationGroup.java:84)
	at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.dispatchOrArg(FunctionCallBlock.java:113)
	at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.fixArg(FunctionCallBlock.java:83)
	at sun.reflect.GeneratedMethodAccessor395.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.cloudbees.groovy.cps.impl.ContinuationPtr$ContinuationImpl.receive(ContinuationPtr.java:72)
	at com.cloudbees.groovy.cps.impl.LocalVariableBlock$LocalVariable.get(LocalVariableBlock.java:39)
	at com.cloudbees.groovy.cps.LValueBlock$GetAdapter.receive(LValueBlock.java:30)
	at com.cloudbees.groovy.cps.impl.LocalVariableBlock.evalLValue(LocalVariableBlock.java:28)
	at com.cloudbees.groovy.cps.LValueBlock$BlockImpl.eval(LValueBlock.java:55)
	at com.cloudbees.groovy.cps.LValueBlock.eval(LValueBlock.java:16)
	at com.cloudbees.groovy.cps.Next.step(Next.java:83)
	at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:174)
	at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:163)
	at org.codehaus.groovy.runtime.GroovyCategorySupport$ThreadCategoryInfo.use(GroovyCategorySupport.java:129)
	at org.codehaus.groovy.runtime.GroovyCategorySupport.use(GroovyCategorySupport.java:268)
	at com.cloudbees.groovy.cps.Continuable.run0(Continuable.java:163)
	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.access$001(SandboxContinuable.java:18)
	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.run0(SandboxContinuable.java:51)
	at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:186)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:370)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$200(CpsThreadGroup.java:93)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:282)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:270)
	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:66)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:131)
	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
	at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
 
This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)

chaddo@online.de (JIRA)

Aug 25, 2019, 9:41:03 AM
to jenkinsc...@googlegroups.com
Chad Williams updated an issue
Change By: Chad Williams
Within pipeline builds, shell steps randomly fail with a nonspecific _java.lang.InterruptedException_; a full stack trace is listed below.

Unfortunately, this happens often enough to be a major issue within our development process, since negative build results cannot be trusted and multi-hour builds might have to be retriggered multiple times.

Since we cannot reliably trigger the issue, I cannot provide a minimal example for reproduction. This is especially painful since all debugging has to happen in production.

*Background information:*
* Our slaves are started dynamically using the swarm plugin
* The orchestration of these slaves is handled by a shared library; the respective step is [available on github|https://github.com/electronicvisions/jenlib/blob/master/vars/onSlurmResource.groovy]
* We've only seen the exception occur on shell steps; other steps do not seem to throw (although not many were tested)
* Only the first shell step might throw; if it succeeds, the others will be fine
* The _master_ can catch the exception and continue with error handling

*Complete stack trace:*
{noformat}
{noformat}

me@basilcrow.com (JIRA)

Oct 31, 2019, 3:08:03 PM
to jenkinsc...@googlegroups.com
Basil Crow commented on Bug JENKINS-59073
 
Re: Random InterruptedException in pipeline builds

I happened to come across this bug while triaging the Swarm component. So far it is unclear what the cause of your problem is. It might be related to Swarm, or it might be related to durable-task or Remoting. It would be helpful to know which versions of the Swarm plugin, Swarm client, durable-task, and workflow-durable-task you are running. From these we would be able to tell which version of Remoting you are using on each side of the connection. If these plugins aren't already up to date, try updating them first.

The proximate cause of your problem given the above stack trace is hudson.remoting.Request.call(Request.java:177):

                    while(response==null && !channel.isInClosed())
                        // I don't know exactly when this can happen, as pendingCalls are cleaned up by Channel,
                        // but in production I've observed that in rare occasion it can block forever, even after a channel
                        // is gone. So be defensive against that.
                        wait(30*1000); // <--- InterruptedException is thrown from here

Here the Jenkins master is waiting, in 30-second intervals, for some type of response from the agent over Remoting. Note that the wait(30*1000) timeout elapsing does not by itself throw anything (the loop just checks its condition again); the InterruptedException means the waiting thread was interrupted while no response had arrived yet, and that is what causes the job to fail. You should try to look into the other side of the connection (the agent side) to see why it had not responded to the master. Try turning up the logging as high as possible on the Swarm client side and see if anything suspicious is present there.
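
For anyone reading the loop above, it may help to see how a bounded Object.wait() behaves on its own. The following is only a minimal standalone sketch, not Remoting code: the class WaitInterruptDemo, the Outcome enum, and both methods are hypothetical names made up for illustration. It shows that a wait whose timeout elapses returns normally, while a wait on an interrupted thread throws InterruptedException.

```java
public class WaitInterruptDemo {

    /** How a single bounded wait ended (hypothetical helper type). */
    enum Outcome { RETURNED_NORMALLY, INTERRUPTED }

    /**
     * Waits on a private lock for at most timeoutMillis.
     * Nobody ever calls notify() on this lock, so the wait ends either
     * because the timeout elapsed (or a spurious wakeup occurred), in which
     * case wait() simply returns, or because the thread was interrupted,
     * in which case wait() throws InterruptedException.
     */
    static Outcome boundedWait(long timeoutMillis) {
        Object lock = new Object();
        synchronized (lock) {
            try {
                lock.wait(timeoutMillis);
                return Outcome.RETURNED_NORMALLY;
            } catch (InterruptedException e) {
                // The interrupt status is cleared when the exception is thrown.
                return Outcome.INTERRUPTED;
            }
        }
    }

    /**
     * Sets the interrupt flag on the current thread first, standing in for an
     * interrupt delivered by some other thread, and then waits. wait() throws
     * immediately because the thread is already marked as interrupted.
     */
    static Outcome waitAfterInterrupt(long timeoutMillis) {
        Thread.currentThread().interrupt();
        return boundedWait(timeoutMillis);
    }

    public static void main(String[] args) {
        System.out.println("plain timeout:  " + boundedWait(100));
        System.out.println("with interrupt: " + waitAfterInterrupt(30_000));
    }
}
```

Under these assumptions, the first call comes back after roughly 100 ms without any exception, and the second returns INTERRUPTED right away instead of waiting 30 seconds. In terms of this issue, that suggests looking for whatever interrupts the thread running the shell step while the agent's response is still outstanding.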
