[JIRA] (JENKINS-56764) Pipeline fails on some AMI images / node types because of Broken Pipe

alex@nederlof.com (JIRA)

Mar 26, 2019, 10:34:03 AM
to jenkinsc...@googlegroups.com
Alex Nederlof updated an issue
 
Jenkins / Bug JENKINS-56764
Pipeline fails on some AMI images / node types because of Broken Pipe
Change By: Alex Nederlof
Priority: Critical (was: Minor)
 
This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)

alex@nederlof.com (JIRA)

Mar 26, 2019, 10:34:04 AM
to jenkinsc...@googlegroups.com
Alex Nederlof created an issue
Issue Type: Bug
Assignee: Ioannis Canellos
Components: kubernetes-pipeline-plugin
Created: 2019-03-26 14:33
Environment: Jenkins ver. 2.164.1, kubernetes plugin 0.2.3, Kubernetes 1.11
Priority: Minor
Reporter: Alex Nederlof

When I run a simple container defined by a YAML pod template, like this:

podTemplate(label: label, yaml: """
apiVersion: v1
kind: Pod
spec:
  tolerations:
    - key: "gpu"
      operator: Exists
      effect: "NoSchedule"
  nodeSelector:
    fleet: gpu-spot
  containers: 
    - name: "testcontainer"
      image: "python"
      imagePullPolicy: Always
      command:
      - cat
      tty: true
      workingDir: /home/jenkins
""") {
    node(label) {
        def myRepo = checkout scm
        stage('Test') {
            container('testcontainer') {
                echo "Here we go"
                sh "echo 'hello world'"
            }
        }
    }
}

It prints "Here we go", but fails as soon as it reaches the sh step. Curiously, this only happens on a node pool of GPU workers running the Amazon GPU-optimised AMI. It does not happen on "regular" CPU workers, where sh works fine.

It doesn't look like a connectivity issue either, since the JNLP container creates a connection successfully according to the logs.

 

 

Cleaning workspace
[Pipeline] stage
[Pipeline] { (Test)
[Pipeline] container
[Pipeline] {
[Pipeline] echo
Here we go
[Pipeline] sh
 > git rev-list --no-walk c3e139d715093daa4ec986a3e0d17f9aba3437a0 # timeout=10
 > git rev-parse --verify HEAD # timeout=10
Resetting working tree
 > git reset --hard # timeout=10
 > git clean -fdx # timeout=10
[Pipeline] }
[Pipeline] // container
[Pipeline] }
[Pipeline] // stage
[Pipeline] }
[Pipeline] // node
[Pipeline] }
[Pipeline] // podTemplate
[Pipeline] End of Pipeline

GitHub has been notified of this commit’s build result

java.io.IOException: Pipe closed
	at java.io.PipedInputStream.checkStateForReceive(PipedInputStream.java:260)
	at java.io.PipedInputStream.receive(PipedInputStream.java:226)
	at java.io.PipedOutputStream.write(PipedOutputStream.java:149)
	at java.io.OutputStream.write(OutputStream.java:75)
	at org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator.doExec(ContainerExecDecorator.java:499)
	at org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator.access$1200(ContainerExecDecorator.java:73)
	at org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator$1.doLaunch(ContainerExecDecorator.java:390)
	at org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator$1.launch(ContainerExecDecorator.java:246)
	at hudson.Launcher$ProcStarter.start(Launcher.java:455)
	at org.jenkinsci.plugins.durabletask.BourneShellScript.launchWithCookie(BourneShellScript.java:206)
	at org.jenkinsci.plugins.durabletask.FileMonitoringTask.launch(FileMonitoringTask.java:99)
	at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.start(DurableTaskStep.java:305)
	at org.jenkinsci.plugins.workflow.cps.DSL.invokeStep(DSL.java:268)
	at org.jenkinsci.plugins.workflow.cps.DSL.invokeMethod(DSL.java:176)
	at org.jenkinsci.plugins.workflow.cps.CpsScript.invokeMethod(CpsScript.java:122)
	at sun.reflect.GeneratedMethodAccessor1014.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1213)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1022)
	at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:42)
	at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:113)
	at org.kohsuke.groovy.sandbox.impl.Checker$1.call(Checker.java:158)
	at org.kohsuke.groovy.sandbox.GroovyInterceptor.onMethodCall(GroovyInterceptor.java:23)
	at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SandboxInterceptor.onMethodCall(SandboxInterceptor.java:155)
	at org.kohsuke.groovy.sandbox.impl.Checker$1.call(Checker.java:156)
	at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:160)
	at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:130)
	at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:130)
	at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:130)
	at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:130)
	at com.cloudbees.groovy.cps.sandbox.SandboxInvoker.methodCall(SandboxInvoker.java:17)
Caused: java.lang.RuntimeException
	at org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator.doExec(ContainerExecDecorator.java:512)
	at org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator.access$1200(ContainerExecDecorator.java:73)
	at org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator$1.doLaunch(ContainerExecDecorator.java:390)
	at org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator$1.launch(ContainerExecDecorator.java:246)
	at hudson.Launcher$ProcStarter.start(Launcher.java:455)
	at org.jenkinsci.plugins.durabletask.BourneShellScript.launchWithCookie(BourneShellScript.java:206)
	at org.jenkinsci.plugins.durabletask.FileMonitoringTask.launch(FileMonitoringTask.java:99)
	at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.start(DurableTaskStep.java:305)
	at org.jenkinsci.plugins.workflow.cps.DSL.invokeStep(DSL.java:268)
	at org.jenkinsci.plugins.workflow.cps.DSL.invokeMethod(DSL.java:176)
	at org.jenkinsci.plugins.workflow.cps.CpsScript.invokeMethod(CpsScript.java:122)
	at sun.reflect.GeneratedMethodAccessor1014.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1213)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1022)
	at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:42)
	at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:113)
	at org.kohsuke.groovy.sandbox.impl.Checker$1.call(Checker.java:158)
	at org.kohsuke.groovy.sandbox.GroovyInterceptor.onMethodCall(GroovyInterceptor.java:23)
	at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SandboxInterceptor.onMethodCall(SandboxInterceptor.java:155)
	at org.kohsuke.groovy.sandbox.impl.Checker$1.call(Checker.java:156)
	at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:160)
	at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:130)
	at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:130)
	at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:130)
	at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:130)
	at com.cloudbees.groovy.cps.sandbox.SandboxInvoker.methodCall(SandboxInvoker.java:17)
	at WorkflowScript.run(WorkflowScript:37)
	at ___cps.transform___(Native Method)
	at com.cloudbees.groovy.cps.impl.ContinuationGroup.methodCall(ContinuationGroup.java:57)
	at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.dispatchOrArg(FunctionCallBlock.java:109)
	at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.fixArg(FunctionCallBlock.java:82)
	at sun.reflect.GeneratedMethodAccessor231.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.cloudbees.groovy.cps.impl.ContinuationPtr$ContinuationImpl.receive(ContinuationPtr.java:72)
	at com.cloudbees.groovy.cps.impl.ConstantBlock.eval(ConstantBlock.java:21)
	at com.cloudbees.groovy.cps.Next.step(Next.java:83)
	at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:174)
	at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:163)
	at org.codehaus.groovy.runtime.GroovyCategorySupport$ThreadCategoryInfo.use(GroovyCategorySupport.java:129)
	at org.codehaus.groovy.runtime.GroovyCategorySupport.use(GroovyCategorySupport.java:268)
	at com.cloudbees.groovy.cps.Continuable.run0(Continuable.java:163)
	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.access$101(SandboxContinuable.java:34)
	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.lambda$run0$0(SandboxContinuable.java:59)
	at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.GroovySandbox.runInSandbox(GroovySandbox.java:136)
	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.run0(SandboxContinuable.java:58)
	at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:182)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:332)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$200(CpsThreadGroup.java:83)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:244)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:232)
	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:64)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:131)
	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
	at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Finished: FAILURE

JNLP logs:

Warning: JnlpProtocol3 is disabled by default, use JNLP_PROTOCOL_OPTS to alter the behavior
Mar 26, 2019 2:25:06 PM hudson.remoting.jnlp.Main createEngine
INFO: Setting up agent: maven-gpu-649e79cf-8c05-4450-bcd7-1503fd63f32a-czbvt-kwvzp
Mar 26, 2019 2:25:06 PM hudson.remoting.jnlp.Main$CuiListener <init>
INFO: Jenkins agent is running in headless mode.
Mar 26, 2019 2:25:06 PM hudson.remoting.Engine startEngine
INFO: Using Remoting version: 3.28
Mar 26, 2019 2:25:06 PM hudson.remoting.Engine startEngine
WARNING: No Working Directory. Using the legacy JAR Cache location: /home/jenkins/.jenkins/cache/jars
Mar 26, 2019 2:25:06 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [http://10.100.240.237/]
Mar 26, 2019 2:25:06 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve
INFO: Remoting server accepts the following protocols: [JNLP4-connect, Ping]
Mar 26, 2019 2:25:06 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve
INFO: Remoting TCP connection tunneling is enabled. Skipping the TCP Agent Listener Port availability check
Mar 26, 2019 2:25:06 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Agent discovery successful
  Agent address: jenkins-agent
  Agent port:    50000
  Identity:      f3:d5:10:ea:1b:5a:d5:30:86:1d:68:ff:30:14:fc:29
Mar 26, 2019 2:25:06 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Handshaking
Mar 26, 2019 2:25:06 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to jenkins-agent:50000
Mar 26, 2019 2:25:06 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Trying protocol: JNLP4-connect
Mar 26, 2019 2:25:06 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Remote identity confirmed: f3:d5:10:ea:1b:5a:d5:30:86:1d:68:ff:30:14:fc:29
Mar 26, 2019 2:25:07 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connected
Mar 26, 2019 2:25:16 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave$SlaveDisconnector call
INFO: Disabled slave engine reconnects.
Mar 26, 2019 2:25:16 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Terminated

 

me@ian.pw (JIRA)

Apr 8, 2019, 8:53:04 PM
to jenkinsc...@googlegroups.com
Ian Macalinao commented on Bug JENKINS-56764
 
Re: Pipeline fails on some AMI images / node types because of Broken Pipe

I have the same issue after upgrading my cluster to the latest Amazon AMIs and EKS K8S 1.12.

alex@nederlof.com (JIRA)

Apr 9, 2019, 1:24:03 PM
to jenkinsc...@googlegroups.com

I have it on EKS K8S 1.11, so I don't think that is related.

alex@nederlof.com (JIRA)

Apr 11, 2019, 4:22:02 AM
to jenkinsc...@googlegroups.com

Ioannis Canellos, is there anything I can do to move this issue forward? It is badly blocking our CI flow. If there's anything I can do to help, please let me know.

 

valikk@gmail.com (JIRA)

May 26, 2019, 6:25:02 PM
to jenkinsc...@googlegroups.com

It seems to be failing because of nvidia-docker2, which is used by the GPU AMIs.

I tried several AMIs (amazon-eks-gpu-node-1.10, amazon-eks-gpu-node-1.11, amazon-eks-gpu-node-1.12) and the same issue occurred with each.

Any ideas?

 

alex@nederlof.com (JIRA)

May 27, 2019, 8:00:02 AM
to jenkinsc...@googlegroups.com

Interesting! I'll try to make some time to test different versions of nvidia-docker to pinpoint the issue. I'm glad we're not the only ones facing this problem.

alex@nederlof.com (JIRA)

May 27, 2019, 8:00:03 AM
to jenkinsc...@googlegroups.com
Alex Nederlof updated an issue
 
Change By: Alex Nederlof
Environment: Jenkins ver. 2.164.1, kubernetes plugin 0.2.3, Kubernetes 1.12 (was: 1.11)

alex@nederlof.com (JIRA)

Jun 27, 2019, 5:18:03 AM
to jenkinsc...@googlegroups.com
 
Re: Pipeline fails on some AMI images / node types because of Broken Pipe

The issue persists on EKS with Kubernetes 1.13, which uses the latest nvidia-docker (nvidia-docker2.noarch 2.0.3-10.docker18.06.1ce.amzn2)

alex@nederlof.com (JIRA)

Jun 27, 2019, 5:19:02 AM
to jenkinsc...@googlegroups.com
Alex Nederlof edited a comment on Bug JENKINS-56764
The issue persists on EKS with Kubernetes 1.13, which uses the latest nvidia-docker (nvidia-docker2.noarch 2.0.3-10.docker18.06.1ce.amzn2), although the device plugin has not been updated yet because NVIDIA still needs to release the v13 version.

alexander.bachmeier@continental-corporation.com (JIRA)

Jan 10, 2020, 9:29:03 AM
to jenkinsc...@googlegroups.com

We are also encountering this issue (EKS v1.14.9). 

Has anyone been able to make progress on this issue?

This message was sent by Atlassian Jira (v7.13.6#713006-sha1:cc4451f)

alex@nederlof.com (JIRA)

Jan 11, 2020, 11:41:03 PM
to jenkinsc...@googlegroups.com

I'm also still stuck. Considering switching to another CI as we really need those GPU workloads.

 

Alexander Bachmeier, a workaround for us is to launch Kubernetes Jobs from a Jenkins job and use `kubectl wait` to block until they complete.
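
For anyone wanting to try this workaround, here is a rough sketch of what the shell step of such a Jenkins job might look like. It is not the exact script used in this thread. It assumes `kubectl` is installed and authenticated on an agent where `sh` works (for example a regular CPU worker), and that a manifest defines a `batch/v1` Job carrying the GPU tolerations and nodeSelector. The function name `run_gpu_job`, the `KUBECTL` override, and the file and Job names in the usage line are all hypothetical.

```sh
#!/bin/sh
# Sketch of the "Kubernetes Job + kubectl wait" workaround (assumptions above).
# KUBECTL may be overridden, e.g. KUBECTL=echo for a dry run.
KUBECTL="${KUBECTL:-kubectl}"

# Usage: run_gpu_job MANIFEST JOB_NAME [TIMEOUT]
# MANIFEST is assumed to define a Job whose metadata.name is JOB_NAME.
run_gpu_job() {
    manifest="$1"
    job="$2"
    timeout="${3:-600s}"

    # Submit the Job to the cluster; give up if that already fails.
    $KUBECTL apply -f "$manifest" || return 1

    # Block until the Job reports the "complete" condition or the timeout hits.
    if $KUBECTL wait --for=condition=complete "job/$job" --timeout="$timeout"; then
        status=0
    else
        status=1
    fi

    # Show the Job's output in the Jenkins console either way.
    $KUBECTL logs "job/$job" || true

    # Remove the Job so the next build can reuse the same name.
    $KUBECTL delete job "$job" --ignore-not-found || true

    return "$status"
}
```

A build step could then call `run_gpu_job gpu-job.yaml gpu-test 900s` and let the exit status decide the build result. One caveat: `kubectl wait --for=condition=complete` does not return early when the Job fails, so a failing Job is only reported once the timeout expires.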

alexander.bachmeier@continental-corporation.com (JIRA)

Jan 15, 2020, 3:13:07 AM
to jenkinsc...@googlegroups.com
Alexander Bachmeier updated an issue
 
Change By: Alexander Bachmeier
Environment: Jenkins ver. 2.164.1, kubernetes plugin 0.2.3, Kubernetes 1.12
Jenkins ver 2.190.1, Kubernetes plugin 1.21.2, EKS 1.14.9