[JIRA] (JENKINS-49707) Pipeline hangs: "The channel is closing down or has closed down"

block.jon@gmail.com (JIRA)

Feb 23, 2018, 4:04:02 PM
to jenkinsc...@googlegroups.com
Jon B updated an issue
 
Jenkins / Bug JENKINS-49707
Pipeline hangs: "The channel is closing down or has closed down"
Change By: Jon B
Summary: Pipeline hangs: "The channel is closing down or has closed down" (was "Pipeline stuck")
 
This message was sent by Atlassian JIRA (v7.3.0#73011-sha1:3c73d0e)

block.jon@gmail.com (JIRA)

Feb 26, 2018, 9:56:01 PM
to jenkinsc...@googlegroups.com
Jon B commented on Bug JENKINS-49707
 
Re: Pipeline hangs: "The channel is closing down or has closed down"

Oleg Nenashev, should this be redesignated as a remoting bug? I'm not sure how to unblock my pipelines that are hanging because of this issue.

block.jon@gmail.com (JIRA)

Feb 26, 2018, 9:59:01 PM
to jenkinsc...@googlegroups.com
Jon B updated an issue
Change By: Jon B
Component/s: remoting
Component/s: workflow-durable-task-step-plugin

block.jon@gmail.com (JIRA)

Feb 26, 2018, 10:01:02 PM
to jenkinsc...@googlegroups.com
Jon B commented on Bug JENKINS-49707
 
Re: Pipeline hangs: "The channel is closing down or has closed down"

I just changed the JIRA "component" field for this issue to "remoting".

o.v.nenashev@gmail.com (JIRA)

Feb 27, 2018, 2:40:02 AM
to jenkinsc...@googlegroups.com
Oleg Nenashev commented on Bug JENKINS-49707
 
Re: Pipeline hangs: "The channel is closing down or has closed down"

Please provide the following info:

You can find some pointers here: https://speakerdeck.com/onenashev/day-of-jenkins-2017-dealing-with-agent-connectivity-issues?slide=51

contact.if.urgent@gmail.com (JIRA)

Mar 15, 2018, 1:01:03 PM
to jenkinsc...@googlegroups.com

I would like to increase the Priority of this issue to "Major" since this issue is affecting a lot of users.

slaughter550@gmail.com (JIRA)

Mar 29, 2018, 10:43:02 PM
to jenkinsc...@googlegroups.com
Alex Slaughter commented on Bug JENKINS-49707
 
Re: Pipeline hangs: "The channel is closing down or has closed down"

We have also been greatly affected by this issue. A resolution would be very nice.

eduardo.lezcano@be.atlascopco.com (JIRA)

Jun 13, 2018, 3:39:03 AM
to jenkinsc...@googlegroups.com

We are receiving this message sporadically in cloud nodes in Azure managed by Jenkins.

federicon@al.com.au (JIRA)

Jun 28, 2018, 9:30:02 PM
to jenkinsc...@googlegroups.com
Federico Naum assigned an issue to Federico Naum
 
Change By: Federico Naum
Assignee: Federico Naum

federicon@al.com.au (JIRA)

Jun 28, 2018, 9:32:04 PM
to jenkinsc...@googlegroups.com
Federico Naum updated an issue
Change By: Federico Naum
Attachment: jenkins_agent_devbuild9_remoting_logs.zip
Attachment: jenkins_agents_Thread_dump.html
Attachment: jenkins_Agent_devbuild9_System_Information.html
Attachment: jenkins_support_2018-06-29_01.14.18.zip

federicon@al.com.au (JIRA)

Jun 28, 2018, 9:37:05 PM
to jenkinsc...@googlegroups.com
Federico Naum commented on Bug JENKINS-49707
 
Re: Pipeline hangs: "The channel is closing down or has closed down"

Hi, 

We are losing a team to TeamCity, mostly because of this remoting issue.

Here are the requested logs; the disconnection happened at 10:21 am (agent devbuild9):

jenkins_agent_devbuild9_remoting_logs.zip

jenkins_agents_Thread_dump.html

jenkins_Agent_devbuild9_System_Information.html

jenkins_support_2018-06-29_01.14.18.zip

I would appreciate any pointers on where to start looking for more information. Let me know if you need more logs.

This is happening several times a day, so I can provide more logs if needed.

 

o.v.nenashev@gmail.com (JIRA)

Jun 29, 2018, 2:11:03 AM
to jenkinsc...@googlegroups.com

At least we have some data for diagnostics now

federicon@al.com.au (JIRA)

Jul 4, 2018, 4:14:03 AM
to jenkinsc...@googlegroups.com

This is a fresher occurrence, with fewer things going on; this time the agent that got disconnected is called grub.

The job console output (jobConsoleOutput.txt) shows the following at 17:27:54:
 
 

hudson.remoting.ChannelClosedException: Channel "unknown": Remote call on grub failed. The channel is closing down or has closed down  
  at hudson.remoting.Channel.call(Channel.java:948) 
  at hudson.FilePath.act(FilePath.java:1089) 
  at hudson.FilePath.act(FilePath.java:1078)  
   .....  
17:27:55 ERROR: Issue with creating launcher for agent grub. The agent has not been fully initialized yet

 
 
The Jenkins master log at that time (jenkins.log) shows the following lines:
 

Jul 04, 2018 5:27:54 PM hudson.remoting.SynchronousCommandTransport$ReaderThread run
SEVERE: I/O error in channel grub
java.io.IOException: Unexpected termination of the channel
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)
Caused by: java.io.EOFException
at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2328)
at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2797)
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:802)
        ... [trimmed stacktrace]

Jul 04, 2018 5:27:55 PM hudson.model.Slave reportLauncherCreateError
WARNING: Issue with creating launcher for agent grub. The agent has not been fully initialized yetProbably there is a race condition with Agent reconnection or disconnection, check other log entries
java.lang.IllegalStateException: No remoting channel to the agent OR it has not been fully initialized yet
at hudson.model.Slave.reportLauncherCreateError(Slave.java:524)
at hudson.model.Slave.createLauncher(Slave.java:496)
        ... [trimmed stacktrace]
 
Jul 04, 2018 5:27:55 PM hudson.model.Slave reportLauncherCreateError
WARNING: Issue with creating launcher for agent grub. The agent has not been fully initialized yetProbably there is a race condition with Agent reconnection or disconnection, check other log entries
java.lang.IllegalStateException: No remoting channel to the agent OR it has not been fully initialized yet
at hudson.model.Slave.reportLauncherCreateError(Slave.java:524)
at hudson.model.Slave.createLauncher(Slave.java:496)
        ... [trimmed stacktrace]
 
Jul 04, 2018 5:27:55 PM com.squareup.okhttp.internal.Platform$JdkWithJettyBootPlatform getSelectedProtocol
INFO: ALPN callback dropped: SPDY and HTTP/2 are disabled. Is alpn-boot on the boot class path?
Jul 04, 2018 5:27:55 PM org.jenkinsci.plugins.workflow.job.WorkflowRun finish
INFO: rndtest_vortexLibrary/master #289 completed: ABORTED
  

 
The agent remoting log that shows the error is the file created at 5:08 pm (remoting.log.2 inside grub.remoting.logs.zip)
 
 

----

 


 
But it does not have a timestamp in the message. It would be handy to have one, because I cannot work out whether the agent or the Jenkins master initiated the disconnection.
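
For the two logs that do carry timestamps, the formats differ: the job console output uses 24-hour times (17:27:54), while jenkins.log header lines use a 12-hour form (Jul 04, 2018 5:27:54 PM). The following is a minimal Java sketch for normalising the latter so entries can be lined up by eye. It assumes the English java.util.logging header format quoted above; `LogTimestamps` and `toConsoleTime` are hypothetical names and not part of Jenkins or Remoting.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a Jenkins API): converts the timestamp at the start of a
// jenkins.log header line to the HH:mm:ss form used in the job console output.
class LogTimestamps {
    // Matches e.g. "Jul 04, 2018 5:27:54 PM" at the start of a line.
    private static final Pattern HEADER =
            Pattern.compile("^(\\p{Alpha}{3} \\d{2}, \\d{4} \\d{1,2}:\\d{2}:\\d{2} [AP]M)\\b");
    // Assumed header format, taken from the log excerpt in this thread.
    private static final DateTimeFormatter MASTER_LOG =
            DateTimeFormatter.ofPattern("MMM dd, yyyy h:mm:ss a", Locale.ENGLISH);
    private static final DateTimeFormatter CONSOLE =
            DateTimeFormatter.ofPattern("HH:mm:ss");

    // Returns the console-style time, or empty for lines without a leading
    // timestamp (stack frames, "Caused by:" lines, and so on).
    static Optional<String> toConsoleTime(String logLine) {
        Matcher m = HEADER.matcher(logLine);
        if (!m.find()) {
            return Optional.empty();
        }
        LocalDateTime ts = LocalDateTime.parse(m.group(1), MASTER_LOG);
        return Optional.of(ts.format(CONSOLE));
    }

    public static void main(String[] args) {
        String line = "Jul 04, 2018 5:27:54 PM "
                + "hudson.remoting.SynchronousCommandTransport$ReaderThread run";
        // Prints 17:27:54, which matches the console entry quoted above.
        System.out.println(toConsoleTime(line).orElse("no timestamp"));
    }
}
```

This only helps correlate the master log with the console output; it does nothing for the agent-side remoting log, which has no timestamps to parse.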
 
I've also included:

  • The full support log (support_2018-07-04_07.35.22.zip)
  • The logs under ${JENKINS_HOME}/logs/slaves/grub (slaveLogInMaster.grub.zip)
  • Agent system information that I grabbed just minutes after seeing the disconnection:
        - System Information (grubSystemInformation.html)
        - Heap Dump (JavaMelodyGrubHeapDump_4_07_18.pdf)
        - Threads (JavaMelodyNodeGrubThreads_4_07_18.pdf)
        - (MonitoringJavaelodyOnNodes.html)
  • A screenshot (NetworkAndMachineStats.png) of the stats of the master (jenkinssecure1) and the agent (grub), showing the network activity, memory and CPU history. Hardly anything is going on.

federicon@al.com.au (JIRA)

Jul 4, 2018, 4:17:03 AM
to jenkinsc...@googlegroups.com
Federico Naum edited a comment on Bug JENKINS-49707
This is a fresher occurrence, with fewer things going on; this time the agent that got disconnected is called *grub*.

The job console output (*jobConsoleOutput.txt*) shows the following at *17:27:54*:
 
 
{code:java}

hudson.remoting.ChannelClosedException: Channel "unknown": Remote call on grub failed. The channel is closing down or has closed down  
  at hudson.remoting.Channel.call(Channel.java:948)
  at hudson.FilePath.act(FilePath.java:1089)
  at hudson.FilePath.act(FilePath.java:1078)  
   .....  
17:27:55 ERROR: Issue with creating launcher for agent grub. The agent has not been fully initialized yet{code}
 
 
The Jenkins master log at that time (*jenkins.log*) shows the following lines:
 
{code:java}
  {code}
 
The agent remoting log that shows the error is the file created at *5:08 pm* (*remoting.log.2* inside *grub.remoting.logs.zip*):
 
 
{code:java}
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)
Caused by: java.io.EOFException
at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2675)
at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3150)
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:859)
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:355)
at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:48)
at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:36)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)

 
{code}
 
----
 
But it does not have a timestamp in the message. It would be handy to have one, because I cannot work out whether the agent or the Jenkins master initiated the disconnection.
 
I've also included:
* The full support log (*support_2018-07-04_07.35.22.zip*)
* The logs under ${JENKINS_HOME}/logs/slaves/grub (*slaveLogInMaster.grub.zip*)
* Agent system information that I grabbed just minutes after seeing the disconnection:
    - System Information (*grubSystemInformation.html*)
    - Heap Dump (*JavaMelodyGrubHeapDump_4_07_18.pdf*)
    - Threads (*JavaMelodyNodeGrubThreads_4_07_18.pdf*)
    - (*MonitoringJavaelodyOnNodes.html*)
* A screenshot (*NetworkAndMachineStats.png*) of the stats of the master (jenkinssecure1) and the agent (grub), showing the network activity, memory and CPU history. Hardly anything is going on on either machine.

federicon@al.com.au (JIRA)

Jul 4, 2018, 4:20:03 AM
to jenkinsc...@googlegroups.com
Federico Naum updated an issue
Change By: Federico Naum
Attachment: jobConsoleOutput.txt
Attachment: grub.remoting.logs.zip
Attachment: NetworkAndMachineStats.png
Attachment: JavaMelodyGrubHeapDump_4_07_18.pdf
Attachment: JavaMelodyNodeGrubThreads_4_07_18.pdf
Attachment: MonitoringJavaelodyOnNodes.html
Attachment: grubSystemInformation.html
Attachment: Thread dump [Jenkins].html
Attachment: support_2018-07-04_07.35.22.zip
Attachment: slaveLogInMaster.grub.zip
Attachment: jenkins.log

tom.ghyselinck@excentis.com (JIRA)

Jul 23, 2018, 5:59:02 AM
to jenkinsc...@googlegroups.com
Tom Ghyselinck commented on Bug JENKINS-49707
 
Re: Pipeline hangs: "The channel is closing down or has closed down"

Hi Oleg Nenashev,

Do you have any update on this?

We have seen similar issues: The Jenkins Pipeline hangs when the node becomes unreachable at some point in time.

It would be great to see this fixed. This issue sometimes blocks many jobs in the queue of our CI.

In this case it was an intermittent networking issue:

19:35:55 [Sat Jul 14 17:35:54 2018] Waiting for impl_1 to finish...
19:37:32 /opt/Xilinx/Vivado/2018.1/bin/loader: line 194:  4860 Killed                  "$RDI_PROG" "$@"
19:37:32 Makefile:423: recipe for target '../../work/projects/dev1/dev1.sdk/dev1.hdf' failed
19:37:32 make: *** [../../work/projects/dev1/dev1.sdk/dev1.hdf] Error 137
19:37:32 make: *** Waiting for unfinished jobs....
19:38:06 /opt/Xilinx/Vivado/2018.1/bin/loader: line 194:  4859 Killed                  "$RDI_PROG" "$@"
19:38:06 Makefile:423: recipe for target '../../work/projects/dev0/dev0.sdk/dev0.hdf' failed
19:38:06 make: *** [../../work/projects/dev0/dev0.sdk/dev0.hdf] Error 137
19:39:59 Cannot contact ubuntu-16-04-amd64-2: java.io.IOException: remote file operation failed: /var/jenkins/ubuntu-16-04-amd64/workspace/nts-fpga_branches_1.x-bwreq-YHXXBDM77DWWMJ4IUYUNRNT2YKWDIASW4VY4YNK2ULEAULWQGGJA/build/fpga-projects/build at hudson.remoting.Channel@cf0f3fa:ubuntu-16-04-amd64-2: hudson.remoting.ChannelClosedException: Channel "unknown": Remote call on ubuntu-16-04-amd64-2 failed. The channel is closing down or has closed down
19:40:30 /opt/Xilinx/Vivado/2018.1/bin/loader: line 194:  4861 Killed                  "$RDI_PROG" "$@"
19:40:30 Makefile:423: recipe for target '../../work/projects/dev2/dev2.sdk/dev2.hdf' failed
19:40:30 make: *** [../../work/projects/dev2/dev2.sdk/dev2.hdf] Error 137

Finally, we aborted the build:

Aborted by me
09:41:29 Sending interrupt signal to process
09:41:39 After 10s process did not stop

Please note that in the post steps, we see the errors occur but the build no longer hangs here:

Error when executing always post condition:
java.io.IOException: remote file operation failed: /var/jenkins/ubuntu-16-04-amd64/workspace/nts-fpga_branches_1.x-bwreq-YHXXBDM77DWWMJ4IUYUNRNT2YKWDIASW4VY4YNK2ULEAULWQGGJA/packages at hudson.remoting.Channel@cf0f3fa:ubuntu-16-04-amd64-2: hudson.remoting.ChannelClosedException: Channel "unknown": Remote call on ubuntu-16-04-amd64-2 failed. The channel is closing down or has closed down
	at hudson.FilePath.act(FilePath.java:1043)
	at hudson.FilePath.act(FilePath.java:1025)
	at hudson.FilePath.mkdirs(FilePath.java:1213)
	at org.jenkinsci.plugins.workflow.steps.CoreStep$Execution.run(CoreStep.java:79)
	at org.jenkinsci.plugins.workflow.steps.CoreStep$Execution.run(CoreStep.java:67)
	at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution$1$1.call(SynchronousNonBlockingStepExecution.java:50)
	at hudson.security.ACL.impersonate(ACL.java:290)
	at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution$1.run(SynchronousNonBlockingStepExecution.java:47)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: hudson.remoting.ChannelClosedException: Channel "unknown": Remote call on ubuntu-16-04-amd64-2 failed. The channel is closing down or has closed down
	at hudson.remoting.Channel.call(Channel.java:948)
	at hudson.FilePath.act(FilePath.java:1036)
	... 12 more
Caused by: java.io.IOException: Unexpected termination of the channel
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)
Caused by: java.io.EOFException
	at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2679)
	at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3154)
	at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:862)
	at java.io.ObjectInputStream.<init>(ObjectInputStream.java:358)
	at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:48)
	at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:36)
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)

[Pipeline] cleanWs
Error when executing cleanup post condition:
java.io.IOException: remote file operation failed: /var/jenkins/ubuntu-16-04-amd64/workspace/nts-fpga_branches_1.x-bwreq-YHXXBDM77DWWMJ4IUYUNRNT2YKWDIASW4VY4YNK2ULEAULWQGGJA at hudson.remoting.Channel@cf0f3fa:ubuntu-16-04-amd64-2: hudson.remoting.ChannelClosedException: Channel "unknown": Remote call on ubuntu-16-04-amd64-2 failed. The channel is closing down or has closed down
	at hudson.FilePath.act(FilePath.java:1043)
	at hudson.FilePath.act(FilePath.java:1025)
	at hudson.FilePath.mkdirs(FilePath.java:1213)
	at org.jenkinsci.plugins.workflow.steps.CoreStep$Execution.run(CoreStep.java:79)
	at org.jenkinsci.plugins.workflow.steps.CoreStep$Execution.run(CoreStep.java:67)
	at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution$1$1.call(SynchronousNonBlockingStepExecution.java:50)
	at hudson.security.ACL.impersonate(ACL.java:290)
	at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution$1.run(SynchronousNonBlockingStepExecution.java:47)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: hudson.remoting.ChannelClosedException: Channel "unknown": Remote call on ubuntu-16-04-amd64-2 failed. The channel is closing down or has closed down
	at hudson.remoting.Channel.call(Channel.java:948)
	at hudson.FilePath.act(FilePath.java:1036)
	... 12 more
Caused by: java.io.IOException: Unexpected termination of the channel
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)
Caused by: java.io.EOFException
	at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2679)
	at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3154)
	at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:862)
	at java.io.ObjectInputStream.<init>(ObjectInputStream.java:358)
	at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:48)
	at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:36)
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)

I hope this somewhat helps.

With best regards,
Tom.

o.v.nenashev@gmail.com (JIRA)

Jul 23, 2018, 6:05:03 AM
to jenkinsc...@googlegroups.com

Tom Ghyselinck, nope, I don't. I requested the info needed to diagnose the issue, but I have not reviewed it yet. I am unlikely to have time for that in the short term, as I am busy with other things in the community. Jeff Thompson is the current Remoting default assignee, so I will assign the issue to him.

 

tom.ghyselinck@excentis.com (JIRA)

Jul 23, 2018, 6:10:02 AM
to jenkinsc...@googlegroups.com
Tom Ghyselinck commented on Bug JENKINS-49707
 
Re: Pipeline hangs: "The channel is closing down or has closed down"

Hi Oleg Nenashev,

Thanks!

P.S. I set the Assignee to "Automatic" and it assigned you; does the component configuration need a change to set it to Jeff Thompson by default?

With best regards,
Tom.

o.v.nenashev@gmail.com (JIRA)

Jul 23, 2018, 6:13:02 AM
to jenkinsc...@googlegroups.com

No, I am just a default assignee of the "_unsorted" component which was first in the component list. "remoting" component is configured properly, and I have just removed "_unsorted" for now since we have got the diagnostics info

federicon@al.com.au (JIRA)

Aug 3, 2018, 2:36:01 AM
to jenkinsc...@googlegroups.com

Has anyone experiencing this issue had a look at this new plugin? https://plugins.jenkins.io/remoting-kafka

Oleg Nenashev, I can see you are very involved with it.

It looks promising but is lacking some documentation. I'll play with it to see if I can get it working, and report back on whether it solves my connection issues.

block.jon@gmail.com (JIRA)

Sep 17, 2018, 2:38:03 AM
to jenkinsc...@googlegroups.com
Jon B commented on Bug JENKINS-49707

The repro case here is pretty simple:

1) Create a parallel job (even a job that just does a sleep)

2) Terminate the executor's host while it's running

It hangs with this error every time.

 


block.jon@gmail.com (JIRA)

Sep 17, 2018, 2:39:02 AM
to jenkinsc...@googlegroups.com
Jon B commented on Bug JENKINS-49707

I don't mean to be dramatic, but this is literally the biggest problem in all of Jenkins as far as I can tell. If we lose an EC2 host while an executor is doing parallel work, we badly need the parallel item to restart on a healthy node. When it just plain hangs, we can't do that, and the user experience of hanging is not acceptable.

I would recommend elevating the urgency here to the highest level to get this triaged.

block.jon@gmail.com (JIRA)

Sep 17, 2018, 2:40:03 AM
to jenkinsc...@googlegroups.com
Jon B edited a comment on Bug JENKINS-49707
I don't mean to be dramatic, but this is literally the biggest problem in all of Jenkins as far as I can tell. If we lose an EC2 host while an executor is doing parallel work, we *badly need* the parallel item to restart on another healthy executor. When it just plain hangs, we can't do that, and the user experience of hanging is not acceptable.


I would recommend elevating the urgency here to the highest level to get this triaged.

block.jon@gmail.com (JIRA)

Sep 17, 2018, 3:28:06 AM
to jenkinsc...@googlegroups.com
Jon B commented on Bug JENKINS-49707

Repro code:

def jobs = [:]
jobs["Do Work"] = getWork()
parallel jobs
println "Parallel run completed."

def getWork() {
  return {
    node('general') {
      sh """|#!/bin/bash
            |set -ex
            |echo "going to sleep..."
            |sleep 300
            |echo "yay I made it to the end."
            |""".stripMargin()
    }
  }
}
 
 
                                                            

To repro, run this pipeline and once the control flow hits the sleep, terminate the executor's host and it will hang with something like this:

[Do Work] Cannot contact ip-172-31-237-68.us-west-2.compute.internal: hudson.remoting.ChannelClosedException: Channel "unknown": Remote call on ip-172-31-237-68.us-west-2.compute.internal failed. The channel is closing down or has closed down 

It hangs with this error every time I try it.

michael@redengine.co.nz (JIRA)

Sep 17, 2018, 7:16:02 AM
to jenkinsc...@googlegroups.com

I concur, this is a pretty serious issue. I've tried a number of workarounds, like timeouts to restart the job, but once it hangs it's stuck.

mgreco2k@gmail.com (JIRA)

Sep 17, 2018, 2:24:05 PM
to jenkinsc...@googlegroups.com

I've been noticing this for MONTHS. And in case people don't realize, the master branch of the docker-plugin wasn't building today (9/17/18):

https://ci.jenkins.io/job/Plugins/job/docker-plugin/

Anyway, this weekend I loaded docker-plugin build 1.1.5, and today on every build I was getting "The channel is closing down or has closed down", as my jobs would still appear to be running even though obviously the container was gone.

I wound up downgrading to an older build I have:

1.1.5-SNAPSHOT (private-554bbf8a-win2012-6d34b0$)

in which the problem seems to happen less. I went so far as to rebuild some of my "build containers", as they are created "FROM jenkinsci/slave" and I noticed that image has had an update sometime in August.

Again, it made no difference using the "released 1.1.5" version of docker-plugin (everything wound up in the state of "The channel is closing down or has closed down"), and that's when I noticed the master branch isn't building either ... so I just went back to my earlier build.

block.jon@gmail.com (JIRA)

Sep 17, 2018, 2:27:01 PM
to jenkinsc...@googlegroups.com
Jon B commented on Bug JENKINS-49707

If left unfixed for much longer, our organization is going to be forced to use another technology for CI/CD, since this is causing widespread pain and lost confidence in this technology among the hundreds of developers who use Jenkins at our company.

mgreco2k@gmail.com (JIRA)

Sep 17, 2018, 2:37:04 PM
to jenkinsc...@googlegroups.com

Maybe try the LTS? ... uggg I try to be a "start-up" kind of guy ... it sounds like there's maybe some integration tests that need to be part of the project ...

mgreco2k@gmail.com (JIRA)

Sep 17, 2018, 2:40:04 PM
to jenkinsc...@googlegroups.com
Michael Greco edited a comment on Bug JENKINS-49707
Maybe try the LTS? ... uggg I try to be a "start-up" kind of guy ... it sounds like there's maybe some integration tests that need to be part of the project ... If you have access to spin up another VM, maybe launch the LTS version and try it out. I know I keep the Jenkins data in a docker volume, so moving around between these versions to try stuff out on different docker hosts for exactly these situations is helpful. I'm running 2.140, but maybe the plugin works better with the LTS? (OK, I'm reaching outside the box, cause if the plugin has a bug then ...)