[JIRA] (JENKINS-53468) Since 1.27 update Channel is closing or has closed down

john@keyba.se (JIRA)

Sep 7, 2018, 11:50:01 AM
to jenkinsc...@googlegroups.com
John Zila created an issue
 
Jenkins / Bug JENKINS-53468
Since 1.27 update Channel is closing or has closed down
Issue Type: Bug
Assignee: Ivan Fernandez Calvo
Components: ssh-slaves-plugin
Created: 2018-09-07 15:49
Priority: Critical
Reporter: John Zila
 
This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)

john@keyba.se (JIRA)

Sep 7, 2018, 11:52:01 AM
to jenkinsc...@googlegroups.com
John Zila updated an issue
Change By: John Zila
We are getting crashes of the form:
{noformat}
java.io.EOFException
at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2680)
at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3155)
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:861)
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:357)
at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)
at hudson.remoting.Command.readFrom(Command.java:140)
at hudson.remoting.Command.readFrom(Command.java:126)
at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:36)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
Caused: java.io.IOException: Unexpected termination of the channel
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)
Caused: hudson.remoting.ChannelClosedException: Channel "unknown": Remote call on i-0893cee59e342e5da failed. The channel is closing down or has closed down
at hudson.remoting.Channel.call(Channel.java:948)
...{noformat}
This has been happening ever since our update to SSH Slaves 1.27 (we are now at 1.28.1). We're attempting to manually restore 1.26 to work around this.

kuisathaverat@gmail.com (JIRA)

Sep 7, 2018, 1:54:02 PM
to jenkinsc...@googlegroups.com
Ivan Fernandez Calvo commented on Bug JENKINS-53468
 
Re: Since 1.27 update Channel is closing or has closed down

Which Jenkins core version do you use?
Which OS do you use on your SSH agents?
Which OpenSSH version do you have installed on your SSH agents?
Does it happen only on the SSH agents?
Does it happen on all SSH agents or only on a few? Is there something in common between those SSH agents?
Are your SSH agents static or provisioned by a cloud plugin (k8s, Mesos, Docker, EC2, Azure, ...)?
Could you attach the agent connection log (JENKINS_URL/computer/NODENAME/log)?
Could you attach the logs inside the remoting folder (see https://github.com/jenkinsci/remoting/blob/master/docs/workDir.md#remoting-work-directory)?
Could you attach the agent configuration (JENKINS_URL/computer/NODENAME/config.xml) file?

kuisathaverat@gmail.com (JIRA)

Sep 7, 2018, 1:55:01 PM
to jenkinsc...@googlegroups.com
Which Jenkins core version do you use?
Which OS do you use on your SSH agents?
Which OpenSSH version do you have installed on your SSH agents?
Does it happen only on the SSH agents?
Does it happen on all SSH agents or only on a few? Is there something in common between those SSH agents?
Are your SSH agents static or provisioned by a cloud plugin (k8s, Mesos, Docker, EC2, Azure, ...)?
Could you attach the agent connection log (JENKINS_URL/computer/NODENAME/log)?
Could you attach the logs inside the remoting folder (see https://github.com/jenkinsci/remoting/blob/master/docs/workDir.md#remoting-work-directory)?
Could you attach the agent configuration (JENKINS_URL/computer/NODENAME/config.xml) file?
Does it always happen with the same job or type of job?

john@keyba.se (JIRA)

Sep 11, 2018, 9:21:01 PM
to jenkinsc...@googlegroups.com

Which Jenkins core version do you use?

Jenkins 2.141

Which OS do you use on your SSH agents?

Debian Jessie.

admin@ip-10-0-0-211:~$ uname -a
Linux ip-10-0-0-211 4.9.0-0.bpo.6-amd64 #1 SMP Debian 4.9.88-1+deb9u1~bpo8+1 (2018-05-13) x86_64 GNU/Linux 

Which OpenSSH version do you have installed on your SSH agents?

admin@ip-10-0-0-211:~$ ssh -V
OpenSSH_6.7p1 Debian-5+deb8u4, OpenSSL 1.0.1t  3 May 2016 

Does it happen only on the SSH agents?

yes

Does it happen on all SSH agents or only on a few? Is there something in common between those SSH agents?

randomly, most of them

Are your SSH agents static or provisioned by a cloud plugin (k8s, Mesos, Docker, EC2, Azure, ...)?

Could you attach the agent connection log (JENKINS_URL/computer/NODENAME/log)?

No, because I'd have to switch my cluster to use the broken plugin

Could you attach the logs inside the remoting folder (see https://github.com/jenkinsci/remoting/blob/master/docs/workDir.md#remoting-work-directory)? 

No, as above I'd have to switch my cluster to use the broken plugin

Could you attach the agent configuration (JENKINS_URL/computer/NODENAME/config.xml) file?

attached

config.xml

Do you see if it happens always with the same job or type of job?

any job

kuisathaverat@gmail.com (JIRA)

Sep 14, 2018, 1:51:02 PM
to jenkinsc...@googlegroups.com
Ivan Fernandez Calvo commented on Bug JENKINS-53468
 
Re: Since 1.27 update Channel is closing or has closed down

You have configured 'c:\Jenkins' as the working directory (remoteFS and fsRoot) on the agents. This path probably does not exist on a Linux agent; I think that is your problem.

john@keyba.se (JIRA)

Sep 14, 2018, 1:56:02 PM
to jenkinsc...@googlegroups.com

Oops, I gave you the config for one of our Windows agents. Let me attach the config for one of our Linux agents.

john@keyba.se (JIRA)

Sep 14, 2018, 2:08:03 PM
to jenkinsc...@googlegroups.com
John Zila updated an issue
Change By: John Zila
Attachment: config_linux.xml

kuisathaverat@gmail.com (JIRA)

Sep 14, 2018, 2:25:02 PM
to jenkinsc...@googlegroups.com
Ivan Fernandez Calvo commented on Bug JENKINS-53468
 
Re: Since 1.27 update Channel is closing or has closed down

OK, this one looks better.
If the agents work correctly for a while and then start to fail randomly, the issue could be related to JENKINS-49235. There is a workaround (https://github.com/jenkinsci/ssh-slaves-plugin/blob/master/doc/TROUBLESHOOTING.md#threads-stuck-at-credentialsprovidertrackall). In any case, I will test it on EC2 to see if I can replicate it.

kuisathaverat@gmail.com (JIRA)

Sep 14, 2018, 2:25:02 PM
to jenkinsc...@googlegroups.com
Ivan Fernandez Calvo started work on Bug JENKINS-53468
 
Change By: Ivan Fernandez Calvo
Status: Open -> In Progress

lopez.sam@gmail.com (JIRA)

Nov 6, 2018, 10:54:01 AM
to jenkinsc...@googlegroups.com

Hello Folks,

We are also observing this issue.

john@keyba.se (JIRA)

Nov 16, 2018, 1:25:02 PM
to jenkinsc...@googlegroups.com

FYI, things have been deteriorating badly. Nodes are now disconnecting even on 1.27 for no reason. I tried to upgrade to 1.28.1, but then this terrible bug reared its head, making Jenkins completely unusable. I've had to revert back to 1.27 to get back to a "just pretty bad" state.

Remoting version: 3.27
This is a Unix agent
Evacuated stdout
Agent successfully connected and online
Nov 16, 2018 5:18:27 PM org.jenkinsci.remoting.util.AnonymousClassWarnings warn
WARNING: Attempt to (de-)serialize anonymous class org.jenkinsci.plugins.gitclient.Git$1; see: https://jenkins.io/redirect/serialization-of-anonymous-classes/
Nov 16, 2018 5:18:29 PM org.jenkinsci.remoting.util.AnonymousClassWarnings warn
WARNING: Attempt to (de-)serialize anonymous class org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$1; see: https://jenkins.io/redirect/serialization-of-anonymous-classes/
Nov 16, 2018 5:19:38 PM org.jenkinsci.remoting.util.AnonymousClassWarnings warn
WARNING: Attempt to (de-)serialize anonymous class org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$1; see: https://jenkins.io/redirect/serialization-of-anonymous-classes/
ERROR: Connection terminated
java.io.EOFException
	at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2680)
	at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3155)
	at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:861)
	at java.io.ObjectInputStream.<init>(ObjectInputStream.java:357)
	at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)
	at hudson.remoting.Command.readFrom(Command.java:140)
	at hudson.remoting.Command.readFrom(Command.java:126)
	at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:36)
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
Caused: java.io.IOException: Unexpected termination of the channel
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)
ERROR: Socket connection to SSH server was lost
java.net.SocketTimeoutException: The connect timeout expired
	at com.trilead.ssh2.Connection$1.run(Connection.java:762)
	at com.trilead.ssh2.util.TimeoutService$TimeoutThread.run(TimeoutService.java:91)
Slave JVM has not reported exit code before the socket was lost
[11/16/18 17:20:40] [SSH] Connection closed. 

john@keyba.se (JIRA)

Nov 16, 2018, 1:26:02 PM
to jenkinsc...@googlegroups.com
John Zila edited a comment on Bug JENKINS-53468
FYI, things have been deteriorating badly. Nodes are now disconnecting even on 1.26 for no reason. I tried to upgrade to 1.28.1, but then this terrible bug reared its head, making Jenkins completely unusable. I've had to revert back to 1.26 to get back to a "just pretty bad" state.
{noformat}
[11/16/18 17:20:40] [SSH] Connection closed.
{noformat}

kuisathaverat@gmail.com (JIRA)

Nov 16, 2018, 2:53:02 PM
to jenkinsc...@googlegroups.com

I need the list of installed plugins to see whether something is odd. Run this script in the Jenkins script console and attach the output; it lists the installed plugins and their versions.

```
// Build one "displayName,version" line per installed plugin
result = ''
for (plugin in Jenkins.instance.pluginManager.plugins) {
    result = result + "${plugin.displayName}" + ',' + "${plugin.version}\n"
}
return result
```

john@keyba.se (JIRA)

Nov 16, 2018, 3:09:09 PM
to jenkinsc...@googlegroups.com
HTML Publisher plugin,1.17
Credentials Plugin,2.1.18
Pipeline: Input Step,2.8
Jackson 2 API Plugin,2.9.7.1
Pipeline,2.6
Bitbucket Pipeline for Blue Ocean,1.9.0
Blue Ocean Core JS,1.9.0
Design Language,1.9.0
OWASP Markup Formatter Plugin,1.5
Maven Integration plugin,3.1.2
External Monitor Job Type Plugin,1.7
Pub-Sub "light" Bus,1.12
Pipeline: Declarative Agent API,1.1.1
Pipeline: Declarative Extension Points API,1.3.2
Multiple SCMs plugin,0.6
GitHub Pull Request Builder,1.42.0
Server Sent Events (SSE) Gateway Plugin,1.16
Web for Blue Ocean,1.9.0
Pipeline: Shared Groovy Libraries,2.12
Docker Pipeline,1.17
Common API for Blue Ocean,1.9.0
HSTS Filter Plugin,1.0
Folders Plugin,6.6
SCM API Plugin,2.3.0
AnsiColor,0.6.0
Command Agent Launcher Plugin,1.2
Events API for Blue Ocean,1.9.0
Structs Plugin,1.17
Pipeline: Nodes and Processes,2.26
Docker Commons Plugin,1.13
Pipeline Graph Analysis Plugin,1.9
Email Extension Plugin,2.63
Pipeline: Milestone Step,1.3.1
Lockable Resources plugin,2.3
Display URL for Blue Ocean,2.2.0
Pipeline: Build Step,2.7
Git client plugin,2.7.3
Variant Plugin,1.1
Config API for Blue Ocean,1.9.0
Mercurial plugin,2.4
Bitbucket Branch Source Plugin,2.2.14
MapDB API Plugin,1.0.9.0
Pipeline: API,2.32
GitHub Pipeline for Blue Ocean,1.9.0
Workspace Cleanup Plugin,0.36
JUnit Plugin,1.26.1
GitHub Authentication plugin,0.29
Pipeline SCM API for Blue Ocean,1.9.0
Green Balls,1.15
Pipeline: REST API Plugin,2.10
Pipeline: Basic Steps,2.12
Build Timeout,1.19
Run Condition Plugin,1.2
Matrix Authorization Strategy Plugin,2.3
SSH Credentials Plugin,1.14
Plain Credentials Plugin,1.4
Metrics Plugin,4.0.2.2
Pipeline: Groovy,2.60
Credentials Binding Plugin,1.17
Pipeline: SCM Step,2.7
Rebuilder,1.29
HTTP Request Plugin,1.8.22
Pipeline: GitHub Groovy Libraries,1.0
PAM Authentication plugin,1.4
REST Implementation for Blue Ocean,1.9.0
Display URL API,2.2.0
Pipeline: Declarative,1.3.2
Pipeline: Model API,1.3.2
Port Allocator Plug-in,1.8
Durable Task Plugin,1.28
bouncycastle API Plugin,2.17
Slack Notification Plugin,2.3
GIT server Plugin,1.7
Blue Ocean,1.9.0
JSch dependency plugin,0.1.54.2
Node Iterator API Plugin,1.5.0
JIRA Integration for Blue Ocean,1.9.0
i18n for Blue Ocean,1.9.0
Git plugin,3.9.1
Dashboard for Blue Ocean,1.9.0
Role-based Authorization Strategy,2.9.0
Matrix Project Plugin,1.13
Autofavorite for Blue Ocean,1.2.2
Pipeline Remote Loader Plugin,1.4
Conditional BuildStep,1.3.6
Pipeline: Stage View Plugin,2.10
promoted builds plugin,3.2
Blue Ocean Pipeline Editor,1.9.0
Resource Disposer Plugin,0.12
Copy Artifact Plugin,1.41
JavaScript GUI Lib: Moment.js bundle plugin,1.1.1
Git Pipeline for Blue Ocean,1.9.0
JIRA plugin,3.0.5
EC2 Fleet Jenkins Plugin,1.1.8-SNAPSHOT (private-cd808d0d-jzila)
Pipeline: Job,2.29
Script Security Plugin,1.48
JavaScript GUI Lib: Handlebars bundle plugin,1.1.1
REST API for Blue Ocean,1.9.0
Mask Passwords Plugin,2.12.0
Windows Slaves Plugin,1.3.1
Favorite,2.3.2
Pipeline: Stage Tags Metadata,1.3.2
Timestamper,1.8.10
Self-Organizing Swarm Plug-in Modules,3.14
Subversion Plug-in,2.12.1
GitHub Branch Source Plugin,2.4.1
LDAP Plugin,1.20
Pipeline: AWS Steps,1.33
SSH Slaves plugin,1.26
GitHub plugin,1.29.3
Monitoring,1.74.0
JWT for Blue Ocean,1.9.0
SSH Agent Plugin,1.17
Xvfb plugin,1.1.4-beta-1
Icon Shim Plugin,2.0.3
Authentication Tokens API Plugin,1.3
Token Macro Plugin,2.5
Apache HttpComponents Client 4.x API Plugin,4.5.5-3.0
CloudBees Amazon Web Services Credentials Plugin,1.23
JDK Tool Plugin,1.1
JavaScript GUI Lib: ACE Editor bundle plugin,1.1
Javadoc Plugin,1.4
Amazon Web Services SDK,1.11.403
Mailer Plugin,1.22
Parameterized Trigger plugin,2.35.2
GitHub API Plugin,1.92
Pipeline implementation for Blue Ocean,1.9.0
Pipeline: Stage Step,2.3
Ant Plugin,1.9
JavaScript GUI Lib: jQuery bundles (jQuery and jQuery UI) plugin,1.2.1
Pipeline: Multibranch,2.20
cross-platform shell plugin,0.10
Personalization for Blue Ocean,1.9.0
S3 publisher plugin,0.11.2
Branch API Plugin,2.0.21
GitHub Organization Folder Plugin,1.6
Pipeline: Step API,2.16
Pipeline: Supporting APIs,2.22
View Job Filters,2.1.1
Gradle Plugin,1.29
Handy Uri Templates 2.x API Plugin,2.1.6-1.0 

john@keyba.se (JIRA)

Nov 16, 2018, 3:43:01 PM
to jenkinsc...@googlegroups.com

I get this a bunch:

ERROR: Unexpected error in launching an agent. This is probably a bug in Jenkins
java.util.concurrent.CancellationException
	at java.util.concurrent.FutureTask.report(FutureTask.java:121)
	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
	at hudson.plugins.sshslaves.SSHLauncher.launch(SSHLauncher.java:883)
	at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:294)
	at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
	at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748) 

Re-launching the node almost always fixes it, but the plugin should be doing that automatically. Instead, I get nodes that fail to launch and then engineers complaining that CI is broken. I have to manually relaunch nodes dozens of times a day. Would be nice if the SSH Slaves Plugin retry settings worked.

john@keyba.se (JIRA)

Nov 16, 2018, 3:44:03 PM
to jenkinsc...@googlegroups.com

And this, on Windows nodes (usually 2-3 relaunches makes it work):

[11/16/18 20:42:35] [SSH] Checking java version of java
[11/16/18 20:42:51] [SSH] java -version returned 1.8.0_171.
[11/16/18 20:42:51] [SSH] Starting sftp client.
[11/16/18 20:42:52] [SSH] Copying latest slave.jar...
[11/16/18 20:42:53] [SSH] Copied 776,717 bytes.
Expanded the channel window size to 4MB
[11/16/18 20:42:53] [SSH] Starting slave process: cd "c:\Jenkins" && java -Xmx8192m -jar slave.jar
ERROR: Unexpected error in launching an agent. This is probably a bug in Jenkins
java.util.concurrent.CancellationException
	at java.util.concurrent.FutureTask.report(FutureTask.java:121)
	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
	at hudson.plugins.sshslaves.SSHLauncher.launch(SSHLauncher.java:883)
	at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:294)
	at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
	at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
java.io.IOException: java.io.InterruptedIOException
	at hudson.plugins.sshslaves.SSHLauncher.startSlave(SSHLauncher.java:1120)
	at hudson.plugins.sshslaves.SSHLauncher.access$500(SSHLauncher.java:148)
	at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:845)
	at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:820)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.InterruptedIOException
	at com.trilead.ssh2.channel.ChannelManager.getChannelData(ChannelManager.java:938)
	at com.trilead.ssh2.channel.ChannelInputStream.read(ChannelInputStream.java:58)
	at com.trilead.ssh2.channel.ChannelInputStream.read(ChannelInputStream.java:79)
	at hudson.remoting.ChannelBuilder.negotiate(ChannelBuilder.java:409)
	at hudson.remoting.ChannelBuilder.build(ChannelBuilder.java:356)
	at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:431)
	at hudson.plugins.sshslaves.SSHLauncher.startSlave(SSHLauncher.java:1110)
	... 7 more

john@keyba.se (JIRA)

Nov 16, 2018, 3:45:01 PM
to jenkinsc...@googlegroups.com

And this:

[11/16/18 20:42:53] [SSH] Opening SSH connection to 10.0.2.84:22.
ERROR: Unexpected error in launching an agent. This is probably a bug in Jenkins

kuisathaverat@gmail.com (JIRA)

Nov 16, 2018, 4:58:03 PM
to jenkinsc...@googlegroups.com

Did you apply the workaround at https://github.com/jenkinsci/ssh-slaves-plugin/blob/master/doc/TROUBLESHOOTING.md#threads-stuck-at-credentialsprovidertrackall ? On 1.27+ there is a thread block when a high number of agents are launched with Cloud plugins (https://issues.jenkins-ci.org/browse/JENKINS-49235). The next version mitigates it a little but requires a change in the credentials plugin, so for the moment disabling credentials tracking is the only solution.
Do you see the issue when you spin up 10+/20+/30+/50+/100+/... agents?
When the issue happens, can you get a thread dump? https://wiki.jenkins.io/display/JENKINS/Obtaining+a+thread+dump
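
Once a thread dump has been saved as text, one way to look for the blocked threads mentioned above is to scan it for the `CredentialsProvider.trackAll` frame named in the troubleshooting link. The following is only a rough sketch: it assumes a jstack-style dump in which each thread is a block separated by blank lines with the thread header on the first line, and the class name, method name, and sample dump are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of Jenkins or any plugin.
class ThreadDumpScan {
    /** Returns the header line of every thread block whose stack mentions the given frame. */
    static List<String> threadsContaining(String dump, String frame) {
        List<String> hits = new ArrayList<>();
        // Assumption: thread blocks are separated by one or more blank lines.
        for (String block : dump.split("\\R\\s*\\R")) {
            String trimmed = block.trim();
            if (!trimmed.isEmpty() && trimmed.contains(frame)) {
                // The first line of a block is taken to be the thread header.
                hits.add(trimmed.split("\\R", 2)[0]);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up sample input, not taken from a real Jenkins dump.
        String sample = String.join("\n",
                "\"launcher-1\" #41 prio=5 BLOCKED",
                "\tat com.example.CredentialsProvider.trackAll(Hypothetical.java:1)",
                "\tat com.example.Launcher.launch(Hypothetical.java:2)",
                "",
                "\"reader-1\" #42 prio=5 RUNNABLE",
                "\tat com.example.Reader.run(Hypothetical.java:3)");
        System.out.println(threadsContaining(sample, "CredentialsProvider.trackAll"));
    }
}
```

Many hits for the same frame would be consistent with the thread block described in JENKINS-49235, though a dump in a different layout would need a different splitting rule.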

john@keyba.se (JIRA)

Nov 16, 2018, 5:09:02 PM
to jenkinsc...@googlegroups.com

Did you want a master thread dump? For obvious reasons, it'll be difficult to get an agent dump.

Re: the issue, it happens quite sporadically–I haven't noticed a correlation between the number of agents and the probability of the issue. Frankly it seems to happen almost every time a node initially attempts to start up. I changed the retry settings to keep trying again, but those seem to be ignored.

kuisathaverat@gmail.com (JIRA)

Nov 16, 2018, 5:29:02 PM
to jenkinsc...@googlegroups.com

>Did you want a master thread dump?

yep, a master thread dump

>I changed the retry settings to keep trying again, but those seem to be ignored.

I am not talking about retries; I am talking about disabling credentials tracking. Set the property `-Dhudson.plugins.sshslaves.SSHLauncher.trackCredentials=false` in the Jenkins start options; it is available on 1.27+.
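
To check that a `-D` option of this kind actually reaches a JVM, a small stand-alone program can read the property back. This is a minimal sketch, not the plugin's code: the class and method names are hypothetical, and treating a missing value as "enabled" is an assumption about the default.

```java
// Hypothetical check; the class and method names are not from the ssh-slaves plugin.
class TrackCredentialsCheck {
    // Property name as quoted in the comment above.
    static final String PROPERTY = "hudson.plugins.sshslaves.SSHLauncher.trackCredentials";

    /**
     * Assumption: tracking counts as enabled when the property is absent
     * or equals "true" (case-insensitive), and as disabled otherwise.
     */
    static boolean isTrackingEnabled() {
        return Boolean.parseBoolean(System.getProperty(PROPERTY, "true"));
    }

    public static void main(String[] args) {
        System.out.println(PROPERTY + " -> tracking enabled: " + isTrackingEnabled());
    }
}
```

Run as `java -Dhudson.plugins.sshslaves.SSHLauncher.trackCredentials=false TrackCredentialsCheck.java` (single-file mode, Java 11+), it should report tracking as disabled. For Jenkins itself the `-D` option has to appear before `-jar jenkins.war` on the command line; where the start options are configured depends on how Jenkins is installed.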

john@keyba.se (JIRA)

Nov 19, 2018, 7:43:02 PM
to jenkinsc...@googlegroups.com

I'm trying to test this but breaking changes keep stacking up for me: https://issues.jenkins-ci.org/browse/JENKINS-54686. I've disabled credentials tracking but I'll need to manually load 1.28.1 or wait for a version of SSH Slaves that has trilead-ssh2 restored.

glavoie@gmail.com (JIRA)

Nov 21, 2018, 9:16:04 AM
to jenkinsc...@googlegroups.com
Gabriel Lavoie edited a comment on Bug JENKINS-53468
The connection timeout error seen here has been a recurrent issue with the ec2-fleet plugin for us: https://issues.jenkins-ci.org/browse/JENKINS-53468?focusedCommentId=354028&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-354028

Some debugging details are explained here: https://github.com/jenkinsci/ec2-fleet-plugin/issues/41

I tracked this down to the Connection.connect() method of trilead, which doesn't correctly clear the timeout handler when the `kex` timeout is enabled and an exception occurs during the connection attempt.

Created a PR about this: https://github.com/jenkinsci/trilead-ssh2/pull/36

Should I create a separate ticket for this?
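
The kind of bug described above (a timeout handler left armed after a failed connection attempt) can be illustrated in isolation. The sketch below is not trilead-ssh2 code and does not reproduce the PR; every name in it is hypothetical. It uses a plain `ScheduledExecutorService` to show the general pattern of cancelling the timeout in a `finally` block so that it is disarmed on failure as well as on success.

```java
import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration only; none of these names come from trilead-ssh2.
class TimeoutCleanupSketch {
    interface Handshake {
        void run() throws IOException;
    }

    /** Runs the handshake under a timeout and always disarms the timeout afterwards. */
    static void connect(ScheduledExecutorService timer, Handshake handshake,
                        Runnable onTimeout, long timeoutMillis) throws IOException {
        ScheduledFuture<?> timeoutTask =
                timer.schedule(onTimeout, timeoutMillis, TimeUnit.MILLISECONDS);
        try {
            handshake.run();
        } finally {
            // Runs on success and on failure, so no stale handler can fire later.
            timeoutTask.cancel(false);
        }
    }

    /** Simulates a failed connection attempt and reports whether the timeout still fired. */
    static boolean timeoutFiredAfterFailedConnect() throws InterruptedException {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        AtomicBoolean fired = new AtomicBoolean(false);
        try {
            connect(timer, () -> { throw new IOException("simulated handshake failure"); },
                    () -> fired.set(true), 100);
        } catch (IOException expected) {
            // The failure is expected; the point is what happens to the timer.
        }
        Thread.sleep(300); // wait well past the 100 ms timeout
        timer.shutdownNow();
        return fired.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("timeout fired after failed connect: "
                + timeoutFiredAfterFailedConnect());
    }
}
```

Without the `finally` block, the scheduled handler would still run 100 ms after the failed attempt, which is the sort of late "connect timeout expired" event the log earlier in this thread shows.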

john@keyba.se (JIRA)

Nov 21, 2018, 4:27:02 PM
to jenkinsc...@googlegroups.com

Gabriel Lavoie it seems that they intend on switching from `trilead-ssh2` to `trilead-api`–will you need to implement this change there too?

glavoie@gmail.com (JIRA)

Nov 21, 2018, 4:34:03 PM
to jenkinsc...@googlegroups.com
Gabriel Lavoie edited a comment on Bug JENKINS-53468
I wouldn't think so, if I look at https://github.com/jenkinsci/trilead-api-plugin/blob/93ea25242fea336bd46f0356c7a1c5d61fe34f2b/pom.xml#L72

`trilead-api` still depends on `trilead-ssh2`.

Should I understand that the switch from `trilead-ssh2` to `trilead-api` would be to make that an optional dependency of Jenkins?

`ssh-slave-plugin` already depends on `trilead-api`. I had to bump the version in Jenkins core and rebuild `jenkins.war` to test the fix.

kuisathaverat@gmail.com (JIRA)

Nov 22, 2018, 4:00:02 AM
to jenkinsc...@googlegroups.com

Yep, the move from trilead-ssh2 to trilead-api is to be able to use a different trilead-ssh2 library than the one in core. The reason is to avoid having to upgrade the core just to change a library; it is similar to a change made with the bouncycastle library. My mistake was not testing this change with the PCT, and I screwed up every plugin with the ssh-slaves dependency for a day and a half. Right now (1.29.1), ssh-slaves has the trilead-ssh2 dependency but still uses the trilead-ssh2 from core because I removed the option to mask core classes, so the next step is to find every plugin that depends on it and make a PR so this change can be made in the next months. It's going to take longer than I would like.

I am also working on a 2.0.0 branch that introduces the concept of an SSHProvider, which allows using different libraries or methods to make the SSH connection. The first alternative method will be the native SSH client, which would improve performance and stability. Another important new feature is support for several command profiles. Right now the commands used are hardcoded and are only compatible with agents that have bash installed (even though it works on PowerShell and others), so modifying a simple command can introduce a compatibility issue, and that blocks improvements elsewhere. With command profiles you will select the profile for your agent: bash, Python, PowerShell, cmd, or whatever you want.

glavoie@gmail.com (JIRA)

Nov 22, 2018, 8:04:02 AM
to jenkinsc...@googlegroups.com

Ivan Fernandez Calvo thanks for the explanation. This said, how much time can I expect before my PR is reviewed and possibly integrated in a new version of the ssh-slaves plugin?

kuisathaverat@gmail.com (JIRA)

Nov 22, 2018, 8:22:02 AM
to jenkinsc...@googlegroups.com

Review and merge in trilead-ssh will probably happen this week (it depends on my spare time), and integration into trilead-api may also be this week. For ssh-slaves it will not be any time soon because, as I said, it depends on the trilead-ssh core plugin; but if you are testing stuff I can release a beta version with the trilead-api dependency. It will probably require patching some plugins so as not to break things.

glavoie@gmail.com (JIRA)

Nov 22, 2018, 8:37:02 AM
to jenkinsc...@googlegroups.com

I can wait until an official release. Thank you for the heads up!

john@keyba.se (JIRA)

Dec 7, 2018, 2:50:03 PM
to jenkinsc...@googlegroups.com

OK I upgraded to 1.29.1 since the trilead-api fixes, and I've applied the credentials workaround. This issue is still happening, where nodes randomly disconnect during a build. I've run a thread dump but it isn't helpful since it only shows what is running as of the newest connection to the agent machine.

java.io.EOFException
	at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2680)
	at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3155)
	at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:861)
	at java.io.ObjectInputStream.<init>(ObjectInputStream.java:357)
	at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)
	at hudson.remoting.Command.readFrom(Command.java:140)
	at hudson.remoting.Command.readFrom(Command.java:126)
	at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:36)
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
Caused: java.io.IOException: Unexpected termination of the channel
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)
Also:   <cycle to java.io.IOException: Unexpected termination of the channel>
	Caused: hudson.remoting.ChannelClosedException: Channel "unknown": Remote call on i-0f47993940c6bd3da failed. The channel is closing down or has closed down

Thread dump for that instance: jenkins_thread_dump.txt

john@keyba.se (JIRA)

Dec 7, 2018, 2:50:03 PM
to jenkinsc...@googlegroups.com
John Zila updated an issue
 
Jenkins / Bug JENKINS-53468
Change By: John Zila
Attachment: jenkins_thread_dump.txt

cquchen@163.com (JIRA)

Dec 18, 2018, 12:43:03 AM
to jenkinsc...@googlegroups.com
Jin Chen commented on Bug JENKINS-53468
 
Re: Since 1.27 update Channel is closing or has closed down

Maybe the same issue for my case:

jenkins: 2.138.1
ssh-slave: 1.28/1.29.1
Openstack Cloud: 2.40

We use the openstack-cloud plugin to start agents dynamically, connected over SSH. Sometimes the started "worker-0" goes offline while running builds, but the node is still there and we can relaunch it to bring it back online. I hope the logs help you trace the issue.

Error when executing always post condition:
hudson.remoting.ChannelClosedException: Channel "unknown": Remote call on worker-0 failed. The channel is closing down or has closed down
	at hudson.remoting.Channel.call(Channel.java:948)
	at hudson.FilePath.act(FilePath.java:1071)
	at hudson.FilePath.act(FilePath.java:1060)
	at hudson.FilePath.mkdirs(FilePath.java:1245)
	at org.jenkinsci.plugins.workflow.steps.CoreStep$Execution.run(CoreStep.java:79)
	at org.jenkinsci.plugins.workflow.steps.CoreStep$Execution.run(CoreStep.java:67)
	at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution$1$1.call(SynchronousNonBlockingStepExecution.java:50)
	at hudson.security.ACL.impersonate(ACL.java:290)
	at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution$1.run(SynchronousNonBlockingStepExecution.java:47)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Unexpected termination of the channel
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)
Caused by: java.io.EOFException
	at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2680)
	at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3155)
	at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:861)
	at java.io.ObjectInputStream.<init>(ObjectInputStream.java:357)
	at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)
	at hudson.remoting.Command.readFrom(Command.java:140)
	at hudson.remoting.Command.readFrom(Command.java:126)
	at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:36)
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)

cquchen@163.com (JIRA)

Dec 18, 2018, 12:43:06 AM
to jenkinsc...@googlegroups.com
Jin Chen edited a comment on Bug JENKINS-53468
This may be the same issue in my case:

jenkins: 2.138.1
ssh-slave: 1.28/1.29.1
Openstack Cloud: 2.40

We use the openstack-cloud plugin to start agents dynamically, connected over SSH. Sometimes the started "worker-0" goes offline while running builds, but the node is still there and we can relaunch it to bring it back online. I hope the logs help you trace the issue:
{code:java}
Error when executing always post condition:Error when executing always post condition: hudson.remoting.ChannelClosedException: Channel "unknown": Remote call on worker-0 failed. The channel is closing down or has closed down
at hudson.remoting.Channel.call(Channel.java:948)
at hudson.FilePath.act(FilePath.java:1071)
at hudson.FilePath.act(FilePath.java:1060)
at hudson.FilePath.mkdirs(FilePath.java:1245)
at org.jenkinsci.plugins.workflow.steps.CoreStep$Execution.run(CoreStep.java:79)
at org.jenkinsci.plugins.workflow.steps.CoreStep$Execution.run(CoreStep.java:67)
at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution$1$1.call(SynchronousNonBlockingStepExecution.java:50)
at hudson.security.ACL.impersonate(ACL.java:290)
at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution$1.run(SynchronousNonBlockingStepExecution.java:47)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Unexpected termination of the channel
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)
Caused by: java.io.EOFException
at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2680)
at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3155)
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:861)
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:357)
at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)
at hudson.remoting.Command.readFrom(Command.java:140)
at hudson.remoting.Command.readFrom(Command.java:126)
at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:36)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
{code}
 

john@keyba.se (JIRA)

Jan 4, 2019, 7:57:02 PM
to jenkinsc...@googlegroups.com

Any news here? I'm having to continue holding back my ssh-slaves plugin.

glavoie@gmail.com (JIRA)

May 3, 2019, 11:00:03 AM
to jenkinsc...@googlegroups.com

Ivan Fernandez Calvo, any update on releasing a new version of the ssh-slaves plugin with the latest trilead-ssh?

glavoie@gmail.com (JIRA)

May 3, 2019, 11:00:08 AM
to jenkinsc...@googlegroups.com
Gabriel Lavoie edited a comment on Bug JENKINS-53468
[~ifernandezcalvo] any update on releasing a new version of the ssh-slaves plugin with the latest trilead-ssh?

kuisathaverat@gmail.com (JIRA)

Jul 21, 2019, 8:51:36 AM
to jenkinsc...@googlegroups.com

The original issue was resolved; see my comment https://issues.jenkins-ci.org/browse/JENKINS-53468?focusedCommentId=354417&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-354417 . Any other issues are new and need a new Jira ticket to track them. Tracking several issues in the same ticket is not possible; it ends up a mess.

About trilead-ssh: we finally removed it from core in 2.186, so it is now managed by the trilead-api-plugin, which should make updating the library easier. Right now the plugin uses the latest version of trilead-ssh2; it requires core 2.186 and ssh-slaves 1.30.1.
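
For anyone checking whether their instance meets those minimums, here is a rough, self-contained sketch that compares dotted version strings against core 2.186 and ssh-slaves 1.30.1. The class name `VersionCheck` and its methods are made up for illustration and are not part of Jenkins or any plugin. The sample versions in `main` are the ones reported earlier in this thread (2.138.1 and 1.29.1). It assumes purely numeric versions, so suffixes such as `-SNAPSHOT` are not handled.

```java
import java.util.Arrays;

// Hypothetical helper, not a Jenkins API: compares plain dotted numeric versions.
public class VersionCheck {

    // Splits "1.30.1" into {1, 30, 1}. Assumes every part is a plain integer.
    public static int[] parse(String version) {
        return Arrays.stream(version.split("\\."))
                .mapToInt(Integer::parseInt)
                .toArray();
    }

    // Returns a negative, zero, or positive value; missing parts count as 0,
    // so "2.186" and "2.186.0" compare as equal.
    public static int compare(String a, String b) {
        int[] x = parse(a);
        int[] y = parse(b);
        int n = Math.max(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int xi = i < x.length ? x[i] : 0;
            int yi = i < y.length ? y[i] : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    public static boolean atLeast(String installed, String minimum) {
        return compare(installed, minimum) >= 0;
    }

    public static void main(String[] args) {
        // Sample values taken from an earlier comment; replace with your own.
        String core = "2.138.1";
        String sshSlaves = "1.29.1";
        System.out.println("core >= 2.186: " + atLeast(core, "2.186"));
        System.out.println("ssh-slaves >= 1.30.1: " + atLeast(sshSlaves, "1.30.1"));
    }
}
```

The installed versions themselves can be read from the Manage Plugins page and the Jenkins page footer; the sketch only does the comparison.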

kuisathaverat@gmail.com (JIRA)

Jul 21, 2019, 8:52:02 AM
to jenkinsc...@googlegroups.com

glavoie@gmail.com (JIRA)

Jul 21, 2019, 8:56:03 AM
to jenkinsc...@googlegroups.com

kuisathaverat@gmail.com (JIRA)

Jul 21, 2019, 9:01:04 AM
to jenkinsc...@googlegroups.com
Ivan Fernandez Calvo closed an issue as Fixed
 
Change By: Ivan Fernandez Calvo
Status: In Progress -> Closed
Resolution: Fixed

gopichand024@gmail.com (JIRA)

Aug 29, 2019, 7:26:02 AM
to jenkinsc...@googlegroups.com
chand naidu commented on Bug JENKINS-53468
 
Re: Since 1.27 update Channel is closing or has closed down

Hi Ivan Fernandez Calvo,

The issue is seen after the upgrades as well. Any suggestions, please?

Thanks

gopichand024@gmail.com (JIRA)

Sep 2, 2019, 6:09:03 AM
to jenkinsc...@googlegroups.com