[JIRA] (JENKINS-53468) Since 1.27 update Channel is closing or has closed down


john@keyba.se (JIRA)

Sep 7, 2018, 11:50:01
to jenkinsc...@googlegroups.com
John Zila created an issue
 
Jenkins / Bug JENKINS-53468
Since 1.27 update Channel is closing or has closed down
Issue Type: Bug
Assignee: Ivan Fernandez Calvo
Components: ssh-slaves-plugin
Created: 2018-09-07 15:49
Priority: Critical
Reporter: John Zila
 
This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)

john@keyba.se (JIRA)

Sep 7, 2018, 11:52:01
to jenkinsc...@googlegroups.com
John Zila updated an issue
Change By: John Zila
We are getting crashes of the form:
{noformat}
java.io.EOFException
at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2680)
at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3155)
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:861)
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:357)
at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)
at hudson.remoting.Command.readFrom(Command.java:140)
at hudson.remoting.Command.readFrom(Command.java:126)
at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:36)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
Caused: java.io.IOException: Unexpected termination of the channel
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)
Caused: hudson.remoting.ChannelClosedException: Channel "unknown": Remote call on i-0893cee59e342e5da failed. The channel is closing down or has closed down
at hudson.remoting.Channel.call(Channel.java:948)
...{noformat}
This has been happening ever since our update to SSH Slaves 1.27 (we are now at 1.28.1). We are attempting to manually restore 1.26 to work around this.
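
For readers less familiar with Java serialization, the top frames of this trace can be reproduced outside Jenkins. The sketch below is a minimal illustration and not remoting code: it only shows that constructing a `java.io.ObjectInputStream` over a stream that has already ended fails with `java.io.EOFException` while reading the stream header, which matches the `readStreamHeader` frame in the trace. The class name `EofOnHeaderSketch` and its helper method are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;

// Hypothetical demo class; not part of Jenkins or the remoting library.
public class EofOnHeaderSketch {

    // Returns true when opening an ObjectInputStream on `in` fails with
    // EOFException, i.e. the stream ended before the serialization header.
    public static boolean failsWithEofOnHeader(InputStream in) throws IOException {
        try (ObjectInputStream ois = new ObjectInputStream(in)) {
            return false;
        } catch (EOFException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        // An empty stream stands in for a connection the peer has closed.
        InputStream closedPeer = new ByteArrayInputStream(new byte[0]);
        System.out.println("EOF on header: " + failsWithEofOnHeader(closedPeer));
    }
}
```

In other words, the exception says that the other end of the channel went away; it does not by itself say why.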

kuisathaverat@gmail.com (JIRA)

Sep 7, 2018, 13:54:02
to jenkinsc...@googlegroups.com
Ivan Fernandez Calvo commented on Bug JENKINS-53468
 
Re: Since 1.27 update Channel is closing or has closed down

Which Jenkins core version do you use?
Which OS do you use on your SSH agents?
Which OpenSSH version do you have installed on your SSH agents?
Does it happen only on the SSH agents?
Does it happen on all SSH agents or only on a few? Is there something in common between those SSH agents?
Are your SSH agents static or provisioned by a cloud plugin (k8s, Mesos, Docker, EC2, Azure, ...)?
Could you attach the agent connection log (JENKINS_URL/computer/NODENAME/log)?
Could you attach the logs inside the remoting folder (see https://github.com/jenkinsci/remoting/blob/master/docs/workDir.md#remoting-work-directory)?
Could you attach the agent configuration (JENKINS_URL/computer/NODENAME/config.xml) file?
Does it always happen with the same job or type of job?

john@keyba.se (JIRA)

Sep 11, 2018, 21:21:01
to jenkinsc...@googlegroups.com

Which Jenkins core version do you use?

Jenkins 2.141

Which OS do you use on your SSH agents?

Debian Jessie.

admin@ip-10-0-0-211:~$ uname -a
Linux ip-10-0-0-211 4.9.0-0.bpo.6-amd64 #1 SMP Debian 4.9.88-1+deb9u1~bpo8+1 (2018-05-13) x86_64 GNU/Linux 

Which OpenSSH version do you have installed on your SSH agents?

admin@ip-10-0-0-211:~$ ssh -V
OpenSSH_6.7p1 Debian-5+deb8u4, OpenSSL 1.0.1t  3 May 2016 

Does it happen only on the SSH agents?

yes

Does it happen on all SSH agents or only on a few? Is there something in common between those SSH agents?

randomly, most of them

Are your SSH agents static or provisioned by a cloud plugin (k8s, Mesos, Docker, EC2, Azure, ...)?

Could you attach the agent connection log (JENKINS_URL/computer/NODENAME/log)?

No, because I'd have to switch my cluster to use the broken plugin

Could you attach the logs inside the remoting folder (see https://github.com/jenkinsci/remoting/blob/master/docs/workDir.md#remoting-work-directory)? 

No, as above I'd have to switch my cluster to use the broken plugin

Could you attach the agent configuration (JENKINS_URL/computer/NODENAME/config.xml) file?

attached

config.xml

Does it always happen with the same job or type of job?

any job

kuisathaverat@gmail.com (JIRA)

Sep 14, 2018, 13:51:02
to jenkinsc...@googlegroups.com
Ivan Fernandez Calvo commented on Bug JENKINS-53468
 
Re: Since 1.27 update Channel is closing or has closed down

You have configured 'c:\Jenkins' as the working directory (remoteFS and fsRoot) on the agents. This path probably does not exist on a Linux agent; I think that is your problem.

john@keyba.se (JIRA)

Sep 14, 2018, 13:56:02
to jenkinsc...@googlegroups.com

Oops, I gave you the config for one of our Windows agents. Let me attach the config for one of our Linux agents.

john@keyba.se (JIRA)

Sep 14, 2018, 14:08:03
to jenkinsc...@googlegroups.com
John Zila updated an issue
Change By: John Zila
Attachment: config_linux.xml

kuisathaverat@gmail.com (JIRA)

Sep 14, 2018, 14:25:02
to jenkinsc...@googlegroups.com
Ivan Fernandez Calvo commented on Bug JENKINS-53468
 
Re: Since 1.27 update Channel is closing or has closed down

OK, this one looks better.
If agents work correctly for a while and then start to fail randomly, the issue could be related to JENKINS-49235. There is a workaround: https://github.com/jenkinsci/ssh-slaves-plugin/blob/master/doc/TROUBLESHOOTING.md#threads-stuck-at-credentialsprovidertrackall. In any case, I will test it on EC2 to see if I can replicate it.

kuisathaverat@gmail.com (JIRA)

Sep 14, 2018, 14:25:02
to jenkinsc...@googlegroups.com
Ivan Fernandez Calvo started work on Bug JENKINS-53468
 
Change By: Ivan Fernandez Calvo
Status: Open -> In Progress

lopez.sam@gmail.com (JIRA)

Nov 6, 2018, 10:54:01
to jenkinsc...@googlegroups.com

Hello Folks,

 

We are also observing this issue.

john@keyba.se (JIRA)

Nov 16, 2018, 13:25:02
to jenkinsc...@googlegroups.com

FYI, things have been deteriorating badly. Nodes are now disconnecting even on 1.27 for no reason. I tried to upgrade to 1.28.1, but then this terrible bug reared its head, making Jenkins completely unusable. I've had to revert back to 1.27 to get back to a "just pretty bad" state.

Remoting version: 3.27
This is a Unix agent
Evacuated stdout
Agent successfully connected and online
Nov 16, 2018 5:18:27 PM org.jenkinsci.remoting.util.AnonymousClassWarnings warn
WARNING: Attempt to (de-)serialize anonymous class org.jenkinsci.plugins.gitclient.Git$1; see: https://jenkins.io/redirect/serialization-of-anonymous-classes/
Nov 16, 2018 5:18:29 PM org.jenkinsci.remoting.util.AnonymousClassWarnings warn
WARNING: Attempt to (de-)serialize anonymous class org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$1; see: https://jenkins.io/redirect/serialization-of-anonymous-classes/
Nov 16, 2018 5:19:38 PM org.jenkinsci.remoting.util.AnonymousClassWarnings warn
WARNING: Attempt to (de-)serialize anonymous class org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$1; see: https://jenkins.io/redirect/serialization-of-anonymous-classes/
ERROR: Connection terminated
java.io.EOFException
	at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2680)
	at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3155)
	at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:861)
	at java.io.ObjectInputStream.<init>(ObjectInputStream.java:357)
	at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)
	at hudson.remoting.Command.readFrom(Command.java:140)
	at hudson.remoting.Command.readFrom(Command.java:126)
	at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:36)
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
Caused: java.io.IOException: Unexpected termination of the channel
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)
ERROR: Socket connection to SSH server was lost
java.net.SocketTimeoutException: The connect timeout expired
	at com.trilead.ssh2.Connection$1.run(Connection.java:762)
	at com.trilead.ssh2.util.TimeoutService$TimeoutThread.run(TimeoutService.java:91)
Slave JVM has not reported exit code before the socket was lost
[11/16/18 17:20:40] [SSH] Connection closed. 

john@keyba.se (JIRA)

Nov 16, 2018, 13:26:02
to jenkinsc...@googlegroups.com
John Zila edited a comment on Bug JENKINS-53468
FYI, things have been deteriorating badly. Nodes are now disconnecting even on 1.26 for no reason. I tried to upgrade to 1.28.1, but then this terrible bug reared its head, making Jenkins completely unusable. I've had to revert back to 1.26 to get back to a "just pretty bad" state.
{noformat}
[11/16/18 17:20:40] [SSH] Connection closed. {noformat}

kuisathaverat@gmail.com (JIRA)

Nov 16, 2018, 14:53:02
to jenkinsc...@googlegroups.com
I need the list of installed plugins, in case something odd is in there. Run this script in the Jenkins script console and attach the output; it lists the installed plugins and their versions.

```
import jenkins.model.Jenkins

// Build one "display name,version" line per installed plugin.
def result = ''
for (plugin in Jenkins.instance.pluginManager.plugins) {
    result += "${plugin.displayName},${plugin.version}\n"
}
return result
```

john@keyba.se (JIRA)

Nov 16, 2018, 15:09:09
to jenkinsc...@googlegroups.com
HTML Publisher plugin,1.17
Credentials Plugin,2.1.18
Pipeline: Input Step,2.8
Jackson 2 API Plugin,2.9.7.1
Pipeline,2.6
Bitbucket Pipeline for Blue Ocean,1.9.0
Blue Ocean Core JS,1.9.0
Design Language,1.9.0
OWASP Markup Formatter Plugin,1.5
Maven Integration plugin,3.1.2
External Monitor Job Type Plugin,1.7
Pub-Sub "light" Bus,1.12
Pipeline: Declarative Agent API,1.1.1
Pipeline: Declarative Extension Points API,1.3.2
Multiple SCMs plugin,0.6
GitHub Pull Request Builder,1.42.0
Server Sent Events (SSE) Gateway Plugin,1.16
Web for Blue Ocean,1.9.0
Pipeline: Shared Groovy Libraries,2.12
Docker Pipeline,1.17
Common API for Blue Ocean,1.9.0
HSTS Filter Plugin,1.0
Folders Plugin,6.6
SCM API Plugin,2.3.0
AnsiColor,0.6.0
Command Agent Launcher Plugin,1.2
Events API for Blue Ocean,1.9.0
Structs Plugin,1.17
Pipeline: Nodes and Processes,2.26
Docker Commons Plugin,1.13
Pipeline Graph Analysis Plugin,1.9
Email Extension Plugin,2.63
Pipeline: Milestone Step,1.3.1
Lockable Resources plugin,2.3
Display URL for Blue Ocean,2.2.0
Pipeline: Build Step,2.7
Git client plugin,2.7.3
Variant Plugin,1.1
Config API for Blue Ocean,1.9.0
Mercurial plugin,2.4
Bitbucket Branch Source Plugin,2.2.14
MapDB API Plugin,1.0.9.0
Pipeline: API,2.32
GitHub Pipeline for Blue Ocean,1.9.0
Workspace Cleanup Plugin,0.36
JUnit Plugin,1.26.1
GitHub Authentication plugin,0.29
Pipeline SCM API for Blue Ocean,1.9.0
Green Balls,1.15
Pipeline: REST API Plugin,2.10
Pipeline: Basic Steps,2.12
Build Timeout,1.19
Run Condition Plugin,1.2
Matrix Authorization Strategy Plugin,2.3
SSH Credentials Plugin,1.14
Plain Credentials Plugin,1.4
Metrics Plugin,4.0.2.2
Pipeline: Groovy,2.60
Credentials Binding Plugin,1.17
Pipeline: SCM Step,2.7
Rebuilder,1.29
HTTP Request Plugin,1.8.22
Pipeline: GitHub Groovy Libraries,1.0
PAM Authentication plugin,1.4
REST Implementation for Blue Ocean,1.9.0
Display URL API,2.2.0
Pipeline: Declarative,1.3.2
Pipeline: Model API,1.3.2
Port Allocator Plug-in,1.8
Durable Task Plugin,1.28
bouncycastle API Plugin,2.17
Slack Notification Plugin,2.3
GIT server Plugin,1.7
Blue Ocean,1.9.0
JSch dependency plugin,0.1.54.2
Node Iterator API Plugin,1.5.0
JIRA Integration for Blue Ocean,1.9.0
i18n for Blue Ocean,1.9.0
Git plugin,3.9.1
Dashboard for Blue Ocean,1.9.0
Role-based Authorization Strategy,2.9.0
Matrix Project Plugin,1.13
Autofavorite for Blue Ocean,1.2.2
Pipeline Remote Loader Plugin,1.4
Conditional BuildStep,1.3.6
Pipeline: Stage View Plugin,2.10
promoted builds plugin,3.2
Blue Ocean Pipeline Editor,1.9.0
Resource Disposer Plugin,0.12
Copy Artifact Plugin,1.41
JavaScript GUI Lib: Moment.js bundle plugin,1.1.1
Git Pipeline for Blue Ocean,1.9.0
JIRA plugin,3.0.5
EC2 Fleet Jenkins Plugin,1.1.8-SNAPSHOT (private-cd808d0d-jzila)
Pipeline: Job,2.29
Script Security Plugin,1.48
JavaScript GUI Lib: Handlebars bundle plugin,1.1.1
REST API for Blue Ocean,1.9.0
Mask Passwords Plugin,2.12.0
Windows Slaves Plugin,1.3.1
Favorite,2.3.2
Pipeline: Stage Tags Metadata,1.3.2
Timestamper,1.8.10
Self-Organizing Swarm Plug-in Modules,3.14
Subversion Plug-in,2.12.1
GitHub Branch Source Plugin,2.4.1
LDAP Plugin,1.20
Pipeline: AWS Steps,1.33
SSH Slaves plugin,1.26
GitHub plugin,1.29.3
Monitoring,1.74.0
JWT for Blue Ocean,1.9.0
SSH Agent Plugin,1.17
Xvfb plugin,1.1.4-beta-1
Icon Shim Plugin,2.0.3
Authentication Tokens API Plugin,1.3
Token Macro Plugin,2.5
Apache HttpComponents Client 4.x API Plugin,4.5.5-3.0
CloudBees Amazon Web Services Credentials Plugin,1.23
JDK Tool Plugin,1.1
JavaScript GUI Lib: ACE Editor bundle plugin,1.1
Javadoc Plugin,1.4
Amazon Web Services SDK,1.11.403
Mailer Plugin,1.22
Parameterized Trigger plugin,2.35.2
GitHub API Plugin,1.92
Pipeline implementation for Blue Ocean,1.9.0
Pipeline: Stage Step,2.3
Ant Plugin,1.9
JavaScript GUI Lib: jQuery bundles (jQuery and jQuery UI) plugin,1.2.1
Pipeline: Multibranch,2.20
cross-platform shell plugin,0.10
Personalization for Blue Ocean,1.9.0
S3 publisher plugin,0.11.2
Branch API Plugin,2.0.21
GitHub Organization Folder Plugin,1.6
Pipeline: Step API,2.16
Pipeline: Supporting APIs,2.22
View Job Filters,2.1.1
Gradle Plugin,1.29
Handy Uri Templates 2.x API Plugin,2.1.6-1.0 

john@keyba.se (JIRA)

Nov 16, 2018, 15:43:01
to jenkinsc...@googlegroups.com

I get this a bunch:

ERROR: Unexpected error in launching an agent. This is probably a bug in Jenkins
java.util.concurrent.CancellationException
	at java.util.concurrent.FutureTask.report(FutureTask.java:121)
	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
	at hudson.plugins.sshslaves.SSHLauncher.launch(SSHLauncher.java:883)
	at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:294)
	at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
	at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748) 

Re-launching the node almost always fixes it, but the plugin should be doing that automatically. Instead, I get nodes that fail to launch and then engineers complaining that CI is broken. I have to manually relaunch nodes dozens of times a day. Would be nice if the SSH Slaves Plugin retry settings worked.

john@keyba.se (JIRA)

Nov 16, 2018, 15:44:03
to jenkinsc...@googlegroups.com

And this, on Windows nodes (usually 2-3 relaunches make it work):

[11/16/18 20:42:35] [SSH] Checking java version of java
[11/16/18 20:42:51] [SSH] java -version returned 1.8.0_171.
[11/16/18 20:42:51] [SSH] Starting sftp client.
[11/16/18 20:42:52] [SSH] Copying latest slave.jar...
[11/16/18 20:42:53] [SSH] Copied 776,717 bytes.
Expanded the channel window size to 4MB
[11/16/18 20:42:53] [SSH] Starting slave process: cd "c:\Jenkins" && java -Xmx8192m -jar slave.jar
ERROR: Unexpected error in launching an agent. This is probably a bug in Jenkins
java.util.concurrent.CancellationException
	at java.util.concurrent.FutureTask.report(FutureTask.java:121)
	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
	at hudson.plugins.sshslaves.SSHLauncher.launch(SSHLauncher.java:883)
	at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:294)
	at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
	at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
java.io.IOException: java.io.InterruptedIOException
	at hudson.plugins.sshslaves.SSHLauncher.startSlave(SSHLauncher.java:1120)
	at hudson.plugins.sshslaves.SSHLauncher.access$500(SSHLauncher.java:148)
	at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:845)
	at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:820)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.InterruptedIOException
	at com.trilead.ssh2.channel.ChannelManager.getChannelData(ChannelManager.java:938)
	at com.trilead.ssh2.channel.ChannelInputStream.read(ChannelInputStream.java:58)
	at com.trilead.ssh2.channel.ChannelInputStream.read(ChannelInputStream.java:79)
	at hudson.remoting.ChannelBuilder.negotiate(ChannelBuilder.java:409)
	at hudson.remoting.ChannelBuilder.build(ChannelBuilder.java:356)
	at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:431)
	at hudson.plugins.sshslaves.SSHLauncher.startSlave(SSHLauncher.java:1110)
	... 7 more

john@keyba.se (JIRA)

Nov 16, 2018, 15:45:01
to jenkinsc...@googlegroups.com

And this:

[11/16/18 20:42:53] [SSH] Opening SSH connection to 10.0.2.84:22.
ERROR: Unexpected error in launching an agent. This is probably a bug in Jenkins

kuisathaverat@gmail.com (JIRA)

Nov 16, 2018, 16:58:03
to jenkinsc...@googlegroups.com

Did you apply the workaround at https://github.com/jenkinsci/ssh-slaves-plugin/blob/master/doc/TROUBLESHOOTING.md#threads-stuck-at-credentialsprovidertrackall ? On 1.27+ there is a thread block when a high number of agents are launched with cloud plugins (https://issues.jenkins-ci.org/browse/JENKINS-49235). The next version mitigates it a little but requires a change in the credentials plugin, so for the moment disabling credentials tracking is the only solution.
Do you see the issue when you spin up 10+/20+/30+/50+/100+/... agents?
When the issue happens, can you get a thread dump? https://wiki.jenkins.io/display/JENKINS/Obtaining+a+thread+dump

john@keyba.se (JIRA)

Nov 16, 2018, 17:09:02
to jenkinsc...@googlegroups.com

Did you want a master thread dump? For obvious reasons, it'll be difficult to get an agent dump.

Re: the issue, it happens quite sporadically; I haven't noticed a correlation between the number of agents and the probability of the issue. Frankly, it seems to happen almost every time a node initially attempts to start up. I changed the retry settings to keep trying again, but those seem to be ignored.

kuisathaverat@gmail.com (JIRA)

Nov 16, 2018, 17:29:02
to jenkinsc...@googlegroups.com

>Did you want a master thread dump?

yep, a master thread dump

>I changed the retry settings to keep trying again, but those seem to be ignored.

I am not talking about retries; I am talking about disabling credentials tracking. Set the property `-Dhudson.plugins.sshslaves.SSHLauncher.trackCredentials=false` in the Jenkins start options. It is available on 1.27+.
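
As a rough illustration of what the `-D` option does: it defines a JVM system property that code running in the Jenkins master can read at runtime. The sketch below is not the plugin's code. The class `TrackCredentialsFlagSketch`, the helper `trackingEnabled`, and the assumption that tracking counts as enabled when the property is absent are illustrative guesses, based only on the workaround being an explicit opt-out.

```java
import java.util.Properties;

// Hypothetical helper; it only mirrors how a -D flag can be interpreted.
public class TrackCredentialsFlagSketch {

    // Property name taken from the comment above.
    static final String PROPERTY =
            "hudson.plugins.sshslaves.SSHLauncher.trackCredentials";

    // Assumption: tracking counts as enabled when the property is absent
    // or set to "true"; any other value disables it.
    public static boolean trackingEnabled(Properties props) {
        return Boolean.parseBoolean(props.getProperty(PROPERTY, "true"));
    }

    public static void main(String[] args) {
        // Try: java -Dhudson.plugins.sshslaves.SSHLauncher.trackCredentials=false TrackCredentialsFlagSketch
        System.out.println("credentials tracking enabled: "
                + trackingEnabled(System.getProperties()));
    }
}
```

A quick way to confirm that the option reached the running JVM is to read the same property back, for example from the script console.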

john@keyba.se (JIRA)

Nov 19, 2018, 19:43:02
to jenkinsc...@googlegroups.com

I'm trying to test this but breaking changes keep stacking up for me: https://issues.jenkins-ci.org/browse/JENKINS-54686. I've disabled credentials tracking but I'll need to manually load 1.28.1 or wait for a version of SSH Slaves that has trilead-ssh2 restored.

glavoie@gmail.com (JIRA)

Nov 21, 2018, 09:15:03
to jenkinsc...@googlegroups.com

The connection timeout error seen here has been a recurrent issue with the ec2-fleet plugin for us: https://issues.jenkins-ci.org/browse/JENKINS-53468?focusedCommentId=354028&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-354028

Some debugging details are explained here: https://github.com/jenkinsci/ec2-fleet-plugin/issues/41

I tracked this down to the Connection.connect() method of trilead, which does not correctly clean up the timeout handler when the `kex` timeout is enabled and an exception occurs during the connection attempt.

Created a PR about this: https://github.com/jenkinsci/trilead-ssh2/pull/36

Should I create a separate ticket for this?
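
The failure mode described in this comment, a timeout handler left armed after the connection attempt has already failed, can be sketched in isolation. This is a generic illustration built on `java.util.concurrent`, not the trilead code and not the change in the PR; the names `TimeoutCleanupSketch`, `Handshake`, `connectWithTimeout`, and `onTimeout` are hypothetical. The only point is that the scheduled timeout is cancelled on every exit path, including the exceptional one.

```java
import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; not the trilead-ssh2 implementation.
public class TimeoutCleanupSketch {

    // Stand-in for the connection attempt (TCP connect plus key exchange).
    public interface Handshake {
        void run() throws IOException;
    }

    // Runs the handshake with a timeout callback armed, and always disarms
    // the callback afterwards, whether the handshake succeeded or threw.
    public static void connectWithTimeout(ScheduledExecutorService timer,
                                          Handshake handshake,
                                          Runnable onTimeout,
                                          long timeoutMillis) throws IOException {
        ScheduledFuture<?> pending =
                timer.schedule(onTimeout, timeoutMillis, TimeUnit.MILLISECONDS);
        try {
            handshake.run();
        } finally {
            // Without this, a failed attempt leaves the handler armed and it
            // may fire later, after the caller has already moved on.
            pending.cancel(false);
        }
    }

    public static void main(String[] args) throws Exception {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        try {
            connectWithTimeout(timer,
                    () -> { throw new IOException("simulated connect failure"); },
                    () -> System.out.println("timeout fired"),
                    200);
        } catch (IOException expected) {
            System.out.println("attempt failed: " + expected.getMessage());
        }
        Thread.sleep(400); // the cancelled handler stays silent
        timer.shutdownNow();
    }
}
```

Putting the cancellation in `finally` rather than after the handshake call is the design choice that matters here: it covers the case where the attempt throws.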

john@keyba.se (JIRA)

Nov 21, 2018, 16:27:02
to jenkinsc...@googlegroups.com

Gabriel Lavoie, it seems that they intend to switch from `trilead-ssh2` to `trilead-api`. Will you need to implement this change there too?

glavoie@gmail.com (JIRA)

Nov 21, 2018, 16:34:03
to jenkinsc...@googlegroups.com

I wouldn't think so, judging by https://github.com/jenkinsci/trilead-api-plugin/blob/93ea25242fea336bd46f0356c7a1c5d61fe34f2b/pom.xml#L72

`trilead-api` still depends on `trilead-ssh2`.

Should I understand that the switch from `trilead-ssh2` to `trilead-api` is meant to make that an optional dependency of Jenkins?

`ssh-slave-plugin` already depends on `trilead-api`. I had to bump the version in Jenkins core and rebuild `jenkins.war` to test the fix.

kuisathaverat@gmail.com (JIRA)

Nov 22, 2018, 04:00:02
to jenkinsc...@googlegroups.com

Yep, the move from trilead-ssh2 to trilead-api is meant to let plugins use a different trilead-ssh2 library than the one in core. The reason is to avoid having to upgrade the core just to change a library; this is similar to a change made for the bouncycastle library. My mistake was not testing this change with the PCT, and I broke every plugin with the ssh-slaves dependency for a day and a half. Right now (1.29.1), ssh-slaves has the trilead-api dependency but still uses trilead-ssh2, because I removed the option to mask core classes. The next step is to find every plugin that depends on it and open a PR for each so that this change can be made in the coming months. It is going to take longer than I would like.

I am also working on a 2.0.0 branch that introduces the concept of an SSHProvider, which allows using different libraries or methods to make the SSH connection. The first alternative method will use the native SSH client, which should improve performance and stability. Another important new feature is support for several command profiles. Right now the commands used are hardcoded and are only compatible with agents that have bash installed (even though they work on PowerShell and others). As a result, modifying a simple command can introduce a compatibility issue, which holds back improvements elsewhere. With command profiles you will select the profile that fits your agent: bash, Python, PowerShell, cmd, or whatever you want.

glavoie@gmail.com (JIRA)

Nov 22, 2018, 08:04:02
to jenkinsc...@googlegroups.com

Ivan Fernandez Calvo, thanks for the explanation. That said, how long should I expect to wait before my PR is reviewed and possibly integrated into a new version of the ssh-slaves plugin?

kuisathaverat@gmail.com (JIRA)

Nov 22, 2018, 08:22:02
to jenkinsc...@googlegroups.com

It will probably be reviewed and merged into trilead-ssh2 this week (it depends on my spare time), and it can also be integrated into trilead-api this week. For ssh-slaves it will not be any time soon because, as I said, that depends on the trilead-ssh2 library in core. But if you are testing things, I can release a beta version that depends on trilead-api. It will probably require patching some plugins so as not to break things.

glavoie@gmail.com (JIRA)

Nov 22, 2018, 08:37:02
to jenkinsc...@googlegroups.com

I can wait until an official release. Thank you for the heads up!

john@keyba.se (JIRA)

Dec 7, 2018, 14:50:03
to jenkinsc...@googlegroups.com

OK I upgraded to 1.29.1 since the trilead-api fixes, and I've applied the credentials workaround. This issue is still happening, where nodes randomly disconnect during a build. I've run a thread dump but it isn't helpful since it only shows what is running as of the newest connection to the agent machine.

java.io.EOFException
	at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2680)
	at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3155)
	at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:861)
	at java.io.ObjectInputStream.<init>(ObjectInputStream.java:357)
	at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)
	at hudson.remoting.Command.readFrom(Command.java:140)
	at hudson.remoting.Command.readFrom(Command.java:126)
	at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:36)
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
Caused: java.io.IOException: Unexpected termination of the channel
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)
Also:   <cycle to java.io.IOException: Unexpected termination of the channel>
	Caused: hudson.remoting.ChannelClosedException: Channel "unknown": Remote call on i-0f47993940c6bd3da failed. The channel is closing down or has closed down

Thread dump for that instance: jenkins_thread_dump.txt

john@keyba.se (JIRA)

Dec 7, 2018, 14:50:03
to jenkinsc...@googlegroups.com
John Zila updated an issue
 
Jenkins / Bug JENKINS-53468
Change By: John Zila
Attachment: jenkins_thread_dump.txt

cquchen@163.com (JIRA)

Dec 18, 2018, 00:43:03
to jenkinsc...@googlegroups.com
Jin Chen commented on Bug JENKINS-53468
 
Re: Since 1.27 update Channel is closing or has closed down

Maybe the same issue for my case:

jenkins: 2.138.1
ssh-slave: 1.28/1.29.1
Openstack Cloud: 2.40

We use the openstack-cloud plugin to start agents dynamically, connected over SSH. Sometimes the started "worker-0" goes offline while running builds, but the node is still there and we can relaunch it to bring it back online. I hope the logs help you trace the issue.

// code placeholder

Error when executing always post condition:
hudson.remoting.ChannelClosedException: Channel "unknown": Remote call on worker-0 failed. The channel is closing down or has closed down
at hudson.remoting.Channel.call(Channel.java:948)
at hudson.FilePath.act(FilePath.java:1071)
at hudson.FilePath.act(FilePath.java:1060)
at hudson.FilePath.mkdirs(FilePath.java:1245)
at org.jenkinsci.plugins.workflow.steps.CoreStep$Execution.run(CoreStep.java:79)
at org.jenkinsci.plugins.workflow.steps.CoreStep$Execution.run(CoreStep.java:67)
at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution$1$1.call(SynchronousNonBlockingStepExecution.java:50)
at hudson.security.ACL.impersonate(ACL.java:290)
at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution$1.run(SynchronousNonBlockingStepExecution.java:47)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Unexpected termination of the channel
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)
Caused by: java.io.EOFException
at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2680)
at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3155)
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:861)
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:357)
at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)
at hudson.remoting.Command.readFrom(Command.java:140)
at hudson.remoting.Command.readFrom(Command.java:126)
at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:36)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
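
For readers following the trace: the innermost `java.io.EOFException` comes from the `ObjectInputStream` constructor, which reads a stream header before doing anything else. The sketch below is only an illustration in plain Java and makes no claim about how Jenkins remoting is implemented. The class and method names (`EofOnClosedStream`, `headerReadHitsEof`) are made up for this example. It shows that the same exception type appears whenever the underlying stream ends before the header arrives, which is consistent with the remote end of the channel having gone away.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.ObjectInputStream;

// Hypothetical demo class; not part of Jenkins or the remoting library.
class EofOnClosedStream {

    /**
     * Returns true if constructing an ObjectInputStream over the given bytes
     * fails with EOFException, i.e. the stream ended before the header was read.
     */
    static boolean headerReadHitsEof(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return false; // header was read successfully
        } catch (EOFException e) {
            return true; // stream ended while reading the header
        } catch (IOException e) {
            return false; // some other failure, e.g. a corrupted header
        }
    }

    public static void main(String[] args) {
        // An empty stream stands in for a connection closed by the peer.
        System.out.println(headerReadHitsEof(new byte[0]));
    }
}
```

An empty byte array is used here as a stand-in for a connection that was closed by the other side. It says nothing about why the agent connection dropped in the first place.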

cquchen@163.com (JIRA)

18 Dec 2018, 00:43:06
to jenkinsc...@googlegroups.com
Jin Chen edited a comment on Bug JENKINS-53468
Maybe the same issue for my case:

jenkins: 2.138.1
ssh-slave: 1.28/1.29.1
Openstack Cloud: 2.40

We use the openstack-cloud plugin to start agents dynamically, connected over SSH. Sometimes the started "worker-0" goes offline while running builds, but the node is still there and we can relaunch it to bring it back online. I hope the logs help you trace the issue.
{code:java}
Error when executing always post condition:
hudson.remoting.ChannelClosedException: Channel "unknown": Remote call on worker-0 failed. The channel is closing down or has closed down
at hudson.remoting.Channel.call(Channel.java:948)
at hudson.FilePath.act(FilePath.java:1071)
at hudson.FilePath.act(FilePath.java:1060)
at hudson.FilePath.mkdirs(FilePath.java:1245)
at org.jenkinsci.plugins.workflow.steps.CoreStep$Execution.run(CoreStep.java:79)
at org.jenkinsci.plugins.workflow.steps.CoreStep$Execution.run(CoreStep.java:67)
at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution$1$1.call(SynchronousNonBlockingStepExecution.java:50)
at hudson.security.ACL.impersonate(ACL.java:290)
at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution$1.run(SynchronousNonBlockingStepExecution.java:47)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Unexpected termination of the channel
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)
Caused by: java.io.EOFException
at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2680)
at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3155)
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:861)
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:357)
at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)
at hudson.remoting.Command.readFrom(Command.java:140)
at hudson.remoting.Command.readFrom(Command.java:126)
at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:36)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
{code}
 

john@keyba.se (JIRA)

4 Jan 2019, 19:57:02
to jenkinsc...@googlegroups.com

Any news here? I'm having to continue holding back my ssh-slaves plugin.

glavoie@gmail.com (JIRA)

3 May 2019, 11:00:03
to jenkinsc...@googlegroups.com

Ivan Fernandez Calvo, any update on releasing a new version of the ssh-slaves plugin with the latest trilead-ssh?

glavoie@gmail.com (JIRA)

3 May 2019, 11:00:08
to jenkinsc...@googlegroups.com
Gabriel Lavoie edited a comment on Bug JENKINS-53468
[~ifernandezcalvo] any update on releasing a new version of the ssh-slaves plugin with the latest trilead-ssh?

kuisathaverat@gmail.com (JIRA)

21 Jul 2019, 08:51:36
to jenkinsc...@googlegroups.com

The original issue was resolved; see my comment https://issues.jenkins-ci.org/browse/JENKINS-53468?focusedCommentId=354417&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-354417 . Any other issues are new and need a new Jira ticket to track them. Tracking several issues in the same ticket ends up a mess.

As for trilead-ssh, we finally removed it from core in 2.186, so it is now managed by the trilead-api-plugin, which makes updating the library easier. Right now it uses the latest version of trilead-ssh2; it requires core 2.186 and ssh-slaves 1.30.1.
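
To check an installation against the minimums mentioned above (core 2.186 and ssh-slaves 1.30.1), here is a rough sketch in plain Java. It assumes versions are purely numeric and dot-separated, and it is not the comparison logic Jenkins itself uses. The names `VersionCheck`, `compareVersions`, and `meetsMinimums` are hypothetical. The first sample values are the versions reported earlier in this thread (Jenkins 2.138.1, ssh-slaves 1.29.1); the second pair is only an example.

```java
import java.util.Arrays;

// Hypothetical helper; assumes purely numeric, dot-separated version strings.
class VersionCheck {

    /** Compares two versions segment by segment; missing segments count as 0. */
    static int compareVersions(String a, String b) {
        int[] x = parse(a);
        int[] y = parse(b);
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? x[i] : 0;
            int yi = i < y.length ? y[i] : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    private static int[] parse(String version) {
        return Arrays.stream(version.split("\\.")).mapToInt(Integer::parseInt).toArray();
    }

    /** True if both versions reach the minimums quoted in the comment above. */
    static boolean meetsMinimums(String core, String sshSlaves) {
        return compareVersions(core, "2.186") >= 0
                && compareVersions(sshSlaves, "1.30.1") >= 0;
    }

    public static void main(String[] args) {
        System.out.println(meetsMinimums("2.138.1", "1.29.1")); // versions reported earlier in the thread
        System.out.println(meetsMinimums("2.190.1", "1.30.1")); // example values only
    }
}
```

Segments are compared as integers rather than as strings or decimals, so 2.60 sorts before 2.186.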

kuisathaverat@gmail.com (JIRA)

21 Jul 2019, 08:52:02
to jenkinsc...@googlegroups.com

glavoie@gmail.com (JIRA)

21 Jul 2019, 08:56:03
to jenkinsc...@googlegroups.com

kuisathaverat@gmail.com (JIRA)

21 Jul 2019, 09:01:04
to jenkinsc...@googlegroups.com
Ivan Fernandez Calvo closed an issue as Fixed
 
Change By: Ivan Fernandez Calvo
Status: In Progress -> Closed
Resolution: Fixed

gopichand024@gmail.com (JIRA)

29 Aug 2019, 07:26:02
to jenkinsc...@googlegroups.com
chand naidu commented on Bug JENKINS-53468
 
Re: Since 1.27 update Channel is closing or has closed down

Hi Ivan Fernandez Calvo,

The issue is still seen after the upgrades. Any suggestions, please?

Thanks

gopichand024@gmail.com (JIRA)

2 Sep 2019, 06:09:03
to jenkinsc...@googlegroups.com