[JIRA] (JENKINS-54679) SSHLauncher doesn't continue retrying to connect to remote executor

xistence@0x58.com (JIRA)

Nov 16, 2018, 3:28:03 PM
to jenkinsc...@googlegroups.com
Bert JW Regeer created an issue
Jenkins / Bug JENKINS-54679
SSHLauncher doesn't continue retrying to connect to remote executor
Issue Type: Bug
Assignee: Chad Schmutzer
Components: ec2-fleet-plugin, ssh-agent-plugin
Created: 2018-11-16 20:27
Labels: plugin regression
Priority: Critical
Reporter: Bert JW Regeer

SSHLauncher{host='10.50.10.252', port=22, credentialsId='aaf2ee5e-32bd-4675-9793-0570922f9c66', jvmOptions='', javaPath='', prefixStartSlaveCmd='', suffixStartSlaveCmd='', launchTimeoutSeconds=5, maxNumRetries=120, retryWaitTime=2, sshHostKeyVerificationStrategy=hudson.plugins.sshslaves.verifiers.ManuallyTrustedKeyVerificationStrategy, tcpNoDelay=true, trackCredentials=true}
[11/16/18 20:19:40] [SSH] Opening SSH connection to 10.50.10.252:22.
Connection refused (Connection refused)
SSH Connection failed with IOException: "Connection refused (Connection refused)", retrying in 2 seconds. There are 120 more retries left.
Connection refused (Connection refused)
SSH Connection failed with IOException: "Connection refused (Connection refused)", retrying in 2 seconds. There are 119 more retries left.
Connection refused (Connection refused)
SSH Connection failed with IOException: "Connection refused (Connection refused)", retrying in 2 seconds. There are 118 more retries left.
ERROR: null
java.util.concurrent.CancellationException
	at java.util.concurrent.FutureTask.report(FutureTask.java:121)
	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
	at hudson.plugins.sshslaves.SSHLauncher.launch(SSHLauncher.java:904)
	at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:294)
	at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
	at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
[11/16/18 20:19:45] Launch failed - cleaning up connection
[11/16/18 20:19:45] [SSH] Connection closed.

This happens whenever a new EC2 Fleet instance is brought online. At that point cloud-init is still working its magic to install Docker/OpenJDK and add the new Jenkins user (and its key). However, after the "Launch failed" error message there are no more retries and that slave is never contacted again, even though the slave comes online without issues if we manually press the button to reconnect.

Clearly there are retries left (the last message before the error reports 118), yet the slave is completely dead in the water.
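The timestamps in the log above hint at what ends the retries early: the launch is abandoned at 20:19:45, five seconds after the SSH connection was opened at 20:19:40, and the launcher is configured with launchTimeoutSeconds=5. One plausible reading, not confirmed against the plugin source here, is that launchTimeoutSeconds bounds the whole launch, retries included, so maxNumRetries=120 is never reached. The sketch below does not reproduce SSHLauncher; it only shows the standard JDK behavior that produces this exact pair of frames (FutureTask.report under FutureTask.get): ExecutorService.invokeAll with a timeout cancels any task still running at the deadline, and Future.get() on that task then throws a CancellationException carrying no message, which would print as "ERROR: null". The names LaunchTimeoutDemo, retryingLaunch and runWithOverallTimeout, and the shortened millisecond timings, are hypothetical.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class LaunchTimeoutDemo {

    // Hypothetical stand-in for a launch that retries a refused connection:
    // every attempt "fails", then the loop waits retryWaitMillis before the next one.
    static Callable<Boolean> retryingLaunch(int maxNumRetries, long retryWaitMillis) {
        return () -> {
            for (int retriesLeft = maxNumRetries; retriesLeft > 0; retriesLeft--) {
                System.out.println("Attempt failed, retrying. There are " + retriesLeft + " more retries left.");
                Thread.sleep(retryWaitMillis); // interrupted if the task is cancelled
            }
            return false;
        };
    }

    // Runs the retry loop under an overall deadline and returns whatever
    // Future.get() throws (or null if the task completed in time).
    static Exception runWithOverallTimeout(long timeoutMillis) throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // invokeAll(tasks, timeout, unit) cancels every task not finished by the deadline.
            List<Future<Boolean>> results = pool.invokeAll(
                    Collections.singletonList(retryingLaunch(120, 200)),
                    timeoutMillis, TimeUnit.MILLISECONDS);
            results.get(0).get();
            return null;
        } catch (CancellationException | ExecutionException e) {
            return e;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Exception e = runWithOverallTimeout(500);
        System.out.println(e.getClass().getName() + ", message: " + e.getMessage());
    }
}
```

Running main prints a few retry lines and then the cancellation; the retry counter never gets near zero because the outer deadline, not maxNumRetries, ends the loop.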

This used to work without issues on older versions of Jenkins; the problem started only recently.

We are running Jenkins ver. 2.138.3 from the jenkinsci/blueocean docker image.

This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)

evan@borgstrom.ca (JIRA)

Dec 13, 2018, 2:30:02 PM
to jenkinsc...@googlegroups.com
Evan Borgstrom commented on Bug JENKINS-54679
Re: SSHLauncher doesn't continue retrying to connect to remote executor

We are also facing this same issue.

SSHLauncher{host='10.200.130.209', port=22, credentialsId='slave-ssh', jvmOptions='', javaPath='', prefixStartSlaveCmd='', suffixStartSlaveCmd='', launchTimeoutSeconds=60, maxNumRetries=3, retryWaitTime=60, sshHostKeyVerificationStrategy=hudson.plugins.sshslaves.verifiers.ManuallyTrustedKeyVerificationStrategy, tcpNoDelay=true, trackCredentials=true}
[12/13/18 19:14:12] [SSH] Opening SSH connection to 10.200.130.209:22.
ERROR: null
java.util.concurrent.CancellationException
	at java.util.concurrent.FutureTask.report(FutureTask.java:121)
	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
	at hudson.plugins.sshslaves.SSHLauncher.launch(SSHLauncher.java:902)
	at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:294)
	at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
	at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
[12/13/18 19:15:12] Launch failed - cleaning up connection
[12/13/18 19:15:12] [SSH] Connection closed.

We recently upgraded from Jenkins 2.138.2 to 2.150.1, and from ssh-slaves 1.28.1 to 1.29.1. I'm going to roll back to 1.28.1 and see if that solves our issue.

As with Bert JW Regeer, this used to work for us, and if we click the "Launch Agent" button manually, the host connects without issue.

evan@borgstrom.ca (JIRA)

Dec 20, 2018, 1:30:02 PM
to jenkinsc...@googlegroups.com

FWIW, we found out that this is a race that we're losing.

We are still on 1.29.1 of ssh-slaves, but we changed our retryWaitTime from 60 to 120 seconds and we don't run into this issue anymore.
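Raising retryWaitTime works around the race by contacting the new instance later. A different approach, sketched here only as an assumption about what the provisioning side could do, is to hold off launching the agent until the instance's SSH port accepts TCP connections. PortProbe, waitForPort and CONNECT_TIMEOUT_MILLIS are hypothetical names, not part of the ssh-slaves or ec2-fleet plugins. Note that an open port 22 only shows that sshd is listening: cloud-init may still be creating the Jenkins user and installing its key, so authentication can still fail after this check passes.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class PortProbe {

    // Hypothetical per-attempt connect timeout, so a silently dropped SYN
    // ("Connection timed out") cannot block the loop indefinitely.
    static final int CONNECT_TIMEOUT_MILLIS = 2000;

    // Hypothetical readiness check: polls host:port until a TCP connection
    // succeeds or deadlineMillis has elapsed. Returns true if the port opened.
    static boolean waitForPort(String host, int port, long deadlineMillis, long retryWaitMillis)
            throws InterruptedException {
        long end = System.currentTimeMillis() + deadlineMillis;
        while (true) {
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress(host, port), CONNECT_TIMEOUT_MILLIS);
                return true; // something (normally sshd) is accepting connections
            } catch (IOException e) {
                // Refused, timed out or unresolvable: the instance is not ready yet.
            }
            if (System.currentTimeMillis() + retryWaitMillis >= end) {
                return false; // another wait would overrun the deadline
            }
            Thread.sleep(retryWaitMillis);
        }
    }
}
```

For example, `PortProbe.waitForPort("10.50.10.252", 22, 600_000, 2_000)` would keep checking every two seconds for up to ten minutes before giving up, independently of any launch timeout.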

artem.stasuk@gmail.com (JIRA)

May 14, 2019, 1:48:02 AM
to jenkinsc...@googlegroups.com

shrinathdm@gmail.com (JIRA)

Sep 19, 2019, 3:02:02 AM
to jenkinsc...@googlegroups.com

We recently updated Jenkins to 2.194 and the ssh-slaves plugin to 1.30.1.

Since the upgrade, our Jenkins slave agents fail and are unable to connect.

Please find the logs below and let me know if there is a workaround or fix for this issue.

just before slave Senseai gets launched ...
just before slave Senseai gets launched ...
executing pre-launch scripts ...
Connection timed out (Connection timed out)
SSH Connection failed with IOException: "Connection timed out (Connection timed out)", retrying in 15 seconds.  There are 10 more retries left.
ERROR: null
java.util.concurrent.CancellationException
	at java.util.concurrent.FutureTask.report(FutureTask.java:121)
	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
	at hudson.plugins.sshslaves.SSHLauncher.launch(SSHLauncher.java:475)
	at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:297)
	at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
	at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
[09/05/19 04:40:42] Launch failed - cleaning up connection

artem.stasuk@gmail.com (JIRA)

Jan 24, 2020, 11:27:03 AM
to jenkinsc...@googlegroups.com

artem.stasuk@gmail.com (JIRA)

Jan 24, 2020, 11:28:02 AM
to jenkinsc...@googlegroups.com