[JIRA] (JENKINS-13237) Unsuccessful SSH slave start do not timeout and cannot be retry, even manually

SebTardif@ncf.ca (JIRA)

Mar 26, 2012, 11:19:25 AM

to jenkinsc...@googlegroups.com

Sebastien Tardif created JENKINS-13237:
------------------------------------------

Summary: Unsuccessful SSH slave start do not timeout and cannot be retry, even manually
Key: JENKINS-13237
URL: https://issues.jenkins-ci.org/browse/JENKINS-13237
Project: Jenkins
Issue Type: Bug
Components: ssh-slaves
Affects Versions: current
Environment: Windows 2008 to Oracle Linux (Redhat enterprise 5)
Reporter: Sebastien Tardif
Assignee: Kohsuke Kawaguchi

We have about 10 slaves. Too often, one of the slaves does not start on the first attempt using the SSH configuration. The attempt never times out, and we are then stuck. There is no way to "kill" the attempt and retry. We ended up using the following workaround: we create a new node with exactly the same configuration, and then it can connect!

An automatic "retry", a timeout, and support for killing the attempt would be useful.
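The requested behavior (timeout, kill, automatic retry) can be sketched in plain Java with an `ExecutorService`: run each launch attempt as a task, wait on its `Future` with a deadline, cancel it on timeout, and try again. This is a minimal sketch, not the ssh-slaves plugin's code; `launchWithRetry`, the `attempt` callable, and the timeout and retry values are hypothetical. Note that `cancel(true)` only interrupts the worker thread; a launch blocked in socket I/O usually ignores interrupts, so a real implementation would also need to close the underlying connection.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class LaunchWithTimeout {

    /**
     * Runs a (hypothetical) launch attempt with a per-attempt timeout and a bounded
     * number of retries. Returns true as soon as one attempt reports success.
     */
    static boolean launchWithRetry(Callable<Boolean> attempt, int maxAttempts,
                                   long timeout, TimeUnit unit) throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            for (int i = 1; i <= maxAttempts; i++) {
                Future<Boolean> future = executor.submit(attempt);
                try {
                    if (Boolean.TRUE.equals(future.get(timeout, unit))) {
                        return true; // connected
                    }
                } catch (TimeoutException e) {
                    // The attempt hung: "kill" it by interrupting the worker thread.
                    future.cancel(true);
                } catch (ExecutionException e) {
                    // The attempt threw an exception; fall through and retry.
                }
            }
            return false; // every attempt failed or timed out
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int[] calls = {0};
        // Simulated launch: the first attempt hangs, the second one succeeds.
        Callable<Boolean> attempt = () -> {
            calls[0]++;
            if (calls[0] == 1) {
                Thread.sleep(60_000); // hangs until interrupted by cancel(true)
            }
            return true;
        };
        boolean ok = launchWithRetry(attempt, 3, 200, TimeUnit.MILLISECONDS);
        System.out.println(ok + " after " + calls[0] + " attempts");
    }
}
```

Running `main` prints `true after 2 attempts`: the first simulated attempt is cancelled after 200 ms and the second succeeds. A single-thread executor keeps attempts strictly sequential, so a cancelled attempt finishes unwinding before the next one starts.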

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.jenkins-ci.org/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

albin.joy@huawei.com (JIRA)

Jul 6, 2012, 2:20:23 AM

to jenkinsc...@googlegroups.com

Albin Joy commented on

JENKINS-13237

Unsuccessful SSH slave start do not timeout and cannot be retry, even manually

I am also facing a similar problem: sometimes the slave hangs. I am using Jenkins version 1.458. Are there any updates on this issue?

kk@kohsuke.org (JIRA)

Apr 17, 2013, 7:02:32 PM

to jenkinsc...@googlegroups.com

Kohsuke Kawaguchi commented on

JENKINS-13237

Unsuccessful SSH slave start do not timeout and cannot be retry, even manually

The "Relaunch slave agent" button performs a forced retry.

When you experience this problem, please follow the instructions at https://wiki.jenkins-ci.org/display/JENKINS/Jenkins+is+hanging to obtain thread dumps, and attach them here.
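Thread dumps are commonly captured from outside the process with the JDK's `jstack <pid>` tool. As a rough in-process alternative, the standard `Thread.getAllStackTraces()` API returns the stack of every live thread. This is only a sketch, not the procedure from the wiki page; the `ThreadDump` class and `dumpAllThreads` method are hypothetical names.

```java
import java.util.Map;

public class ThreadDump {

    /** Builds a plain-text dump of every live thread's name, state and stack frames. */
    static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread t = entry.getKey();
            sb.append('"').append(t.getName()).append("\" state=")
              .append(t.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                sb.append("\tat ").append(frame).append('\n');
            }
            sb.append('\n'); // blank line between threads
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```

In a hang like the one reported here, the interesting entries would be threads sitting in SSH or remoting frames whose state stays `RUNNABLE` or `WAITING` across several dumps.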

guy@rzn.co.il (JIRA)

Jun 3, 2013, 3:22:58 PM

to jenkinsc...@googlegroups.com

Guy Rozendorn commented on

JENKINS-13237

Unsuccessful SSH slave start do not timeout and cannot be retry, even manually

This looks related to JENKINS-13131. Did you try waiting 15 minutes to see whether your slaves reconnect?

kay.abendroth@epages.com (JIRA)

Jun 27, 2014, 7:25:05 AM

to jenkinsc...@googlegroups.com

Kay Abendroth commented on

JENKINS-13237

Unsuccessful SSH slave start do not timeout and cannot be retry, even manually

I don't think this is related to JENKINS-13131, as the description there indicates that the slave connection is eventually established after 15 to 20 minutes.

We're facing the same issue (as described here) in our environment:

a test node is configured to be connected via SSH (In demand delay: 3, Idle delay: 5),
a job loads a virtual machine snapshot and triggers another job, which triggers the In demand delay,
this sometimes leads to the following behavior:
- On the slave detail page, the icon blinks and
- the message "This node is being launched" appears, but
- in reality nothing happens, or some kind of endless connection attempt is being made. Logging into the machine via SSH from a local workstation works.
- Clicking the "Relaunch slave agent" button in Jenkins doesn't help.
- The only workaround for us so far is restarting Jenkins.

kay.abendroth@epages.com (JIRA)

Jun 27, 2014, 7:48:04 AM

to jenkinsc...@googlegroups.com

Kay Abendroth commented on

JENKINS-13237

Unsuccessful SSH slave start do not timeout and cannot be retry, even manually

Log entries for that slave that look unusual:

Jun 26, 2014 11:52:36 PM com.cloudbees.jenkins.support.SupportPlugin$ComputerListenerImpl onOnline
WARNING: Could not install root log handler on node: cd-vm-test-master-01
hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: hudson.remoting.Channel$OrderlyShutdown
	at hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:41)
	at hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:34)
	at hudson.remoting.Request.call(Request.java:174)
	at hudson.remoting.Channel.call(Channel.java:739)
	at com.cloudbees.jenkins.support.SupportPlugin$ComputerListenerImpl.onOnline(SupportPlugin.java:480)
	at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:512)
	at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:349)
	at hudson.plugins.sshslaves.SSHLauncher.startSlave(SSHLauncher.java:712)
	at hudson.plugins.sshslaves.SSHLauncher.launch(SSHLauncher.java:498)
	at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:232)
	at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
Caused by: hudson.remoting.RequestAbortedException: hudson.remoting.Channel$OrderlyShutdown
	at hudson.remoting.Request.abort(Request.java:299)
	at hudson.remoting.Channel.terminate(Channel.java:802)
	at hudson.remoting.Channel$CloseCommand.execute(Channel.java:951)
	at hudson.remoting.Channel$2.handle(Channel.java:475)
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:60)
Caused by: hudson.remoting.Channel$OrderlyShutdown
	... 3 more
Caused by: Command close created at
	at hudson.remoting.Command.<init>(Command.java:56)
	at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:945)
	at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:943)
	at hudson.remoting.Channel.close(Channel.java:1026)
	at hudson.remoting.Channel.close(Channel.java:1009)
	at hudson.remoting.Channel$CloseCommand.execute(Channel.java:950)
	... 2 more

Jun 26, 2014 8:53:19 PM hudson.remoting.SynchronousCommandTransport$ReaderThread run
SEVERE: I/O error in channel cd-vm-test-master-01
java.io.IOException: Unexpected termination of the channel
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
Caused by: java.io.EOFException
	at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2325)
	at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2794)
	at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:801)
	at java.io.ObjectInputStream.<init>(ObjectInputStream.java:299)
	at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:40)
	at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
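The last trace ends in a `java.io.EOFException` thrown from `ObjectInputStream.readStreamHeader`, that is, while the `ObjectInputStream` constructor was still reading the serialization stream header. That is what happens when the input ends before any header bytes arrive, which is consistent with the other side closing the channel early. The sketch below only reproduces that exception shape with an empty in-memory stream; it does not explain why the slave's channel was closed, and `EmptyStreamHeader` is a hypothetical class name.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.ObjectInputStream;

public class EmptyStreamHeader {

    /** Returns true if opening an ObjectInputStream on an empty stream fails with EOFException. */
    static boolean emptyStreamThrowsEof() throws IOException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(new byte[0]))) {
            return false; // not reached: the constructor must first read the stream header
        } catch (EOFException e) {
            // Same failure point as in the log: the stream ended before the header arrived.
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(emptyStreamThrowsEof()); // prints "true"
    }
}
```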