[JIRA] (JENKINS-13237) Unsuccessful SSH slave start does not time out and cannot be retried, even manually

SebTardif@ncf.ca (JIRA)

Mar 26, 2012, 11:19:25 AM
to jenkinsc...@googlegroups.com
Sebastien Tardif created JENKINS-13237:
------------------------------------------

Summary: Unsuccessful SSH slave start does not time out and cannot be retried, even manually
Key: JENKINS-13237
URL: https://issues.jenkins-ci.org/browse/JENKINS-13237
Project: Jenkins
Issue Type: Bug
Components: ssh-slaves
Affects Versions: current
Environment: Windows 2008 to Oracle Linux (Redhat enterprise 5)
Reporter: Sebastien Tardif
Assignee: Kohsuke Kawaguchi


We have about 10 slaves. Too often, one of the slaves does not start on the first attempt using the SSH configuration. The attempt never times out, and we are then stuck: there is no way to "kill" the attempt and retry. We end up using the following workaround: we create a new node with exactly the same configuration, and that node can connect!

An automatic retry, a timeout, and kill support would be useful.
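
To make the request concrete, here is a minimal sketch of a per-attempt timeout with automatic retry and a "kill" of the hung attempt. It is not how the ssh-slaves plugin is implemented; the class `LaunchWithRetry`, the method `launchWithRetry`, and the simulated launch task are hypothetical names used only for illustration, built on the standard `java.util.concurrent` API.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not part of Jenkins or the ssh-slaves plugin.
public class LaunchWithRetry {

    /**
     * Runs the launch task with a per-attempt timeout, retrying up to maxAttempts times.
     * Returns true as soon as one attempt completes normally, false if all attempts
     * time out or fail.
     */
    public static boolean launchWithRetry(Callable<Void> launch, int maxAttempts,
                                          long timeout, TimeUnit unit)
            throws InterruptedException {
        ExecutorService executor = Executors.newCachedThreadPool();
        try {
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                Future<Void> future = executor.submit(launch);
                try {
                    future.get(timeout, unit); // wait at most 'timeout' for this attempt
                    return true;
                } catch (TimeoutException e) {
                    // "Kill" the hung attempt. This only interrupts the thread, so the
                    // task must respond to interruption for the kill to take effect.
                    future.cancel(true);
                } catch (ExecutionException e) {
                    // The attempt failed with an exception; fall through and retry.
                }
            }
            return false;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger calls = new AtomicInteger();
        // Simulated launch: the first attempt hangs, later attempts succeed.
        Callable<Void> simulatedLaunch = () -> {
            if (calls.incrementAndGet() == 1) {
                Thread.sleep(10_000);
            }
            return null;
        };
        boolean connected = launchWithRetry(simulatedLaunch, 3, 200, TimeUnit.MILLISECONDS);
        System.out.println("connected=" + connected + " attempts=" + calls.get());
    }
}
```

Note that `Future.cancel(true)` cannot stop a thread blocked in an I/O call that ignores interruption, so a real implementation would probably also need to close the underlying connection.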

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.jenkins-ci.org/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira


albin.joy@huawei.com (JIRA)

Jul 6, 2012, 2:20:23 AM
to jenkinsc...@googlegroups.com
Albin Joy commented on Bug JENKINS-13237

I am also facing a similar problem: sometimes the slave hangs. I am using Jenkins version 1.458. Are there any updates on this issue?


kk@kohsuke.org (JIRA)

Apr 17, 2013, 7:02:32 PM
to jenkinsc...@googlegroups.com

The "Relaunch slave agent" button forces a retry.

When you experience this problem, follow the instructions at https://wiki.jenkins-ci.org/display/JENKINS/Jenkins+is+hanging, obtain thread dumps, and please attach them here.
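
For readers unfamiliar with thread dumps, the sketch below shows the kind of information one contains: each live thread's name, state, and stack trace. It is only an illustration using the standard `Thread.getAllStackTraces()` call and dumps the JVM it runs in; it is not the procedure from the wiki page above, and the class name `ThreadDumpSketch` is hypothetical.

```java
import java.util.Map;

// Hypothetical illustration; dumps the threads of the JVM this class runs in.
public class ThreadDumpSketch {

    /** Returns the name, state, and stack trace of every live thread as text. */
    public static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            sb.append('"').append(thread.getName()).append("\" state=")
              .append(thread.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                sb.append("\tat ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```

In a dump taken while a launch is stuck, the thread running the launch should show the frame where it is blocked, which is what makes the dump useful for diagnosing this issue.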


guy@rzn.co.il (JIRA)

Jun 3, 2013, 3:22:58 PM
to jenkinsc...@googlegroups.com

This looks related to JENKINS-13131. Did you try waiting 15 minutes to see whether your slaves reconnect?


kay.abendroth@epages.com (JIRA)

Jun 27, 2014, 7:25:05 AM
to jenkinsc...@googlegroups.com

I don't think this is related to JENKINS-13131, as the description there indicates that the slave connection is eventually established after 15 to 20 minutes.

We're facing the same issue (as described here) in our environment:

  • a test node is configured to be connected to via SSH (In demand delay: 3, Idle delay: 5),
  • a Job loads a virtual machine snapshot and triggers another Job, which triggers the In demand delay,
  • this sometimes leads to the following behavior:
    • On the slave detail page, the icon blinks and
    • the message "This node is being launched" appears, but
    • in reality nothing happens, or some kind of endless connection attempt is being made. Logging into the machine via SSH from a local workstation works.
    • Clicking the "Relaunch slave agent" button in Jenkins doesn't help.
    • The only workaround for us so far: restarting Jenkins.
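
To rule out a plain network problem from the Jenkins master's side (rather than from a workstation), a bounded TCP connection test against the node's SSH port can help. The following is a minimal sketch, assuming the node listens on port 22; the class `SshPortCheck` and the method `canConnect` are hypothetical names, and the check only proves that the port accepts TCP connections, not that SSH authentication or the slave launch works.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: checks TCP reachability with a bounded connect timeout.
public class SshPortCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    public static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            // connect() with a timeout never blocks indefinitely, unlike an unbounded attempt.
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers refused connections, unknown hosts, and connect timeouts.
            return false;
        }
    }

    public static void main(String[] args) {
        // Port 22 is an assumption; pass the node's host name as the first argument.
        String host = args.length > 0 ? args[0] : "localhost";
        System.out.println(host + ":22 reachable = " + canConnect(host, 22, 5000));
    }
}
```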

kay.abendroth@epages.com (JIRA)

Jun 27, 2014, 7:48:04 AM
to jenkinsc...@googlegroups.com

Log entries for that slave that look unusual:

Jun 26, 2014 11:52:36 PM com.cloudbees.jenkins.support.SupportPlugin$ComputerListenerImpl onOnline
WARNING: Could not install root log handler on node: cd-vm-test-master-01
hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: hudson.remoting.Channel$OrderlyShutdown
	at hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:41)
	at hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:34)
	at hudson.remoting.Request.call(Request.java:174)
	at hudson.remoting.Channel.call(Channel.java:739)
	at com.cloudbees.jenkins.support.SupportPlugin$ComputerListenerImpl.onOnline(SupportPlugin.java:480)
	at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:512)
	at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:349)
	at hudson.plugins.sshslaves.SSHLauncher.startSlave(SSHLauncher.java:712)
	at hudson.plugins.sshslaves.SSHLauncher.launch(SSHLauncher.java:498)
	at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:232)
	at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
Caused by: hudson.remoting.RequestAbortedException: hudson.remoting.Channel$OrderlyShutdown
	at hudson.remoting.Request.abort(Request.java:299)
	at hudson.remoting.Channel.terminate(Channel.java:802)
	at hudson.remoting.Channel$CloseCommand.execute(Channel.java:951)
	at hudson.remoting.Channel$2.handle(Channel.java:475)
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:60)
Caused by: hudson.remoting.Channel$OrderlyShutdown
	... 3 more
Caused by: Command close created at
	at hudson.remoting.Command.<init>(Command.java:56)
	at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:945)
	at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:943)
	at hudson.remoting.Channel.close(Channel.java:1026)
	at hudson.remoting.Channel.close(Channel.java:1009)
	at hudson.remoting.Channel$CloseCommand.execute(Channel.java:950)
	... 2 more
Jun 26, 2014 8:53:19 PM hudson.remoting.SynchronousCommandTransport$ReaderThread run
SEVERE: I/O error in channel cd-vm-test-master-01
java.io.IOException: Unexpected termination of the channel
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
Caused by: java.io.EOFException
	at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2325)
	at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2794)
	at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:801)
	at java.io.ObjectInputStream.<init>(ObjectInputStream.java:299)
	at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:40)
	at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)