[JIRA] [core] (JENKINS-22932) Jenkins slave cannot reconnect to Master once it has been disconnected unless Jenkins is restarted

hang.dong@gmail.com (JIRA)

Sep 11, 2015, 3:14:03 PM
to jenkinsc...@googlegroups.com
Hang Dong commented on Bug JENKINS-22932
 
Re: Jenkins slave cannot reconnect to Master once it has been disconnected unless Jenkins is restarted

Seeing this on a Windows master with 1.620. When adding a new node, we typically connect via the JNLP link, then install it as a service. We hit the issue on the service client reconnect. Perhaps this helps: because the master is secured with HTTPS, the first service connect won't have valid cert info (and we suspect this triggers the issue on the master side). We update the XML with the certificate info, then stop and restart the service, but at this stage the master is already in a bad state: not only can the new slave not reconnect, the master actually loses its connection to all other slaves as well. Our workaround so far is restarting the master.

10:17:07 java.io.IOException: remote file operation failed: C:\JSBuilds\workspace****************** at hudson.remoting.Channel@1530a3e:********: hudson.remoting.ChannelClosedException: channel is already closed
10:17:07 at hudson.FilePath.act(FilePath.java:987)
10:17:07 at hudson.FilePath.act(FilePath.java:969)
10:17:07 at hudson.FilePath.mkdirs(FilePath.java:1152)
10:17:07 at hudson.model.AbstractProject.checkout(AbstractProject.java:1275)
10:17:07 at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:610)
10:17:07 at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
10:17:07 at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:532)
10:17:07 at hudson.model.Run.execute(Run.java:1741)
10:17:07 at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
10:17:07 at hudson.model.ResourceController.execute(ResourceController.java:98)
10:17:07 at hudson.model.Executor.run(Executor.java:381)
10:17:07 Caused by: hudson.remoting.ChannelClosedException: channel is already closed
10:17:07 at hudson.remoting.Channel.send(Channel.java:550)
10:17:07 at hudson.remoting.Request.call(Request.java:129)
10:17:07 at hudson.remoting.Channel.call(Channel.java:752)
10:17:07 at hudson.FilePath.act(FilePath.java:980)
10:17:07 ... 10 more
10:17:07 Caused by: java.io.IOException
10:17:07 at hudson.remoting.Channel.close(Channel.java:1110)
10:17:07 at hudson.slaves.ChannelPinger$1.onDead(ChannelPinger.java:118)
10:17:07 at hudson.remoting.PingThread.ping(PingThread.java:126)
10:17:07 at hudson.remoting.PingThread.run(PingThread.java:85)
10:17:07 Caused by: java.util.concurrent.TimeoutException: Ping started at 1441990735275 hasn't completed by 1441990975286
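
A note on reading the last line of the trace: the two numbers in the TimeoutException message look like epoch timestamps in milliseconds (an assumption based on their magnitude, not something the log states). Under that reading, the ping had been outstanding for about four minutes when the channel was closed. A minimal Java sketch of the arithmetic follows; the class and method names are made up for illustration.

```java
import java.time.Duration;

public class PingTimeoutCheck {
    // Elapsed time between the two values in the TimeoutException message,
    // assuming both are epoch milliseconds.
    public static Duration elapsed(long startedAtMillis, long notCompletedByMillis) {
        return Duration.ofMillis(notCompletedByMillis - startedAtMillis);
    }

    public static void main(String[] args) {
        // Values copied from the "Ping started at ... hasn't completed by ..." line.
        Duration d = elapsed(1441990735275L, 1441990975286L);
        System.out.println(d.toMillis() + " ms (about " + d.toMinutes() + " minutes)");
    }
}
```

Run as-is, this prints `240011 ms (about 4 minutes)`.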

This message was sent by Atlassian JIRA (v6.4.2#64017-sha1:e244265)

sjpatel4@buffalo.edu (JIRA)

Sep 14, 2015, 2:29:03 PM
to jenkinsc...@googlegroups.com

Encountered this issue after upgrading Jenkins to 1.622. I am getting the following error while connecting to a Windows slave. I am using the "Launch slave agents via Java Web Start" option to launch the slave. It used to work fine in the previous version, 1.597. The bug seems to have been reintroduced; please follow up with the suggested fix.

java.io.IOException: Connection aborted: org.jenkinsci.remoting.nio.NioChannelHub$MonoNioTransport@7029f3e3[name=windows_02]
	at org.jenkinsci.remoting.nio.NioChannelHub$NioTransport.abort(NioChannelHub.java:208)
	at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:628)
	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Connection reset by peer
	at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
	at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
	at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
	at sun.nio.ch.IOUtil.read(IOUtil.java:197)
	at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
	at org.jenkinsci.remoting.nio.FifoBuffer$Pointer.receive(FifoBuffer.java:136)
	at org.jenkinsci.remoting.nio.FifoBuffer.receive(FifoBuffer.java:306)
	at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:561)

jglick@cloudbees.com (JIRA)

Oct 5, 2015, 1:25:09 PM
to jenkinsc...@googlegroups.com

brian.b.long@gmail.com (JIRA)

Nov 25, 2015, 4:12:04 PM
to jenkinsc...@googlegroups.com
Brian L commented on Bug JENKINS-22932
 
Re: Jenkins slave cannot reconnect to Master once it has been disconnected unless Jenkins is restarted

This is affecting me as well.

Master: Jenkins ver. 1.638, Ubuntu 14.04.3 LTS, running JRE 1.8.0_65-b17
Slave: Windows Server 2008, connected via JNLP:

{{Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.

C:\Users\Administrator>java -version
java version "1.8.0_31"
Java(TM) SE Runtime Environment (build 1.8.0_31-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.31-b07, mixed mode)}}

Do we have a workaround? I wonder if adding some Job configuration to programmatically kill the process running java ... -jar "...\slave.jar" might work?

brian.b.long@gmail.com (JIRA)

Nov 25, 2015, 4:15:02 PM
to jenkinsc...@googlegroups.com
Brian L edited a comment on Bug JENKINS-22932
This is affecting me as well.  

Master: Jenkins ver. 1.638, Ubuntu 14.04.3 LTS, running JRE 1.8.0_65-b17
Slave: Windows Server 2008, connected via JNLP:

{code}
Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation.  All rights reserved.

C:\Users\Administrator>java -version
java version "1.8.0_31"
Java(TM) SE Runtime Environment (build 1.8.0_31-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.31-b07, mixed mode)
{code}

Do we have a workaround?  I wonder if adding some Job configuration to programmatically kill the process running {{java ... -jar "...\slave.jar"}} might work?

brian.b.long@gmail.com (JIRA)

Nov 25, 2015, 5:33:03 PM
to jenkinsc...@googlegroups.com
Brian L commented on Bug JENKINS-22932

I didn't have much luck with an actual patch, but in the meantime, here's the workaround I'm attempting to implement:

1. Install the Groovy plugin
2. Use this code as its own job:

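import jenkins.model.*

// Run this as a system Groovy script so it executes on the master and can reach the Jenkins instance.
println "The system is now going down for restart."
println "Once the bug 'https://issues.jenkins-ci.org/browse/JENKINS-22932' is resolved, this job should be removed."

// Safe restart: Jenkins waits for running builds to finish first. null means there is no web request behind the call.
Jenkins.instance.doSafeRestart(null)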

3. Have the job triggered after any of your Windows slaves finish doing work

o.v.nenashev@gmail.com (JIRA)

Mar 27, 2017, 5:02:13 PM
to jenkinsc...@googlegroups.com
Oleg Nenashev updated an issue
 
Change By: Oleg Nenashev
Component/s: remoting
This message was sent by Atlassian JIRA (v7.3.0#73011-sha1:3c73d0e)

o.v.nenashev@gmail.com (JIRA)

Mar 13, 2018, 10:33:33 PM
to jenkinsc...@googlegroups.com
Oleg Nenashev assigned an issue to Unassigned
 

Unfortunately I have no capacity to work on Remoting in the medium term, so I will unassign it and let others take it. If somebody is interested in submitting a pull request, I will be happy to help get it reviewed and released.

Change By: Oleg Nenashev
Assignee: Oleg Nenashev