[JIRA] (JENKINS-36155) windows remote slaves don't always come back online


mlandman99@gmail.com (JIRA)

Jun 22, 2016, 2:43:02 PM
to jenkinsc...@googlegroups.com
boris ivan created an issue
 
Jenkins / Bug JENKINS-36155
windows remote slaves don't always come back online
Issue Type: Bug
Assignee: Unassigned
Components: core
Created: 2016/Jun/22 6:42 PM
Environment: Windows 2012R2 slaves and server
Priority: Critical
Reporter: boris ivan

Hi,

I've seen this problem for the last 5-6 releases. When I saw that there were a few bugs fixed in 2.09 around slaves, I was hoping this was fixed. But no such luck.

Typically, any time I reboot Jenkins (server), some of the nodes (also windows machines, loaded via command line w/ slave.jar) fail to come back online. They are all online prior to the server reboot.

Killing them and reissuing the command line works... sometimes. Sometimes I need to do this 5-6 times and fiddle with the "mark node offline" button for this to eventually work.
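
For anyone who wants to automate the kill-and-reissue routine described above, here is a minimal sketch in Python. It is only an illustration, not something from Jenkins itself: the function name `launch_until_connected` and its parameters are made up, and it assumes the agent prints the `INFO: Connected` line seen in the logs in this report. It only covers the initial launch; it does not watch for a reconnect that hangs later.

```python
import subprocess
import threading


def launch_until_connected(cmd, wait_seconds=60.0, max_attempts=6):
    """Start cmd; kill and relaunch it until its output contains 'INFO: Connected'.

    Returns the running Popen object, or None if every attempt timed out.
    (Hypothetical helper; name and defaults are illustrative assumptions.)
    """
    for _ in range(max_attempts):
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # the agent log in this report goes to the console
            text=True,
        )
        connected = threading.Event()

        def pump(stream=proc.stdout, flag=connected):
            # Keep draining output so the child never blocks on a full pipe.
            for line in stream:
                if line.strip() == "INFO: Connected":
                    flag.set()

        threading.Thread(target=pump, daemon=True).start()
        if connected.wait(wait_seconds):
            return proc
        # No "Connected" line in time: same as ctrl-C and reissuing the command.
        proc.kill()
        proc.wait()
    return None
```

It would be called with the same command line as in this report, passed as a list, e.g. `["java", "-jar", "slave.jar", "-jnlpUrl", ..., "-secret", ...]` with the elided values filled in for your own node.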

To ensure I was running w/ latest and greatest for this last go around, I:

  • Upgraded server to 2.10.
  • Rebooted server
  • Went to each node, killed the slave process, deleted existing slave.jar, and replaced with the new one that comes with 2.10.
  • reissued the command line for the slave process (shown below)
  • This seemed to work
  • Next, I went to Jenkins server and upgraded a few plugins that needed upgrading, since I had just recently upgraded to 2.10
  • On plugin download screen, I also checked "reboot Jenkins if no jobs are pending" (or whatever it says there).
  • Jenkins rebooted
  • As usual, some of the nodes don't come back online. When I kill the process and reissue the command, it will work.

For one of the ones in this state (not reconnected after rebooting server), this is what I find:

On the "manage node" -> "Log" screen:

JNLP agent connected from /<my IP address>

(that's it, no other messages).

The command line output on that machine is shown below. The beginning of the output is from when I last launched the slave process, which was done after I upgraded to 2.10 (and downloaded the latest slave.jar from the 2.10 server).

PS C:\jenkins> java -jar slave.jar -jnlpUrl http://<my IP and node name>/slave-agent.jnlp -secret <my secret>
Jun 22, 2016 2:05:19 PM hudson.remoting.jnlp.Main createEngine
INFO: Setting up slave: <my node name>
Jun 22, 2016 2:05:19 PM hudson.remoting.jnlp.Main$CuiListener <init>
INFO: Jenkins agent is running in headless mode.
Jun 22, 2016 2:05:19 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among http://<my IP>/
Jun 22, 2016 2:05:19 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Handshaking
Jun 22, 2016 2:05:19 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to <my server IP>:52465
Jun 22, 2016 2:05:19 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Trying protocol: JNLP3-connect
Jun 22, 2016 2:05:19 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connected

<--- things were good here. Now I try to reboot server after installing plugins.

Jun 22, 2016 2:07:33 PM hudson.remoting.SynchronousCommandTransport$ReaderThread run
SEVERE: I/O error in channel channel

<--- I assume this is where jenkins server rebooted after I told it to, after loading plugins, as described above.

java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at java.io.BufferedInputStream.fill(Unknown Source)
at java.io.BufferedInputStream.read1(Unknown Source)
at java.io.BufferedInputStream.read(Unknown Source)
at java.io.FilterInputStream.read(Unknown Source)
at javax.crypto.CipherInputStream.getMoreData(CipherInputStream.java:114)
at javax.crypto.CipherInputStream.read(CipherInputStream.java:192)
at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:86)
at hudson.remoting.ChunkedInputStream.readHeader(ChunkedInputStream.java:72)
at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:103)
at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:39)
at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:59)

Jun 22, 2016 2:07:33 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Terminated
Jun 22, 2016 2:07:44 PM hudson.remoting.Engine waitForServerToBack
INFO: Master isn't ready to talk to us. Will retry again: response code=503
Jun 22, 2016 2:07:54 PM hudson.remoting.Engine waitForServerToBack
INFO: Master isn't ready to talk to us. Will retry again: response code=503
Jun 22, 2016 2:08:04 PM hudson.remoting.Engine waitForServerToBack
INFO: Master isn't ready to talk to us. Will retry again: response code=503
Jun 22, 2016 2:08:19 PM hudson.remoting.Engine waitForServerToBack
INFO: Failed to connect to the master. Will retry again
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at java.io.BufferedInputStream.fill(Unknown Source)
at java.io.BufferedInputStream.read1(Unknown Source)
at java.io.BufferedInputStream.read(Unknown Source)
at sun.net.www.http.HttpClient.parseHTTPHeader(Unknown Source)
at sun.net.www.http.HttpClient.parseHTTP(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
at java.net.HttpURLConnection.getResponseCode(Unknown Source)
at hudson.remoting.Engine.waitForServerToBack(Engine.java:434)
at hudson.remoting.Engine.run(Engine.java:325)

Jun 22, 2016 2:08:34 PM hudson.remoting.Engine waitForServerToBack
INFO: Failed to connect to the master. Will retry again
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at java.io.BufferedInputStream.fill(Unknown Source)
at java.io.BufferedInputStream.read1(Unknown Source)
at java.io.BufferedInputStream.read(Unknown Source)
at sun.net.www.http.HttpClient.parseHTTPHeader(Unknown Source)
at sun.net.www.http.HttpClient.parseHTTP(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
at java.net.HttpURLConnection.getResponseCode(Unknown Source)
at hudson.remoting.Engine.waitForServerToBack(Engine.java:434)
at hudson.remoting.Engine.run(Engine.java:325)

Jun 22, 2016 2:08:44 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among http://<my server IP>/
Jun 22, 2016 2:08:44 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Handshaking
Jun 22, 2016 2:08:44 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to <my server IP>:52672
Jun 22, 2016 2:08:44 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Trying protocol: JNLP3-connect

It will stay here forever.
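
The stuck state in the log above (a `Trying protocol` line that is never followed by `Connected`) can be detected mechanically. The following is a rough Python sketch under a few assumptions: the function names `parse_stamp` and `stuck_since` and the 120-second limit are invented for illustration, and the timestamp format is the one shown in this log, parsed with English month names.

```python
from datetime import datetime

# Timestamp layout of the header lines in this log, e.g. "Jun 22, 2016 2:08:44 PM".
STAMP_FORMAT = "%b %d, %Y %I:%M:%S %p"


def parse_stamp(line):
    """Return the datetime at the start of a log header line, or None."""
    parts = line.split()
    if len(parts) < 5:
        return None
    try:
        return datetime.strptime(" ".join(parts[:5]), STAMP_FORMAT)
    except ValueError:
        return None


def stuck_since(lines, now, limit_seconds=120):
    """Return when an unfinished handshake started, if older than the limit.

    Returns None when the last 'Trying protocol' line was followed by
    'Connected' or 'Terminated', or when it is still within the limit.
    """
    last_stamp = None
    pending = None
    for line in lines:
        stamp = parse_stamp(line)
        if stamp is not None:
            last_stamp = stamp  # header line; the message follows on the next line
            continue
        text = line.strip()
        if text.startswith("INFO: Trying protocol:"):
            pending = last_stamp
        elif text in ("INFO: Connected", "INFO: Terminated"):
            pending = None
    if pending is not None and (now - pending).total_seconds() > limit_seconds:
        return pending
    return None
```

Fed the last lines of the log above, with `now` set to any time more than two minutes later, `stuck_since` reports the 2:08:44 PM handshake as the point where the agent hung.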

Now I'll try and ctrl-c this, and try again:

PS C:\jenkins> java -jar slave.jar -jnlpUrl http://<my IP and node name>/slave-agent.jnlp -secret <my secret>
Jun 22, 2016 2:37:10 PM hudson.remoting.jnlp.Main createEngine
INFO: Setting up slave:<my node name>
Jun 22, 2016 2:37:10 PM hudson.remoting.jnlp.Main$CuiListener <init>
INFO: Jenkins agent is running in headless mode.
Jun 22, 2016 2:37:10 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among http://<my server IP>/
Jun 22, 2016 2:37:10 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Handshaking
Jun 22, 2016 2:37:10 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to <my server IP>:52672
Jun 22, 2016 2:37:10 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Trying protocol: JNLP3-connect
Jun 22, 2016 2:37:11 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connected

... all is good. But until I ctrl-C and reissue, it was stuck. This happens 50% of the time, and on different nodes each time.

When I did the ctrl-C as shown immediately above, the "manage nodes" -> "log" screen adds the following lines, which seem to imply it was connected all along (but it wasn't).

<===[JENKINS REMOTING CAPACITY]===>ERROR: Connection terminated
java.io.IOException: An existing connection was forcibly closed by the remote host
at sun.nio.ch.SocketDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(Unknown Source)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
at sun.nio.ch.IOUtil.read(Unknown Source)
at sun.nio.ch.SocketChannelImpl.read(Unknown Source)
at hudson.remoting.SocketChannelStream$1.read(SocketChannelStream.java:35)
at sun.nio.ch.ChannelInputStream.read(Unknown Source)
at sun.nio.ch.ChannelInputStream.read(Unknown Source)
at sun.nio.ch.ChannelInputStream.read(Unknown Source)
at java.io.InputStream.read(Unknown Source)
at javax.crypto.CipherInputStream.getMoreData(CipherInputStream.java:114)
at javax.crypto.CipherInputStream.read(CipherInputStream.java:192)
at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:86)
at hudson.remoting.ChunkedInputStream.readHeader(ChunkedInputStream.java:72)
at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:103)
at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:39)
at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:59)
JNLP agent connected from /<my node IP>
<===[JENKINS REMOTING CAPACITY]===>Slave.jar version: 2.60
This is a Windows agent
Agent successfully connected and online

 
This message was sent by Atlassian JIRA (v7.1.7#71011-sha1:2526d7c)

o.v.nenashev@gmail.com (JIRA)

Dec 27, 2016, 10:10:01 AM
to jenkinsc...@googlegroups.com

koma0277@java.net (JIRA)

Jan 19, 2017, 4:19:01 AM
to jenkinsc...@googlegroups.com

We have this issue nearly every day. Is there no workaround available?

o.v.nenashev@gmail.com (JIRA)

Feb 28, 2017, 4:36:01 AM
to jenkinsc...@googlegroups.com

o.v.nenashev@gmail.com (JIRA)

Feb 28, 2017, 4:38:01 AM
to jenkinsc...@googlegroups.com
Oleg Nenashev commented on Bug JENKINS-36155
 
Re: windows remote slaves don't always come back online

So, if you still see this issue, the first step would be to disable the JNLP3 protocol. The new Jenkins UI offers such an option (in the global security settings). JNLP2 or JNLP4 are preferable.

o.v.nenashev@gmail.com (JIRA)

Feb 28, 2017, 4:38:01 AM
to jenkinsc...@googlegroups.com
Oleg Nenashev edited a comment on Bug JENKINS-36155
So, if you still see this issue, the first step would be to disable the JNLP3 protocol. The new Jenkins UI offers such an option (in the global security settings). JNLP2 or JNLP4 are preferable.