Hi, I've seen this problem for the last 5-6 releases. When I saw that a few bugs around slaves were fixed in 2.09, I hoped this was among them, but no such luck. Typically, any time I reboot the Jenkins server, some of the nodes (also Windows machines, launched from the command line with slave.jar) fail to come back online. They are all online prior to the server reboot. Killing the slave process and reissuing the command line works... sometimes. Sometimes I need to do this 5-6 times and fiddle with the "mark node offline" button before it eventually works. To make sure I was running the latest and greatest for this last go-around, I:
- Upgraded server to 2.10.
- Rebooted the server.
- Went to each node, killed the slave process, deleted the existing slave.jar, and replaced it with the new one that comes with 2.10.
- Reissued the command line for the slave process (shown below).
- This seemed to work.
- Next, I went to the Jenkins server and upgraded a few plugins that needed upgrading, since I had only just upgraded to 2.10.
- On the plugin download screen, I also checked "reboot Jenkins if no jobs are pending" (or whatever it says there).
- Jenkins rebooted.
- As usual, some of the nodes don't come back online. When I kill the process and reissue the command, it will work.
For one of the ones in this state (not reconnected after rebooting server), this is what I find:

On the "manage node" -> "Log" screen:

JNLP agent connected from /<my IP address>

(that's it, no other messages).

On the command line on that machine, shown below. The beginning of the output is from when I last launched the slave process, which was done after I upgraded to 2.10 (and downloaded the latest slave.jar from the 2.10 server).

PS C:\jenkins> java -jar slave.jar -jnlpUrl http://<my IP and node name>/slave-agent.jnlp -secret <my secret>
Jun 22, 2016 2:05:19 PM hudson.remoting.jnlp.Main createEngine
INFO: Setting up slave: <my node name>
Jun 22, 2016 2:05:19 PM hudson.remoting.jnlp.Main$CuiListener <init>
INFO: Jenkins agent is running in headless mode.
Jun 22, 2016 2:05:19 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among http://<my IP>/
Jun 22, 2016 2:05:19 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Handshaking
Jun 22, 2016 2:05:19 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to <my server IP>:52465
Jun 22, 2016 2:05:19 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Trying protocol: JNLP3-connect
Jun 22, 2016 2:05:19 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connected <--- things were good here. Now I try to reboot server after installing plugins.
Jun 22, 2016 2:07:33 PM hudson.remoting.SynchronousCommandTransport$ReaderThread run
SEVERE: I/O error in channel channel <--- I assume this is where jenkins server rebooted after I told it to, after loading plugins, as described above.
java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(Unknown Source)
    at java.net.SocketInputStream.read(Unknown Source)
    at java.io.BufferedInputStream.fill(Unknown Source)
    at java.io.BufferedInputStream.read1(Unknown Source)
    at java.io.BufferedInputStream.read(Unknown Source)
    at java.io.FilterInputStream.read(Unknown Source)
    at javax.crypto.CipherInputStream.getMoreData(CipherInputStream.java:114)
    at javax.crypto.CipherInputStream.read(CipherInputStream.java:192)
    at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:86)
    at hudson.remoting.ChunkedInputStream.readHeader(ChunkedInputStream.java:72)
    at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:103)
    at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:39)
    at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
    at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:59)
Jun 22, 2016 2:07:33 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Terminated
Jun 22, 2016 2:07:44 PM hudson.remoting.Engine waitForServerToBack
INFO: Master isn't ready to talk to us. Will retry again: response code=503
Jun 22, 2016 2:07:54 PM hudson.remoting.Engine waitForServerToBack
INFO: Master isn't ready to talk to us. Will retry again: response code=503
Jun 22, 2016 2:08:04 PM hudson.remoting.Engine waitForServerToBack
INFO: Master isn't ready to talk to us. Will retry again: response code=503
Jun 22, 2016 2:08:19 PM hudson.remoting.Engine waitForServerToBack
INFO: Failed to connect to the master. Will retry again
java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(Unknown Source)
    at java.net.SocketInputStream.read(Unknown Source)
    at java.net.SocketInputStream.read(Unknown Source)
    at java.io.BufferedInputStream.fill(Unknown Source)
    at java.io.BufferedInputStream.read1(Unknown Source)
    at java.io.BufferedInputStream.read(Unknown Source)
    at sun.net.www.http.HttpClient.parseHTTPHeader(Unknown Source)
    at sun.net.www.http.HttpClient.parseHTTP(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
    at java.net.HttpURLConnection.getResponseCode(Unknown Source)
    at hudson.remoting.Engine.waitForServerToBack(Engine.java:434)
    at hudson.remoting.Engine.run(Engine.java:325)
Jun 22, 2016 2:08:34 PM hudson.remoting.Engine waitForServerToBack
INFO: Failed to connect to the master. Will retry again
java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(Unknown Source)
    at java.net.SocketInputStream.read(Unknown Source)
    at java.net.SocketInputStream.read(Unknown Source)
    at java.io.BufferedInputStream.fill(Unknown Source)
    at java.io.BufferedInputStream.read1(Unknown Source)
    at java.io.BufferedInputStream.read(Unknown Source)
    at sun.net.www.http.HttpClient.parseHTTPHeader(Unknown Source)
    at sun.net.www.http.HttpClient.parseHTTP(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
    at java.net.HttpURLConnection.getResponseCode(Unknown Source)
    at hudson.remoting.Engine.waitForServerToBack(Engine.java:434)
    at hudson.remoting.Engine.run(Engine.java:325)
Jun 22, 2016 2:08:44 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among http://<my server IP>/
Jun 22, 2016 2:08:44 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Handshaking
Jun 22, 2016 2:08:44 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to <my server IP>:52672
Jun 22, 2016 2:08:44 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Trying protocol: JNLP3-connect

It will stay here forever. Now I'll try and ctrl-c this, and try again:

PS C:\jenkins> java -jar slave.jar -jnlpUrl http://<my IP and node name>/slave-agent.jnlp -secret <my secret>
Jun 22, 2016 2:37:10 PM hudson.remoting.jnlp.Main createEngine
INFO: Setting up slave:<my node name>
Jun 22, 2016 2:37:10 PM hudson.remoting.jnlp.Main$CuiListener <init>
INFO: Jenkins agent is running in headless mode.
Jun 22, 2016 2:37:10 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among http://<my server IP>/
Jun 22, 2016 2:37:10 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Handshaking
Jun 22, 2016 2:37:10 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to <my server IP>:52672
Jun 22, 2016 2:37:10 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Trying protocol: JNLP3-connect
Jun 22, 2016 2:37:11 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connected

... all is good. But until I ctrl-C and reissue, it was stuck. This happens 50% of the time, and on different nodes each time.

When I did the ctrl-C as shown immediately above, the "manage nodes" -> "log" screen adds the following lines, which seem to imply it was connected all along (but it wasn't).
<===[JENKINS REMOTING CAPACITY]===>ERROR: Connection terminated
java.io.IOException: An existing connection was forcibly closed by the remote host
    at sun.nio.ch.SocketDispatcher.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(Unknown Source)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
    at sun.nio.ch.IOUtil.read(Unknown Source)
    at sun.nio.ch.SocketChannelImpl.read(Unknown Source)
    at hudson.remoting.SocketChannelStream$1.read(SocketChannelStream.java:35)
    at sun.nio.ch.ChannelInputStream.read(Unknown Source)
    at sun.nio.ch.ChannelInputStream.read(Unknown Source)
    at sun.nio.ch.ChannelInputStream.read(Unknown Source)
    at java.io.InputStream.read(Unknown Source)
    at javax.crypto.CipherInputStream.getMoreData(CipherInputStream.java:114)
    at javax.crypto.CipherInputStream.read(CipherInputStream.java:192)
    at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:86)
    at hudson.remoting.ChunkedInputStream.readHeader(ChunkedInputStream.java:72)
    at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:103)
    at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:39)
    at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
    at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:59)
JNLP agent connected from /<my node IP>
<===[JENKINS REMOTING CAPACITY]===>Slave.jar version: 2.60
This is a Windows agent
Agent successfully connected and online
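
In case it helps anyone reproduce or spot the hang, here is a rough Python sketch of how the stuck state could be detected from the agent's console output. It is not part of Jenkins or slave.jar. The function names (last_status, looks_stuck) and the 5-minute threshold are my own assumptions, and it assumes each log record is printed as a timestamp header line followed by an "INFO: ..." line, as in the console output above. The idea is simply: if the most recent INFO message is still "Trying protocol: ..." long after its timestamp, the agent is probably hung and needs the ctrl-C and relaunch.

```python
import re
from datetime import datetime, timedelta

# Timestamp header of a log record, e.g.
# "Jun 22, 2016 2:08:44 PM hudson.remoting.jnlp.Main$CuiListener status"
HEADER = re.compile(r"^(\w{3} \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M) ")


def last_status(lines):
    """Return (timestamp, message) of the last INFO record, or None."""
    stamp = None
    result = None
    for line in lines:
        match = HEADER.match(line)
        if match:
            # Remember the timestamp; the message follows on the next line.
            stamp = datetime.strptime(match.group(1), "%b %d, %Y %I:%M:%S %p")
        elif line.startswith("INFO: ") and stamp is not None:
            result = (stamp, line[len("INFO: "):].strip())
    return result


def looks_stuck(lines, now, timeout=timedelta(minutes=5)):
    """True if the agent has sat at 'Trying protocol: ...' longer than timeout.

    The 5-minute default is an arbitrary assumption, not a Jenkins setting.
    """
    status = last_status(lines)
    if status is None:
        return False
    stamp, message = status
    return message.startswith("Trying protocol:") and now - stamp > timeout
```

A wrapper script could feed this the captured output of the java -jar slave.jar process and relaunch it when looks_stuck returns True. That only automates the manual workaround; it does not explain why the handshake hangs in the first place.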