[JIRA] [core] (JENKINS-32950) Jenkins slave resets connection during or just after artifacts download.

321Kami@gmail.com (JIRA)

Feb 15, 2016, 8:40:01 AM
to jenkinsc...@googlegroups.com
Kamil Bednarczyk created an issue
 
Jenkins / Bug JENKINS-32950
Jenkins slave resets connection during or just after artifacts download.
Issue Type: Bug
Assignee: Unassigned
Components: core
Created: 15/Feb/16 1:39 PM
Environment: Windows 2008 R2 64bit (master) + Virtual Machine Windows 2008 R2 64bit (slave)
Labels: slave-agent slave crash windows
Priority: Blocker
Reporter: Kamil Bednarczyk

In Jenkins I have several build jobs with artifact dependencies between them. The first project builds just fine on both Linux and Windows, but the second one (which requires artifacts from the previous build) fails during artifact download.

Slave log from slave perspective:

Feb 15, 2016 5:12:54 AM hudson.remoting.jnlp.Main createEngine
INFO: Setting up slave: Windows2008R2_64bit
Feb 15, 2016 5:12:54 AM hudson.remoting.jnlp.Main$CuiListener <init>
INFO: Jenkins agent is running in headless mode.
Feb 15, 2016 5:12:54 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [http://10.102.22.50:8080/]
Feb 15, 2016 5:12:54 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Handshaking
Feb 15, 2016 5:12:54 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to 10.102.22.50:50226
Feb 15, 2016 5:12:54 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Trying protocol: JNLP2-connect
Feb 15, 2016 5:12:54 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connected
Feb 15, 2016 5:13:54 AM hudson.remoting.SynchronousCommandTransport$ReaderThread run
SEVERE: I/O error in channel channel
java.net.SocketException: Connection reset
        at java.net.SocketInputStream.read(Unknown Source)
        at java.net.SocketInputStream.read(Unknown Source)
        at java.io.BufferedInputStream.fill(Unknown Source)
        at java.io.BufferedInputStream.read1(Unknown Source)
        at java.io.BufferedInputStream.read(Unknown Source)
        at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:90)
        at hudson.remoting.ChunkedInputStream.read(ChunkedInputStream.java:46)
        at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:97)
        at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:39)
        at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
        at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)

Feb 15, 2016 5:13:54 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Terminated
Feb 15, 2016 5:14:04 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [http://10.102.22.50:8080/]
Feb 15, 2016 5:14:04 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Handshaking
Feb 15, 2016 5:14:04 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to 10.102.22.50:50226
Feb 15, 2016 5:14:04 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Trying protocol: JNLP2-connect
Feb 15, 2016 5:14:04 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connected

Slave log from master perspective:

JNLP agent connected from /10.102.22.50
<===[JENKINS REMOTING CAPACITY]===>   Slave.jar version: 2.53.2
This is a Windows slave
Slave successfully connected and online
ERROR: Connection terminated
java.io.IOException: Connection aborted: org.jenkinsci.remoting.nio.NioChannelHub$MonoNioTransport@2ce66ffa[name=Windows2008R2_64bit]
	at org.jenkinsci.remoting.nio.NioChannelHub$NioTransport.abort(NioChannelHub.java:208)
	at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:628)
	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
	at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.util.concurrent.FutureTask.run(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.lang.Thread.run(Unknown Source)
Caused by: java.io.IOException: An existing connection was forcibly closed by the remote host
	at sun.nio.ch.SocketDispatcher.read0(Native Method)
	at sun.nio.ch.SocketDispatcher.read(Unknown Source)
	at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
	at sun.nio.ch.IOUtil.read(Unknown Source)
	at sun.nio.ch.SocketChannelImpl.read(Unknown Source)
	at org.jenkinsci.remoting.nio.FifoBuffer$Pointer.receive(FifoBuffer.java:136)
	at org.jenkinsci.remoting.nio.FifoBuffer.receive(FifoBuffer.java:306)
	at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:561)
	... 6 more

Log from jenkins job:

Building remotely on Windows2008R2_64bit (Win64e) in workspace C:\jenkins\workspace\##\buildNode\Win64e
 > C:\Program Files\Git\bin\git.exe rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > C:\Program Files\Git\bin\git.exe config remote.origin.url #### # timeout=10
Fetching upstream changes from ######
 > C:\Program Files\Git\bin\git.exe --version # timeout=10
using GIT_SSH to set credentials 
 > C:\Program Files\Git\bin\git.exe -c core.askpass=true fetch --tags --progress ssh://####### +refs/heads/*:refs/remotes/origin/*
Checking out Revision ##(refs/remotes/origin/master)
 > C:\Program Files\Git\bin\git.exe config core.sparsecheckout # timeout=10
 > C:\Program Files\Git\bin\git.exe checkout -f ##
 > C:\Program Files\Git\bin\git.exe rev-list ### timeout=10
Run condition [Execution node ] enabling prebuild for step [Execute shell]
Run condition [Execution node ] enabling prebuild for step [Execute Windows batch command]
Slave went offline during the build
ERROR: Connection was broken: java.io.IOException: Connection aborted: org.jenkinsci.remoting.nio.NioChannelHub$MonoNioTransport@41241c12[name=Windows2008R2_64bit]
	at org.jenkinsci.remoting.nio.NioChannelHub$NioTransport.abort(NioChannelHub.java:208)
	at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:628)
	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
	at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.util.concurrent.FutureTask.run(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.lang.Thread.run(Unknown Source)
Caused by: java.io.IOException: An existing connection was forcibly closed by the remote host
	at sun.nio.ch.SocketDispatcher.read0(Native Method)
	at sun.nio.ch.SocketDispatcher.read(Unknown Source)
	at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
	at sun.nio.ch.IOUtil.read(Unknown Source)
	at sun.nio.ch.SocketChannelImpl.read(Unknown Source)
	at org.jenkinsci.remoting.nio.FifoBuffer$Pointer.receive(FifoBuffer.java:136)
	at org.jenkinsci.remoting.nio.FifoBuffer.receive(FifoBuffer.java:306)
	at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:561)
	... 6 more

Build step 'Copy artifacts from another project' marked build as failure
ERROR: Step 'Scan for compiler warnings' failed: no workspace for ##/buildNode=Win64e #57
ERROR: Step 'Archive the artifacts' failed: no workspace for ##/buildNode=Win64e #57
Finished: FAILURE

You may notice that the slave reconnects, but the build stays frozen and has to be killed in the Jenkins UI. This happens in 19 out of 20 cases (only very rarely does the build finish without problems).
The problem happens only on the Windows slave. It does not happen on any of the Linux slaves.
I tried:

  • Different Java versions and bitness (1.7 32-bit Java, 1.8 64-bit Java) on the slave machine.
  • Setting hudson.diyChunking to false
  • Increasing the Xmx and Xms Java values

Nothing helped. Is there any way to debug the slave? If only I knew what is going on there... the logs are not helpful at all.
One clue is that Jenkins itself was upgraded from 1.3xx to the recent build 1.647 (it is not a clean installation).
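
One detail the slave log does give away is timing: the agent reports "Connected" at 5:12:54 and the "Connection reset" at 5:13:54, exactly 60 seconds later. A round interval like that may point at some timeout rather than a random failure, though that is only a guess. Below is a minimal sketch in Python for measuring this gap across a longer agent log. The function names `parse_events` and `seconds_until_failure` are made up for illustration, and the sketch assumes the log uses the same English timestamp format as the excerpt above.

```python
import re
from datetime import datetime

# Header line as it appears in the agent log above, e.g.
# "Feb 15, 2016 5:12:54 AM hudson.remoting.jnlp.Main$CuiListener status"
HEADER = re.compile(r"^([A-Z][a-z]{2} \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M) ")
# Message line that follows a header, e.g. "INFO: Connected"
LEVEL = re.compile(r"^(SEVERE|WARNING|INFO|CONFIG|FINE|FINER|FINEST): (.*)$")


def parse_events(log_text):
    """Return (timestamp, level, message) tuples from an agent log."""
    events = []
    stamp = None
    for line in log_text.splitlines():
        header = HEADER.match(line)
        if header:
            # Assumes an English locale for the month name and AM/PM marker.
            stamp = datetime.strptime(header.group(1), "%b %d, %Y %I:%M:%S %p")
            continue
        level = LEVEL.match(line)
        if level and stamp is not None:
            events.append((stamp, level.group(1), level.group(2)))
    return events


def seconds_until_failure(events):
    """Seconds between each 'Connected' entry and the next SEVERE entry."""
    gaps = []
    connected_at = None
    for stamp, level, message in events:
        if level == "INFO" and message == "Connected":
            connected_at = stamp
        elif level == "SEVERE" and connected_at is not None:
            gaps.append((stamp - connected_at).total_seconds())
            connected_at = None
    return gaps


# Excerpt copied from the slave log in this report.
SAMPLE = """\
Feb 15, 2016 5:12:54 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connected
Feb 15, 2016 5:13:54 AM hudson.remoting.SynchronousCommandTransport$ReaderThread run
SEVERE: I/O error in channel channel
java.net.SocketException: Connection reset
Feb 15, 2016 5:13:54 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Terminated
"""

print(seconds_until_failure(parse_events(SAMPLE)))  # prints [60.0]
```

If the gap turns out to be the same on every failed build, that would be worth adding to the report. If it varies, the timeout guess is probably wrong.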
 
This message was sent by Atlassian JIRA (v6.4.2#64017-sha1:e244265)

321Kami@gmail.com (JIRA)

Feb 15, 2016, 8:40:01 AM
to jenkinsc...@googlegroups.com
Kamil Bednarczyk updated an issue
Change By: Kamil Bednarczyk
In Jenkins I have several build jobs with artifact dependencies between them. The first project builds just fine on both Linux and Windows, but the second one (which requires artifacts from the previous project) fails during artifact download.



321Kami@gmail.com (JIRA)

Feb 15, 2016, 9:14:02 AM
to jenkinsc...@googlegroups.com
Kamil Bednarczyk updated an issue
Change By: Kamil Bednarczyk
Priority: Blocker -> Major
In Jenkins I have several build jobs with artifact dependencies between them. The first project builds just fine on both Linux and Windows, but the second one (which requires artifacts from the previous project) fails during artifact download.


Checked on a different Windows machine (Windows 2012): everything seems to work just fine. Could this be a Hyper-V issue? I'll run more tests.
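
One way to separate a network or hypervisor problem from a Jenkins one would be to push a sustained plain TCP stream between the two machines, outside Jenkins, and see whether that connection gets reset as well. Below is a minimal sketch in Python, assuming Python 3.8 or newer is available on both hosts. The names `run_sink` and `probe` are made up for illustration and are not part of Jenkins or remoting. The demo at the bottom only talks to loopback; for a real test the sink would run on the master and the probe on the slave, with a duration longer than the time the build usually survives.

```python
import socket
import threading
import time


def run_sink(server_sock):
    """Accept one connection and discard everything it sends."""
    conn, _ = server_sock.accept()
    with conn:
        while conn.recv(65536):
            pass


def probe(host, port, duration_s, chunk=b"x" * 4096):
    """Stream data to host:port for duration_s seconds.

    Returns (seconds_elapsed, error), where error is None if the
    connection survived and the raised OSError otherwise.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=10) as sock:
            while time.monotonic() - start < duration_s:
                sock.sendall(chunk)
    except OSError as exc:  # covers ConnectionResetError and timeouts
        return time.monotonic() - start, exc
    return time.monotonic() - start, None


# Loopback demo: start a sink on a free local port and stream to it briefly.
server = socket.create_server(("127.0.0.1", 0))
port = server.getsockname()[1]
threading.Thread(target=run_sink, args=(server,), daemon=True).start()

elapsed, error = probe("127.0.0.1", port, duration_s=0.5)
print(f"streamed for {elapsed:.1f}s, error={error}")
```

If the plain stream also dies between the two Hyper-V machines, the cause is likely below Jenkins. If it survives, remoting itself becomes the more likely suspect.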

mlandman99@gmail.com (JIRA)

May 5, 2016, 2:26:01 PM
to jenkinsc...@googlegroups.com
boris ivan commented on Bug JENKINS-32950
 
Re: Jenkins slave resets connection during or just after artifacts download.

I have seen the same problem for years. I typically see this at the end of a build when doing a maven site site:deploy. My guess is that something about the rapid transfer of small files to build the resultant website is preventing some sort of keepalive from working, or tripping some kind of integrity check.

Same stack trace on windows slave, etc. Always:

May 04, 2016 10:17:36 AM hudson.remoting.SynchronousCommandTransport$ReaderThread run
SEVERE: I/O error in channel channel
java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at java.io.BufferedInputStream.fill(Unknown Source)
at java.io.BufferedInputStream.read1(Unknown Source)
at java.io.BufferedInputStream.read(Unknown Source)

...
...

o.v.nenashev@gmail.com (JIRA)

Mar 13, 2017, 8:53:02 PM
to jenkinsc...@googlegroups.com
Oleg Nenashev updated an issue
 
Change By: Oleg Nenashev
Component/s: remoting
This message was sent by Atlassian JIRA (v7.3.0#73011-sha1:3c73d0e)