[JIRA] (JENKINS-38834) Freestyle jobs hang in 2.19.1 on Windows 10 Nodes


treadstoneit@gmail.com (JIRA)

Oct 7, 2016, 4:53:01 PM
to jenkinsc...@googlegroups.com
Emory Penney created an issue
 
Jenkins / Bug JENKINS-38834
Freestyle jobs hang in 2.19.1 on Windows 10 Nodes
Issue Type: Bug
Assignee: Unassigned
Components: core
Created: 2016/Oct/07 8:52 PM
Environment: Jenkins 2.19.1 on Ubuntu 14.04 x64
Windows 10 (1607) Connected via JNLP
Priority: Critical
Reporter: Emory Penney

Freestyle jobs hang either at the beginning or the end of execution. I've let a few run for ~18 hours before they eventually crash with errors like these:

07:52:37 java.io.IOException: Unable to get hostname from slave. null
07:52:37              at hudson.plugins.perforce.PerforceSCM.checkout(PerforceSCM.java:1187)
07:52:37              at hudson.model.AbstractProject.checkout(AbstractProject.java:1278)
07:52:37              at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:604)
07:52:37              at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
07:52:37              at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:529)
07:52:37              at hudson.model.Run.execute(Run.java:1720)
07:52:37              at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
07:52:37              at hudson.model.ResourceController.execute(ResourceController.java:98)
07:52:37              at hudson.model.Executor.run(Executor.java:404)

Reverting to Jenkins 2.7.4 immediately resolved the issue. The issue was not observed on any Linux, OS X, or older Windows nodes.

This message was sent by Atlassian JIRA (v7.1.7#71011-sha1:2526d7c)

dbeck@cloudbees.com (JIRA)

Oct 14, 2016, 10:30:13 AM
to jenkinsc...@googlegroups.com

treadstoneit@gmail.com (JIRA)

Oct 17, 2016, 7:16:01 PM
to jenkinsc...@googlegroups.com

Will do. I tried today to reproduce it in a non-production environment, without success. I'm working on deploying a clone of our production environment now, but it could take a few days...

treadstoneit@gmail.com (JIRA)

Oct 25, 2016, 4:15:08 PM
to jenkinsc...@googlegroups.com

Jenkins ThreadDump from one of our hung executors:

Executor #0 for GJ-WTX64-S05V13 : executing jenkins_admin_component_1 #25147 / waiting for hudson.remoting.Channel@45b513d2:GJ-WTX64-S05V13

"Executor #0 for GJ-WTX64-S05V13 : executing jenkins_admin_component_1 #25147 / waiting for hudson.remoting.Channel@45b513d2:GJ-WTX64-S05V13" Id=204 Group=main TIMED_WAITING on hudson.remoting.UserRequest@33f1c9bd
at java.lang.Object.wait(Native Method)

  • waiting on hudson.remoting.UserRequest@33f1c9bd
    at hudson.remoting.Request.call(Request.java:147)
    at hudson.remoting.Channel.call(Channel.java:796)
    at hudson.FilePath.act(FilePath.java:1007)
    at hudson.FilePath.act(FilePath.java:996)
    at hudson.FilePath.deleteRecursive(FilePath.java:1198)
    at hudson.plugins.perforce.PerforceSCM.checkout(PerforceSCM.java:934)
    at hudson.model.AbstractProject.checkout(AbstractProject.java:1278)
    at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:604)
    at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
    at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:529)
    at hudson.model.Run.execute(Run.java:1720)
    at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
    at hudson.model.ResourceController.execute(ResourceController.java:98)
    at hudson.model.Executor.run(Executor.java:404)

ThreadDump from that server:

Channel reader thread: channel

"Channel reader thread: channel" Id=2301 Group=main RUNNABLE (in native)
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at java.io.BufferedInputStream.fill(Unknown Source)
at java.io.BufferedInputStream.read(Unknown Source)

  • locked java.io.BufferedInputStream@40bda018
    at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:86)
    at hudson.remoting.ChunkedInputStream.readHeader(ChunkedInputStream.java:72)
    at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:103)
    at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:39)
    at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
    at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:59)

main

"main" Id=1 Group=main WAITING on hudson.remoting.Engine@a3d1956
at java.lang.Object.wait(Native Method)

  • waiting on hudson.remoting.Engine@a3d1956
    at java.lang.Thread.join(Unknown Source)
    at java.lang.Thread.join(Unknown Source)
    at hudson.remoting.jnlp.Main.main(Main.java:150)
    at hudson.remoting.jnlp.Main._main(Main.java:143)
    at hudson.remoting.Launcher.run(Launcher.java:231)
    at hudson.remoting.Launcher.main(Launcher.java:195)

Ping thread for channel hudson.remoting.Channel@662535d4:channel

"Ping thread for channel hudson.remoting.Channel@662535d4:channel" Id=2302 Group=main TIMED_WAITING
at java.lang.Thread.sleep(Native Method)
at hudson.remoting.PingThread.run(PingThread.java:91)

pool-1-thread-2253 for channel

"pool-1-thread-2253 for channel" Id=2298 Group=main RUNNABLE
at com.sun.jna.Native.initIDs(Native Method)
at com.sun.jna.Native.<clinit>(Native.java:148)
at hudson.util.jna.Kernel32Utils.load(Kernel32Utils.java:112)
at hudson.util.jna.Kernel32.<clinit>(Kernel32.java:37)
at hudson.util.jna.Kernel32Utils.getWin32FileAttributes(Kernel32Utils.java:77)
at hudson.util.jna.Kernel32Utils.isJunctionOrSymlink(Kernel32Utils.java:98)
at hudson.Util.isSymlink(Util.java:510)
at hudson.FilePath.deleteRecursive(FilePath.java:1221)
at hudson.FilePath.access$1000(FilePath.java:195)
at hudson.FilePath$14.invoke(FilePath.java:1201)
at hudson.FilePath$14.invoke(FilePath.java:1198)
at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2772)
at hudson.remoting.UserRequest.perform(UserRequest.java:153)
at hudson.remoting.UserRequest.perform(UserRequest.java:50)
at hudson.remoting.Request$2.run(Request.java:332)
at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at hudson.remoting.Engine$1$1.run(Engine.java:85)
at java.lang.Thread.run(Unknown Source)

Number of locked synchronizers = 1

  • java.util.concurrent.ThreadPoolExecutor$Worker@75d956c4

pool-1-thread-2256 for channel

"pool-1-thread-2256 for channel" Id=2309 Group=main RUNNABLE
at com.sun.jna.Pointer.<clinit>(Pointer.java:41)
at com.sun.jna.Structure.<clinit>(Structure.java:2078)
at org.jvnet.hudson.Windows.monitor(Windows.java:42)
at hudson.node_monitors.SwapSpaceMonitor$MonitorTask.call(SwapSpaceMonitor.java:124)
at hudson.node_monitors.SwapSpaceMonitor$MonitorTask.call(SwapSpaceMonitor.java:114)
at hudson.remoting.UserRequest.perform(UserRequest.java:153)
at hudson.remoting.UserRequest.perform(UserRequest.java:50)
at hudson.remoting.Request$2.run(Request.java:332)
at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at hudson.remoting.Engine$1$1.run(Engine.java:85)
at java.lang.Thread.run(Unknown Source)

Number of locked synchronizers = 1

  • java.util.concurrent.ThreadPoolExecutor$Worker@756a3221

pool-1-thread-2261 for channel

"pool-1-thread-2261 for channel" Id=2314 Group=main RUNNABLE
at sun.management.ThreadImpl.dumpThreads0(Native Method)
at sun.management.ThreadImpl.dumpAllThreads(Unknown Source)
at hudson.Functions.getThreadInfos(Functions.java:1220)
at hudson.util.RemotingDiagnostics$GetThreadDump.call(RemotingDiagnostics.java:98)
at hudson.util.RemotingDiagnostics$GetThreadDump.call(RemotingDiagnostics.java:95)
at hudson.remoting.UserRequest.perform(UserRequest.java:153)
at hudson.remoting.UserRequest.perform(UserRequest.java:50)
at hudson.remoting.Request$2.run(Request.java:332)
at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at hudson.remoting.Engine$1$1.run(Engine.java:85)
at java.lang.Thread.run(Unknown Source)

Number of locked synchronizers = 1

  • java.util.concurrent.ThreadPoolExecutor$Worker@50474da7

RemoteInvocationHandler 19

"RemoteInvocationHandler 19" Id=2300 Group=main TIMED_WAITING on java.lang.ref.ReferenceQueue$Lock@fec0bff
at java.lang.Object.wait(Native Method)

  • waiting on java.lang.ref.ReferenceQueue$Lock@fec0bff
    at java.lang.ref.ReferenceQueue.remove(Unknown Source)
    at hudson.remoting.RemoteInvocationHandler$Unexporter.run(RemoteInvocationHandler.java:564)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at hudson.remoting.AtmostOneThreadExecutor$Worker.run(AtmostOneThreadExecutor.java:110)
    at java.lang.Thread.run(Unknown Source)

Thread-1

"Thread-1" Id=13 Group=main TIMED_WAITING on hudson.remoting.Channel@662535d4
at java.lang.Object.wait(Native Method)

  • waiting on hudson.remoting.Channel@662535d4
    at hudson.remoting.Channel.join(Channel.java:948)
    at hudson.remoting.Engine.run(Engine.java:316)

Attach Listener

"Attach Listener" Id=5 Group=system RUNNABLE

Finalizer

"Finalizer" Id=3 Group=system WAITING on java.lang.ref.ReferenceQueue$Lock@6641224e
at java.lang.Object.wait(Native Method)

  • waiting on java.lang.ref.ReferenceQueue$Lock@6641224e
    at java.lang.ref.ReferenceQueue.remove(Unknown Source)
    at java.lang.ref.ReferenceQueue.remove(Unknown Source)
    at java.lang.ref.Finalizer$FinalizerThread.run(Unknown Source)

Reference Handler

"Reference Handler" Id=2 Group=system WAITING on java.lang.ref.Reference$Lock@272dd0cb
at java.lang.Object.wait(Native Method)

  • waiting on java.lang.ref.Reference$Lock@272dd0cb
    at java.lang.Object.wait(Unknown Source)
    at java.lang.ref.Reference$ReferenceHandler.run(Unknown Source)

Signal Dispatcher

"Signal Dispatcher" Id=4 Group=system RUNNABLE

I've got 12 systems hung like this, all Windows 10 X64.
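
With a dozen nodes to check, it may help to scan each agent's thread dump for the same pattern automatically. The following is a minimal sketch, not a diagnosis tool: it assumes the dump is pasted as plain text in the format shown above, and the function names (`parse_threads`, `in_jna_init`) and the heuristic itself (a RUNNABLE thread whose stack contains a `com.sun.jna` class initializer or `initIDs` frame) are illustrative assumptions. Whether JNA initialization is the actual cause has not been established in this issue.

```python
import re

# Thread header as it appears in the dump above, e.g.
#   "pool-1-thread-2253 for channel" Id=2298 Group=main RUNNABLE
HEADER = re.compile(r'^"(?P<name>.+)" Id=(?P<id>\d+) Group=\S+ (?P<state>[A-Z_]+)')


def parse_threads(dump):
    """Split a thread dump into dicts holding name, id, state and stack frames."""
    threads = []
    current = None
    for raw in dump.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            current = {
                "name": match.group("name"),
                "id": int(match.group("id")),
                "state": match.group("state"),
                "frames": [],
            }
            threads.append(current)
        elif current is not None and line.startswith("at "):
            # Keep only the frame text, without the leading "at ".
            current["frames"].append(line[len("at "):])
    return threads


def in_jna_init(thread):
    """Heuristic (an assumption): RUNNABLE inside JNA class initialization."""
    return thread["state"] == "RUNNABLE" and any(
        frame.startswith("com.sun.jna.")
        and ("<clinit>" in frame or ".initIDs(" in frame)
        for frame in thread["frames"]
    )


# Excerpt of the agent dump posted above, trimmed to a few frames per thread.
SAMPLE = '''
"Ping thread for channel hudson.remoting.Channel@662535d4:channel" Id=2302 Group=main TIMED_WAITING
at java.lang.Thread.sleep(Native Method)
at hudson.remoting.PingThread.run(PingThread.java:91)

"pool-1-thread-2253 for channel" Id=2298 Group=main RUNNABLE
at com.sun.jna.Native.initIDs(Native Method)
at com.sun.jna.Native.<clinit>(Native.java:148)
at hudson.util.jna.Kernel32Utils.load(Kernel32Utils.java:112)

"pool-1-thread-2256 for channel" Id=2309 Group=main RUNNABLE
at com.sun.jna.Pointer.<clinit>(Pointer.java:41)
at com.sun.jna.Structure.<clinit>(Structure.java:2078)
at org.jvnet.hudson.Windows.monitor(Windows.java:42)

"pool-1-thread-2261 for channel" Id=2314 Group=main RUNNABLE
at sun.management.ThreadImpl.dumpThreads0(Native Method)
at hudson.Functions.getThreadInfos(Functions.java:1220)
'''

suspects = [t["name"] for t in parse_threads(SAMPLE) if in_jna_init(t)]
print(suspects)
```

On the excerpt above this flags the two pool threads that sit in JNA initialization (the `deleteRecursive` worker and the `SwapSpaceMonitor` task) and skips the ping thread and the thread that was taking the dump.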

smithgcovert@gmail.com (JIRA)

Oct 26, 2016, 9:44:07 PM
to jenkinsc...@googlegroups.com

Your stack seems to be pretty similar to mine, in this bug:

https://issues.jenkins-ci.org/browse/JENKINS-39179

That bug is driving me and my team crazy. They could be the same.

smithgcovert@gmail.com (JIRA)

Oct 26, 2016, 9:47:01 PM
to jenkinsc...@googlegroups.com

Are all of your Windows 10 machines updated to the anniversary edition? For us, it is happening and locking up all builds. If you find the build with the JNI stack, you can kill that single slave, and all other slaves will be able to continue.

treadstoneit@gmail.com (JIRA)

Oct 27, 2016, 12:05:02 PM
to jenkinsc...@googlegroups.com

Yes, all are running Windows 10 Anniversary Edition. Luckily for us the issue remains isolated to specific nodes, and other jobs seem to run unimpeded. As I said above, downgrading to 2.7.4 made the issue disappear for me!

dbeck@cloudbees.com (JIRA)

Oct 27, 2016, 4:10:01 PM
to jenkinsc...@googlegroups.com

To clarify, all affected nodes run Windows 10 Anniversary Edition, but not all nodes running Windows 10 Anniversary Edition are affected by this issue (despite running the same kinds of jobs)?

treadstoneit@gmail.com (JIRA)

Oct 27, 2016, 4:38:02 PM
to jenkinsc...@googlegroups.com

That appears to be the case: all affected nodes are Win10 AE, but not all of my Win10 AE nodes are affected. It doesn't look like the affliction is consistent, either: sometimes jobs will run fine on these nodes, sometimes they won't.

smithgcovert@gmail.com (JIRA)

Oct 27, 2016, 9:05:01 PM
to jenkinsc...@googlegroups.com

I believe the difference between my environment (where this hang causes all builds to stall) and your environment (where only that single node stalls) is that we are using Pipeline builds, whereas you are using freestyle builds.

Because part of the code for a Pipeline build runs on the master, this hang seems to block all other activity on the master. That makes it a bit worse in the scenario where Pipeline is being used, but either way, it's very bad.

smithgcovert@gmail.com (JIRA)

Oct 28, 2016, 12:29:01 AM
to jenkinsc...@googlegroups.com

We were able to recreate this problem with a pre-Anniversary Edition build of Windows 10, so it's not related to that.

It looks like this might be related to https://issues.jenkins-ci.org/browse/JENKINS-19445

o.v.nenashev@gmail.com (JIRA)

Nov 10, 2016, 4:42:01 PM
to jenkinsc...@googlegroups.com

o.v.nenashev@gmail.com (JIRA)

Nov 10, 2016, 4:43:01 PM
to jenkinsc...@googlegroups.com
Oleg Nenashev updated an issue
Change By: Oleg Nenashev
Component/s: perforce-plugin

treadstoneit@gmail.com (JIRA)

Mar 9, 2017, 4:33:01 PM
to jenkinsc...@googlegroups.com
Emory Penney commented on Bug JENKINS-38834
 
Re: Freestyle jobs hang in 2.19.1 on Windows 10 Nodes

I tried 2.32.3 with Oracle JDK 8 today; the issue persists.
