[JIRA] (JENKINS-38834) Freestyle jobs hang in 2.19.1 on Windows 10 Nodes


treadstoneit@gmail.com (JIRA)

2016-10-07 16:53:01
To: jenkinsc...@googlegroups.com
Emory Penney created an issue
 
Jenkins / Bug JENKINS-38834
Freestyle jobs hang in 2.19.1 on Windows 10 Nodes
Issue Type: Bug
Assignee: Unassigned
Components: core
Created: 2016/Oct/07 8:52 PM
Environment: Jenkins 2.19.1 on Ubuntu 14.04 x64
Windows 10 (1607) Connected via JNLP
Priority: Critical
Reporter: Emory Penney

Freestyle jobs hang either at the beginning or the end of execution. I've let a few run for ~18 hours before they eventually crash with errors like these:

07:52:37 java.io.IOException: Unable to get hostname from slave. null
07:52:37              at hudson.plugins.perforce.PerforceSCM.checkout(PerforceSCM.java:1187)
07:52:37              at hudson.model.AbstractProject.checkout(AbstractProject.java:1278)
07:52:37              at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:604)
07:52:37              at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
07:52:37              at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:529)
07:52:37              at hudson.model.Run.execute(Run.java:1720)
07:52:37              at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
07:52:37              at hudson.model.ResourceController.execute(ResourceController.java:98)
07:52:37              at hudson.model.Executor.run(Executor.java:404)

Reverting back to Jenkins 2.7.4 immediately resolved the issue. Issue was not observed with any Linux/OSX/older Windows nodes.

This message was sent by Atlassian JIRA (v7.1.7#71011-sha1:2526d7c)

dbeck@cloudbees.com (JIRA)

2016-10-14 10:30:13
To: jenkinsc...@googlegroups.com

treadstoneit@gmail.com (JIRA)

2016-10-17 19:16:01
To: jenkinsc...@googlegroups.com

Will do. I tried today to reproduce it in a non-production environment, without success. I'm now deploying a clone of our production environment, but that could take a few days...

treadstoneit@gmail.com (JIRA)

2016-10-25 16:15:08
To: jenkinsc...@googlegroups.com

Jenkins ThreadDump from one of our hung executors:

Executor #0 for GJ-WTX64-S05V13 : executing jenkins_admin_component_1 #25147 / waiting for hudson.remoting.Channel@45b513d2:GJ-WTX64-S05V13

"Executor #0 for GJ-WTX64-S05V13 : executing jenkins_admin_component_1 #25147 / waiting for hudson.remoting.Channel@45b513d2:GJ-WTX64-S05V13" Id=204 Group=main TIMED_WAITING on hudson.remoting.UserRequest@33f1c9bd
at java.lang.Object.wait(Native Method)

  • waiting on hudson.remoting.UserRequest@33f1c9bd
    at hudson.remoting.Request.call(Request.java:147)
    at hudson.remoting.Channel.call(Channel.java:796)
    at hudson.FilePath.act(FilePath.java:1007)
    at hudson.FilePath.act(FilePath.java:996)
    at hudson.FilePath.deleteRecursive(FilePath.java:1198)
    at hudson.plugins.perforce.PerforceSCM.checkout(PerforceSCM.java:934)
    at hudson.model.AbstractProject.checkout(AbstractProject.java:1278)
    at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:604)
    at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
    at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:529)
    at hudson.model.Run.execute(Run.java:1720)
    at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
    at hudson.model.ResourceController.execute(ResourceController.java:98)
    at hudson.model.Executor.run(Executor.java:404)

ThreadDump from that server:

Channel reader thread: channel

"Channel reader thread: channel" Id=2301 Group=main RUNNABLE (in native)
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at java.io.BufferedInputStream.fill(Unknown Source)
at java.io.BufferedInputStream.read(Unknown Source)

  • locked java.io.BufferedInputStream@40bda018
    at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:86)
    at hudson.remoting.ChunkedInputStream.readHeader(ChunkedInputStream.java:72)
    at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:103)
    at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:39)
    at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
    at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:59)

main

"main" Id=1 Group=main WAITING on hudson.remoting.Engine@a3d1956
at java.lang.Object.wait(Native Method)

  • waiting on hudson.remoting.Engine@a3d1956
    at java.lang.Thread.join(Unknown Source)
    at java.lang.Thread.join(Unknown Source)
    at hudson.remoting.jnlp.Main.main(Main.java:150)
    at hudson.remoting.jnlp.Main._main(Main.java:143)
    at hudson.remoting.Launcher.run(Launcher.java:231)
    at hudson.remoting.Launcher.main(Launcher.java:195)

Ping thread for channel hudson.remoting.Channel@662535d4:channel

"Ping thread for channel hudson.remoting.Channel@662535d4:channel" Id=2302 Group=main TIMED_WAITING
at java.lang.Thread.sleep(Native Method)
at hudson.remoting.PingThread.run(PingThread.java:91)

pool-1-thread-2253 for channel

"pool-1-thread-2253 for channel" Id=2298 Group=main RUNNABLE
at com.sun.jna.Native.initIDs(Native Method)
at com.sun.jna.Native.<clinit>(Native.java:148)
at hudson.util.jna.Kernel32Utils.load(Kernel32Utils.java:112)
at hudson.util.jna.Kernel32.<clinit>(Kernel32.java:37)
at hudson.util.jna.Kernel32Utils.getWin32FileAttributes(Kernel32Utils.java:77)
at hudson.util.jna.Kernel32Utils.isJunctionOrSymlink(Kernel32Utils.java:98)
at hudson.Util.isSymlink(Util.java:510)
at hudson.FilePath.deleteRecursive(FilePath.java:1221)
at hudson.FilePath.access$1000(FilePath.java:195)
at hudson.FilePath$14.invoke(FilePath.java:1201)
at hudson.FilePath$14.invoke(FilePath.java:1198)
at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2772)
at hudson.remoting.UserRequest.perform(UserRequest.java:153)
at hudson.remoting.UserRequest.perform(UserRequest.java:50)
at hudson.remoting.Request$2.run(Request.java:332)
at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at hudson.remoting.Engine$1$1.run(Engine.java:85)
at java.lang.Thread.run(Unknown Source)

Number of locked synchronizers = 1

  • java.util.concurrent.ThreadPoolExecutor$Worker@75d956c4

pool-1-thread-2256 for channel

"pool-1-thread-2256 for channel" Id=2309 Group=main RUNNABLE
at com.sun.jna.Pointer.<clinit>(Pointer.java:41)
at com.sun.jna.Structure.<clinit>(Structure.java:2078)
at org.jvnet.hudson.Windows.monitor(Windows.java:42)
at hudson.node_monitors.SwapSpaceMonitor$MonitorTask.call(SwapSpaceMonitor.java:124)
at hudson.node_monitors.SwapSpaceMonitor$MonitorTask.call(SwapSpaceMonitor.java:114)
at hudson.remoting.UserRequest.perform(UserRequest.java:153)
at hudson.remoting.UserRequest.perform(UserRequest.java:50)
at hudson.remoting.Request$2.run(Request.java:332)
at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at hudson.remoting.Engine$1$1.run(Engine.java:85)
at java.lang.Thread.run(Unknown Source)

Number of locked synchronizers = 1

  • java.util.concurrent.ThreadPoolExecutor$Worker@756a3221

pool-1-thread-2261 for channel

"pool-1-thread-2261 for channel" Id=2314 Group=main RUNNABLE
at sun.management.ThreadImpl.dumpThreads0(Native Method)
at sun.management.ThreadImpl.dumpAllThreads(Unknown Source)
at hudson.Functions.getThreadInfos(Functions.java:1220)
at hudson.util.RemotingDiagnostics$GetThreadDump.call(RemotingDiagnostics.java:98)
at hudson.util.RemotingDiagnostics$GetThreadDump.call(RemotingDiagnostics.java:95)
at hudson.remoting.UserRequest.perform(UserRequest.java:153)
at hudson.remoting.UserRequest.perform(UserRequest.java:50)
at hudson.remoting.Request$2.run(Request.java:332)
at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at hudson.remoting.Engine$1$1.run(Engine.java:85)
at java.lang.Thread.run(Unknown Source)

Number of locked synchronizers = 1

  • java.util.concurrent.ThreadPoolExecutor$Worker@50474da7

RemoteInvocationHandler 19

"RemoteInvocationHandler 19" Id=2300 Group=main TIMED_WAITING on java.lang.ref.ReferenceQueue$Lock@fec0bff
at java.lang.Object.wait(Native Method)

  • waiting on java.lang.ref.ReferenceQueue$Lock@fec0bff
    at java.lang.ref.ReferenceQueue.remove(Unknown Source)
    at hudson.remoting.RemoteInvocationHandler$Unexporter.run(RemoteInvocationHandler.java:564)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at hudson.remoting.AtmostOneThreadExecutor$Worker.run(AtmostOneThreadExecutor.java:110)
    at java.lang.Thread.run(Unknown Source)

Thread-1

"Thread-1" Id=13 Group=main TIMED_WAITING on hudson.remoting.Channel@662535d4
at java.lang.Object.wait(Native Method)

  • waiting on hudson.remoting.Channel@662535d4
    at hudson.remoting.Channel.join(Channel.java:948)
    at hudson.remoting.Engine.run(Engine.java:316)

Attach Listener

"Attach Listener" Id=5 Group=system RUNNABLE

Finalizer

"Finalizer" Id=3 Group=system WAITING on java.lang.ref.ReferenceQueue$Lock@6641224e
at java.lang.Object.wait(Native Method)

  • waiting on java.lang.ref.ReferenceQueue$Lock@6641224e
    at java.lang.ref.ReferenceQueue.remove(Unknown Source)
    at java.lang.ref.ReferenceQueue.remove(Unknown Source)
    at java.lang.ref.Finalizer$FinalizerThread.run(Unknown Source)

Reference Handler

"Reference Handler" Id=2 Group=system WAITING on java.lang.ref.Reference$Lock@272dd0cb
at java.lang.Object.wait(Native Method)

  • waiting on java.lang.ref.Reference$Lock@272dd0cb
    at java.lang.Object.wait(Unknown Source)
    at java.lang.ref.Reference$ReferenceHandler.run(Unknown Source)

Signal Dispatcher

"Signal Dispatcher" Id=4 Group=system RUNNABLE

I've got 12 systems hung like this, all Windows 10 X64.
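
In the agent dump above, two pool threads are RUNNABLE inside `com.sun.jna` static initializers (`<clinit>` frames). A thread that stays in a static initializer across several dumps can point to a stalled class initialization, although nothing in this thread confirms that as the cause. The following is a minimal sketch for spotting such threads, assuming the dump uses the text layout shown above; the function names and the trimmed `SAMPLE` excerpt are illustrative only and not part of Jenkins.

```python
import re

# Matches a thread header such as:
#   "pool-1-thread-2253 for channel" Id=2298 Group=main RUNNABLE
HEADER = re.compile(r'^"(?P<name>.+)" Id=(?P<id>\d+) Group=\S+ (?P<state>[A-Z_]+)')
# Matches a stack frame such as:
#   at com.sun.jna.Native.<clinit>(Native.java:148)
FRAME = re.compile(r'^\s*at (?P<method>[\w.$<>]+)\(')

# Trimmed excerpt of the agent dump posted above (frames shortened).
SAMPLE = '''
"pool-1-thread-2253 for channel" Id=2298 Group=main RUNNABLE
at com.sun.jna.Native.initIDs(Native Method)
at com.sun.jna.Native.<clinit>(Native.java:148)
at hudson.util.jna.Kernel32Utils.load(Kernel32Utils.java:112)

"pool-1-thread-2256 for channel" Id=2309 Group=main RUNNABLE
at com.sun.jna.Pointer.<clinit>(Pointer.java:41)
at com.sun.jna.Structure.<clinit>(Structure.java:2078)

"pool-1-thread-2261 for channel" Id=2314 Group=main RUNNABLE
at sun.management.ThreadImpl.dumpThreads0(Native Method)
'''


def parse_threads(dump):
    """Split a thread dump into (name, state, [frame methods]) tuples."""
    threads = []
    for line in dump.splitlines():
        header = HEADER.match(line.strip())
        if header:
            threads.append((header["name"], header["state"], []))
            continue
        frame = FRAME.match(line)
        if frame and threads:
            # Attach the frame to the most recently seen thread header.
            threads[-1][2].append(frame["method"])
    return threads


def threads_in_static_init(dump, package="com.sun.jna."):
    """Return names of RUNNABLE threads with a <clinit> frame in `package`."""
    hits = []
    for name, state, frames in parse_threads(dump):
        if state != "RUNNABLE":
            continue
        if any(f.startswith(package) and f.endswith(".<clinit>") for f in frames):
            hits.append(name)
    return hits


if __name__ == "__main__":
    for name in threads_in_static_init(SAMPLE):
        print(name)
```

Running this against dumps taken a few minutes apart, and comparing which thread names keep appearing, would help separate a genuinely stuck thread from one that just happened to be initializing JNA at that moment.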

smithgcovert@gmail.com (JIRA)

2016-10-26 21:44:07
To: jenkinsc...@googlegroups.com

Your stack seems to be pretty similar to mine, in this bug:

https://issues.jenkins-ci.org/browse/JENKINS-39179

That bug is driving me and my team crazy. They could be the same.

smithgcovert@gmail.com (JIRA)

2016-10-26 21:47:01
To: jenkinsc...@googlegroups.com

Are all of your Windows 10 machines updated to the Anniversary Edition? For us, the hang locks up all builds. If you find the build with the JNI stack, you can kill that single slave, and all other slaves will be able to continue.
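
Finding the node to kill can be partly automated. The sketch below assumes thread dumps have already been collected per node by whatever means is available, and that the stack in question shows up as `com.sun.jna` frames, as it does in the dump earlier in this thread. The `dumps` dictionary, the node names, and the function name are hypothetical.

```python
import re

# A frame line such as "at com.sun.jna.Native.initIDs(Native Method)".
JNA_FRAME = re.compile(r"^\s*at com\.sun\.jna\.[\w.$<>]+\(", re.MULTILINE)


def nodes_with_jna_frames(dumps):
    """Map node name -> number of com.sun.jna frames in its thread dump.

    `dumps` is a hypothetical dict of node name -> thread dump text.
    Nodes whose dump contains no JNA frames are left out of the result.
    """
    counts = {}
    for node, text in dumps.items():
        hits = len(JNA_FRAME.findall(text))
        if hits:
            counts[node] = hits
    return counts


# Made-up dumps for two made-up nodes, for illustration only.
dumps = {
    "node-a": '"pool-1-thread-1 for channel" Id=10 Group=main RUNNABLE\n'
              "at com.sun.jna.Native.initIDs(Native Method)\n"
              "at com.sun.jna.Native.<clinit>(Native.java:148)\n",
    "node-b": '"Ping thread" Id=11 Group=main TIMED_WAITING\n'
              "at java.lang.Thread.sleep(Native Method)\n",
}

if __name__ == "__main__":
    print(nodes_with_jna_frames(dumps))
```

A non-zero count only marks a node as a candidate: a healthy node can briefly show JNA frames too, so it is worth confirming with a second dump before restarting anything.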

treadstoneit@gmail.com (JIRA)

2016-10-27 12:05:02
To: jenkinsc...@googlegroups.com

Yes, all are running Windows 10 Anniversary Edition. Luckily for us, the issue remains isolated to specific nodes, and other jobs seem to run unimpeded. As I said above, downgrading to 2.7.4 made the issue disappear for me!

dbeck@cloudbees.com (JIRA)

2016-10-27 16:10:01
To: jenkinsc...@googlegroups.com

To clarify, all affected nodes run Windows 10 Anniversary Edition, but not all nodes running Windows 10 Anniversary Edition are affected by this issue (despite running the same kinds of jobs)?

treadstoneit@gmail.com (JIRA)

2016-10-27 16:38:02
To: jenkinsc...@googlegroups.com

That appears to be the case: all affected nodes are Win10 AE, but not all of my Win10 AE nodes are affected. The problem isn't consistent either: sometimes jobs run fine on these nodes, sometimes they don't.

smithgcovert@gmail.com (JIRA)

2016-10-27 21:05:01
To: jenkinsc...@googlegroups.com

I believe the difference between my environment (where this hang stalls all builds) and yours (where only the single node stalls) is that we use Pipeline builds, whereas you use freestyle builds.

Because part of the code for a Pipeline build runs on the master, this hang seems to block all other activity on the master. That makes it somewhat worse where Pipeline is used, but either way, it's very bad.

smithgcovert@gmail.com (JIRA)

2016-10-28 00:29:01
To: jenkinsc...@googlegroups.com

We were able to recreate this problem with a pre-Anniversary Edition build of Windows 10, so it's not related to that.

It looks like this might be related to https://issues.jenkins-ci.org/browse/JENKINS-19445

o.v.nenashev@gmail.com (JIRA)

2016-11-10 16:42:01
To: jenkinsc...@googlegroups.com

o.v.nenashev@gmail.com (JIRA)

2016-11-10 16:43:01
To: jenkinsc...@googlegroups.com
Oleg Nenashev updated an issue
Change By: Oleg Nenashev
Component/s: perforce-plugin

treadstoneit@gmail.com (JIRA)

2017-03-09 16:33:01
To: jenkinsc...@googlegroups.com
Emory Penney commented on Bug JENKINS-38834
 
Re: Freestyle jobs hang in 2.19.1 on Windows 10 Nodes

I tried 2.32.3 with Oracle JDK8 today, issue persists.

This message was sent by Atlassian JIRA (v7.3.0#73011-sha1:3c73d0e)