Wow, can't believe I didn't know about /threadDump before, thanks!
Anyway, I mostly see lots (on the order of hundreds) of:
"Computer.threadPoolForRemoting [#85811]" Id=380625 Group=main TIMED_WAITING on java.util.concurrent.SynchronousQueue$TransferStack@687826ae
at sun.misc.Unsafe.park(Native Method)
- waiting on java.util.concurrent.SynchronousQueue$TransferStack@687826ae
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:941)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1066)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
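For what it's worth, I believe those hundreds of Computer.threadPoolForRemoting threads are probably just idle workers rather than a problem in themselves: a cached thread pool hands tasks off through a SynchronousQueue, and an idle worker sits in getTask → SynchronousQueue.poll → parkNanos, which shows up as TIMED_WAITING on a SynchronousQueue$TransferStack exactly like above. A small standalone sketch that reproduces the same state (IdleWorkerDemo is my own name, not Jenkins code):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Demo: an idle worker in a cached thread pool parks on its
// SynchronousQueue waiting for the next task, so a thread dump
// shows it as TIMED_WAITING in LockSupport.parkNanos.
public class IdleWorkerDemo {

    // Returns the state of a pool worker after its task has finished.
    static Thread.State idleWorkerState() throws Exception {
        ExecutorService pool = Executors.newCachedThreadPool();
        pool.submit(() -> {}).get();  // run one no-op task, then the worker goes idle
        Thread worker = null;
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.getName().startsWith("pool-")) {  // default thread-factory naming
                worker = t;
            }
        }
        // Wait briefly for the worker to park in SynchronousQueue.poll(keepAliveTime).
        for (int i = 0; i < 50 && worker.getState() != Thread.State.TIMED_WAITING; i++) {
            Thread.sleep(20);
        }
        Thread.State state = worker.getState();
        pool.shutdownNow();
        return state;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("idle worker state: " + idleWorkerState());
    }
}
```

So the sheer count of those threads may just reflect how many remoting tasks have run, not how many are stuck; the CPS executors below look like the more suspicious part.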
I did see some (~20) CPS executors that looked suspicious (they're also hanging forever when I go to the job):
"Running CpsFlowExecution[Owner[workflow_test/619:workflow_test#619]]" Id=7310 Group=main RUNNABLE
at org.kohsuke.stapler.framework.io.LargeText$BufferSession.skip(LargeText.java:532)
at org.kohsuke.stapler.framework.io.LargeText.writeLogTo(LargeText.java:211)
at hudson.console.AnnotatedLargeText.writeRawLogTo(AnnotatedLargeText.java:162)
at org.jenkinsci.plugins.workflow.job.WorkflowRun.copyLogs(WorkflowRun.java:359)
at org.jenkinsci.plugins.workflow.job.WorkflowRun.access$600(WorkflowRun.java:107)
at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:752)
- locked java.util.concurrent.atomic.AtomicBoolean@7eb2df49
at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:799)
at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$4.run(CpsThreadGroup.java:320)
at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:32)
at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Number of locked synchronizers = 1
- java.util.concurrent.ThreadPoolExecutor$Worker@d3b1e0f
I noticed them before, but I assumed they weren't related to this issue. I think they're waiting for slave nodes that have been removed from our Jenkins (not just disconnected but deleted). I'll clean these up with the /kill URL on the workflows and see if that improves things, since the thread dump makes me wonder if they're related after all.
Thanks again