After the last restart on January 22, one of my Jenkins masters is still leaking flyweight executors. It hasn't quite gotten up into the thousands yet as it did last time, but there are 420 flyweight executors right now (and the number is still climbing) against only about 30 running builds visible in the UI, which means hundreds of flyweight executors have been leaked. Last night this resulted in a huge burst in thread count and CPU usage, with dozens of stacks like this:
"jenkins.util.Timer [#4]" #77 daemon prio=5 os_prio=0 tid=0x00007f50a800e800 nid=0x80d runnable [0x00007f504788a000]
java.lang.Thread.State: RUNNABLE
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
at hudson.model.Executor.getCurrentExecutable(Executor.java:514)
at hudson.plugins.throttleconcurrents.ThrottleQueueTaskDispatcher.buildsOnExecutor(ThrottleQueueTaskDispatcher.java:511)
at hudson.plugins.throttleconcurrents.ThrottleQueueTaskDispatcher.buildsOfProjectOnNode(ThrottleQueueTaskDispatcher.java:488)
at hudson.plugins.throttleconcurrents.ThrottleQueueTaskDispatcher.buildsOfProjectOnAllNodes(ThrottleQueueTaskDispatcher.java:501)
at hudson.plugins.throttleconcurrents.ThrottleQueueTaskDispatcher.throttleCheckForCategoriesAllNodes(ThrottleQueueTaskDispatcher.java:281)
at hudson.plugins.throttleconcurrents.ThrottleQueueTaskDispatcher.canRunImpl(ThrottleQueueTaskDispatcher.java:253)
at hudson.plugins.throttleconcurrents.ThrottleQueueTaskDispatcher.canRun(ThrottleQueueTaskDispatcher.java:218)
at hudson.plugins.throttleconcurrents.ThrottleQueueTaskDispatcher.canRun(ThrottleQueueTaskDispatcher.java:176)
at hudson.model.Queue.getCauseOfBlockageForItem(Queue.java:1197)
at hudson.model.Queue.maintain(Queue.java:1522)
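For anyone who wants to check the same counts on their own master, a script console snippet along these lines (a rough sketch built only on the public Computer.getOneOffExecutors() and Jenkins.getAllItems() APIs) compares one-off flyweight executors against what the UI would show as running:

import hudson.model.Job
import jenkins.model.Jenkins

def j = Jenkins.instance

// Flyweight tasks run on one-off executors; count them across every computer.
def oneOff = j.computers.sum { it.oneOffExecutors.size() } ?: 0

// Jobs that report a build in progress (roughly what the UI shows as running).
def building = j.getAllItems(Job).count { it.building }

println "one-off (flyweight) executors: ${oneOff}"
println "jobs currently building:       ${building}"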
As the stack above shows, we were burning CPU iterating over these leaked flyweight executors from the Throttle Concurrent Builds plugin; the problem would go away if the flyweight executors weren't leaked in the first place. After restarting the master things are back to normal, but the leak grows again, and in my environment it takes about 20 days for the leaked executors to start causing serious problems.

Devin Nusbaum, what do you suggest as next steps here? I see this bug has been resolved as "incomplete", but the issue occurred on January 22 and February 12, and I'm sure it will occur again roughly 20 days after I restart this master. While I don't have a simple reproducer, I do have an environment where the issue occurs regularly, and I can help collect any debugging state that is needed. Please let me know if I can add any additional information to this bug (or a new bug).
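For example, the next time the count climbs I could attach a per-executor dump along these lines (again just a sketch over the public Computer/Executor APIs, not an established diagnostic):

import jenkins.model.Jenkins

// Print one line per one-off (flyweight) executor so the leaked ones can be
// picked out: a dead thread or a long-finished executable would be suspicious.
Jenkins.instance.computers.each { c ->
    c.oneOffExecutors.each { e ->
        // getCurrentExecutable() acquires the executor's read lock -- the same
        // lock the throttle plugin is contending on in the thread dump above.
        println "${c.name}\t${e.displayName}\tactive=${e.active}\texecutable=${e.currentExecutable}"
    }
}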