[JIRA] (JENKINS-58899) ThreadLocal leak causing high CPU usage

2 views
Skip to first unread message

glavoie@gmail.com (JIRA)

unread,
Aug 12, 2019, 1:24:02 PM8/12/19
to jenkinsc...@googlegroups.com
Gabriel Lavoie created an issue
 
Jenkins / Bug JENKINS-58899
ThreadLocal leak causing high CPU usage
Issue Type: Bug Bug
Assignee: Unassigned
Components: workflow-api-plugin
Created: 2019-08-12 17:23
Environment: Jenkins 2.187
Linux
openjdk version "1.8.0_222"
Priority: Minor Minor
Reporter: Gabriel Lavoie

Our large Jenkins instance had a CPU usage spike last weekend.

After some digging, we found that all `org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep` threads tended to get stuck in `java.lang.ThreadLocal$ThreadLocalMap.expungeStaleEntry` with various stacks, which seems to be the consequence of a ThreadLocal leak:

 

"org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep [#12]: checking <redacted> on i-0db24f1a6b300ea1f" #3845 daemon prio=5 os_prio=0 tid=0x00007f2dd8bd2800 nid=0x30bb runnable [0x00007f2d2a926000]
   java.lang.Thread.State: RUNNABLE
        at java.lang.ThreadLocal$ThreadLocalMap.expungeStaleEntry(ThreadLocal.java:617)
        at java.lang.ThreadLocal$ThreadLocalMap.remove(ThreadLocal.java:499)
        at java.lang.ThreadLocal$ThreadLocalMap.access$200(ThreadLocal.java:298)
        at java.lang.ThreadLocal.remove(ThreadLocal.java:222)
        at hudson.remoting.Channel.setCurrent(Channel.java:1787)
        at hudson.remoting.UserRequest$NormalResponse.retrieve(UserRequest.java:328)
        at hudson.remoting.Channel.call(Channel.java:957)
        at hudson.FilePath.act(FilePath.java:1072)
        at hudson.FilePath.act(FilePath.java:1061)
        at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController.writeLog(FileMonitoringTask.java:209)
        at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.check(DurableTaskStep.java:532)
        at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.run(DurableTaskStep.java:489)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

 

 

Digged the content of the ThreadLocalMap of each of the live threads of the `ScheduledThreadPoolExecutor` and got something similar:

 

Thread 3841 org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep [#8]: checking <redacted> on i-04a6c09bbdc596bbd
Map size: 65536
[org.jenkinsci.plugins.workflow.log.DelayBufferedOutputStream$FlushControlledOutputStream$1:20704, java.lang.ThreadLocal:47, hudson.remoting.Channel$ThreadLastIoId:34, null:42550, org.codehaus.groovy.runtime.NullObject:2200, java.lang.ThreadLocal$SuppliedThreadLocal:1]

 

It looks like the following ThreadLocal is never cleared since it gets attached to reused threads from a thread pool: 

https://github.com/jenkinsci/workflow-api-plugin/blob/907cf64feb2f38e93aebbedf87946b0235c3dc93/src/main/java/org/jenkinsci/plugins/workflow/log/DelayBufferedOutputStream.java#L123

Add Comment Add Comment
 
This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)

glavoie@gmail.com (JIRA)

unread,
Aug 12, 2019, 1:25:01 PM8/12/19
to jenkinsc...@googlegroups.com
Gabriel Lavoie updated an issue
Change By: Gabriel Lavoie
Our large Jenkins instance had a CPU usage spike last weekend.

After some digging, we found that all `org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep` threads tended to get stuck in `java.lang.ThreadLocal$ThreadLocalMap.expungeStaleEntry` with various stacks, which seems to be the consequence of a ThreadLocal leak:

 
{noformat}

"org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep [#12]: checking <redacted> on i-0db24f1a6b300ea1f" #3845 daemon prio=5 os_prio=0 tid=0x00007f2dd8bd2800 nid=0x30bb runnable [0x00007f2d2a926000]
  java.lang.Thread.State: RUNNABLE
       at java.lang.ThreadLocal$ThreadLocalMap.expungeStaleEntry(ThreadLocal.java:617)
       at java.lang.ThreadLocal$ThreadLocalMap.remove(ThreadLocal.java:499)
       at java.lang.ThreadLocal$ThreadLocalMap.access$200(ThreadLocal.java:298)
       at java.lang.ThreadLocal.remove(ThreadLocal.java:222)
       at hudson.remoting.Channel.setCurrent(Channel.java:1787)
       at hudson.remoting.UserRequest$NormalResponse.retrieve(UserRequest.java:328)
       at hudson.remoting.Channel.call(Channel.java:957)
       at hudson.FilePath.act(FilePath.java:1072)
       at hudson.FilePath.act(FilePath.java:1061)
       at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController.writeLog(FileMonitoringTask.java:209)
       at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.check(DurableTaskStep.java:532)
       at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.run(DurableTaskStep.java:489)
       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:748){noformat}

 

 

Digged the content of the ThreadLocalMap of each of the live threads of the `ScheduledThreadPoolExecutor` and got something similar:

 
{noformat}

Thread 3841 org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep [#8]: checking <redacted> on i-04a6c09bbdc596bbd
Map
ThreadLocalMap
size: 65536

Elements count per key type:
[org.jenkinsci.plugins.workflow.log.DelayBufferedOutputStream$FlushControlledOutputStream$1:20704, java.lang.ThreadLocal:47, hudson.remoting.Channel$ThreadLastIoId:34, null:42550, org.codehaus.groovy.runtime.NullObject:2200, java.lang.ThreadLocal$SuppliedThreadLocal:1]{noformat}

 

It looks like the following ThreadLocal is never cleared since it gets attached to reused threads from a thread pool: 

Reply all
Reply to author
Forward
0 new messages