[JIRA] (JENKINS-51057) EventDispatcher and ConcurrentLinkedQueue ate my JVM

docwhat@gerf.org (JIRA)

unread,

May 1, 2018, 3:37:02 PM5/1/18

to jenkinsc...@googlegroups.com

Christian Höltje created an issue

Jenkins /

JENKINS-51057

EventDispatcher and ConcurrentLinkedQueue ate my JVM

Issue Type:	Bug
Assignee:	Tom FENNELLY
Components:	sse-gateway-plugin
Created:	2018-05-01 19:36
Environment:	Jenkins 2.107.2 ace-editor-1.1 analysis-core-1.95 ansicolor-0.5.2 ant-1.8 antisamy-markup-formatter-1.5 apache-httpcomponents-client-4-api-4.5.3-2.1 artifactory-2.15.1 authentication-tokens-1.3 badge-1.4 blueocean-1.5.0 blueocean-autofavorite-1.2.2 blueocean-bitbucket-pipeline-1.5.0 blueocean-commons-1.5.0 blueocean-config-1.5.0 blueocean-core-js-1.5.0 blueocean-dashboard-1.5.0 blueocean-display-url-2.2.0 blueocean-events-1.5.0 blueocean-git-pipeline-1.5.0 blueocean-github-pipeline-1.5.0 blueocean-i18n-1.5.0 blueocean-jira-1.5.0 blueocean-jwt-1.5.0 blueocean-personalization-1.5.0 blueocean-pipeline-api-impl-1.5.0 blueocean-pipeline-editor-1.5.0 blueocean-pipeline-scm-api-1.5.0 blueocean-rest-1.5.0 blueocean-rest-impl-1.5.0 blueocean-web-1.5.0 bouncycastle-api-2.16.2 branch-api-2.0.19 build-timeout-1.19 build-user-vars-plugin-1.5 buildtriggerbadge-2.9 checkstyle-3.50 cli-commander-0.3 cloudbees-bitbucket-branch-source-2.2.10 cloudbees-folder-6.4 clover-4.9.0 cobertura-1.12 command-launcher-1.2 conditional-buildstep-1.3.6 config-file-provider-2.18 copyartifact-1.39.1 credentials-2.1.16 credentials-binding-1.16 cucumber-reports-3.16.0 cvs-2.14 dashboard-view-2.9.11 display-url-api-2.2.0 docker-commons-1.11 docker-workflow-1.15.1 durable-task-1.22 email-ext-2.62 embeddable-build-status-1.9 envinject-2.1.5 envinject-api-1.5 extended-read-permission-2.0 external-monitor-job-1.7 favorite-2.3.1 findbugs-4.72 git-3.8.0 git-client-2.7.1 git-server-1.7 github-1.29.0 github-api-1.90 github-branch-source-2.3.3 github-oauth-0.29 gradle-1.28 groovy-2.0 groovy-postbuild-2.4 handlebars-1.1.1 handy-uri-templates-2-api-2.1.6-1.0 htmlpublisher-1.16 http_request-1.8.22 ibm-application-security-1.1.2 icon-shim-2.0.3 ivy-1.28 jackson2-api-2.8.11.1 jacoco-3.0.1 javadoc-1.4 jenkins-design-language-1.5.0 jira-2.5.1 job-dsl-1.69 jquery-detached-1.2.1 jsch-0.1.54.2 junit-1.24 ldap-1.20 lockable-resources-2.2 mailer-1.21 mapdb-api-1.0.9.0 matrix-auth-2.2 matrix-project-1.13 maven-plugin-3.1.2 mercurial-2.3 metrics-3.1.2.11 metrics-graphite-3.0.0 modernstatus-1.2 momentjs-1.1.1 monitoring-1.72.0 pam-auth-1.3 parameterized-trigger-2.35.2 pegdown-formatter-1.3 performance-3.8 pipeline-build-step-2.7 pipeline-github-2.0 pipeline-github-lib-1.0 pipeline-graph-analysis-1.6 pipeline-input-step-2.8 pipeline-maven-3.5.3 pipeline-milestone-step-1.3.1 pipeline-model-api-1.2.9 pipeline-model-declarative-agent-1.1.1 pipeline-model-definition-1.2.9 pipeline-model-extensions-1.2.9 pipeline-rest-api-2.10 pipeline-stage-step-2.3 pipeline-stage-tags-metadata-1.2.9 pipeline-stage-view-2.10 pipeline-utility-steps-2.0.2 plain-credentials-1.4 pmd-3.50 pubsub-light-1.12 rebuild-1.28 resource-disposer-0.8 run-condition-1.0 scm-api-2.2.6 script-security-1.44 sidebar-link-1.9.1 simple-theme-plugin-0.4 slack-2.3 sse-gateway-1.15 ssh-agent-1.15 ssh-credentials-1.13 ssh-slaves-1.26 structs-1.14 subversion-2.10.5 support-core-2.46 swarm-3.12 tap-2.2.1 timestamper-1.8.9 token-macro-2.5 translation-1.16 variant-1.1 view-job-filters-1.27 webhook-step-1.3 windows-slaves-1.3.1 workflow-aggregator-2.5 workflow-api-2.27 workflow-basic-steps-2.6 workflow-cps-2.48 workflow-cps-global-lib-2.9 workflow-durable-task-step-2.19 workflow-job-2.19 workflow-multibranch-2.17 workflow-scm-step-2.6 workflow-step-api-2.14 workflow-support-2.18 ws-cleanup-0.34 xunit-1.102
Priority:	Major
Reporter:	Christian Höltje

We started running out of memory in our JVM (Xmx 8G) and when looking at Melody's memory (heap) histogram (JENKINS_URL/monitoring?part=heaphisto) the top two items were:

Class	Size (Kb)	% size	Instances	% instances	Source
org.jenkinsci.plugins.ssegateway.sse.EventDispatcher$Retry	2,890,399	44	92,492,793	43
java.util.concurrent.ConcurrentLinkedQueue$Node	2,167,981	33	92,500,553	43

77% (and growing as we were researching the problem) of the memory was being used by these two items.

I have two support bundles from this time and an .hprof as well.

I can either screen share with someone or if you can tell me how to analyze these files I would be happy to.

Add Comment

This message was sent by Atlassian JIRA (v7.3.0#73011-sha1:3c73d0e)

docwhat@gerf.org (JIRA)

unread,

May 3, 2018, 11:52:02 AM5/3/18

to jenkinsc...@googlegroups.com

Christian Höltje commented on

JENKINS-51057

Re: EventDispatcher and ConcurrentLinkedQueue ate my JVM

Our Jenkins server has been up 23 hours and we're already seeing large numbers of the EventDispacher objects:

Class	Size (Kb)	% size	Instances	% instances	Source

[int[]	https://ghenkins.bigdatalab.ibm.com/monitoring?part=source&class=int]	1,056,446	26	868,075	1
[char[]	https://ghenkins.bigdatalab.ibm.com/monitoring?part=source&class=char]	447,243	11	2,837,232	4
[java.lang.Object[]	https://ghenkins.bigdatalab.ibm.com/monitoring?part=source&class=java.lang.Object]	250,560	6	4,813,216	7
java.lang.StackTraceElement	239,176	6	7,653,653	11
[byte[]	https://ghenkins.bigdatalab.ibm.com/monitoring?part=source&class=byte]	165,917	4	581,581	0
java.lang.String	146,722	3	6,260,161	9
org.jenkinsci.plugins.ssegateway.sse.EventDispatcher$Retry	96,964	2	3,102,853	4
java.util.concurrent.ConcurrentLinkedQueue$Node	73,163	1	3,121,633	4

It isn't a problem (yet) but this is alarming.

Our other, Jenkins server has no references to EventDispatcher$Retry in the memory histogram, even when expanding details.

Add Comment

docwhat@gerf.org (JIRA)

unread,

May 3, 2018, 12:12:02 PM5/3/18

to jenkinsc...@googlegroups.com

Christian Höltje edited a comment on

JENKINS-51057

Re: EventDispatcher and ConcurrentLinkedQueue ate my JVM

Our Jenkins server has been up 23 hours and we're already seeing large numbers of the EventDispacher objects:

|| Class|| Size (Kb)|| % size|| Instances|| % instances|| Source||
|[

int[]|https://ghenkins.bigdatalab.ibm.com/monitoring?part=source&class=int]|1,056,446|26|868,075|1| |

|[char[]|https://ghenkins.bigdatalab.ibm.com/monitoring?part=source&class=char]|447,243|11|2,837,232|4| |
|[java.lang.Object[]|https://ghenkins.bigdatalab.ibm.com/monitoring?part=source&class=java.lang.Object]|250,560|6|4,813,216|7| |

|[java.lang.StackTraceElement|https://ghenkins.bigdatalab.ibm.com/monitoring?part=source&class=java.lang.StackTraceElement]|239,176|6|7,653,653|11| |

|[byte[]|https://ghenkins.bigdatalab.ibm.com/monitoring?part=source&class=byte]|165,917|4|581,581|0| |

|[java.lang.String|https://ghenkins.bigdatalab.ibm.com/monitoring?part=source&class=java.lang.String]|146,722|3|6,260,161|9| |
|[ org.jenkinsci.plugins.ssegateway.sse.EventDispatcher$Retry|https://ghenkins.bigdatalab.ibm.com/monitoring?part=source&class=org.jenkinsci.plugins.ssegateway.sse.EventDispatcher$Retry]|96,964|2|3,102,853|4| |
|[java.util.concurrent.ConcurrentLinkedQueue$Node|https://ghenkins.bigdatalab.ibm.com/monitoring?part=source&class=java.util.concurrent.ConcurrentLinkedQueue$Node]|73,163|1|3,121,633|4| |

It isn't a problem (yet) but this is alarming.

Our other, Jenkins server has no references to {{EventDispatcher$Retry}} in the memory histogram, even when expanding details.

Add Comment

docwhat@gerf.org (JIRA)

unread,

May 3, 2018, 12:20:03 PM5/3/18

to jenkinsc...@googlegroups.com

Christian Höltje commented on

JENKINS-51057

Re: EventDispatcher and ConcurrentLinkedQueue ate my JVM

In our logs, we're see messages like:

 
                                                                May 02, 2018 1:04:41 PM WARNING org.jenkinsci.plugins.ssegateway.sse.EventDispatcher unsubscribe
Invalid SSE unsubscribe configuration. No active subscription matching filter: 
May 02, 2018 1:04:41 PM WARNING org.jenkinsci.plugins.ssegateway.sse.EventDispatcher unsubscribe
Invalid SSE unsubscribe configuration. No active subscription matching filter: 
May 02, 2018 1:04:41 PM WARNING org.jenkinsci.plugins.ssegateway.sse.EventDispatcher unsubscribe
Invalid SSE unsubscribe configuration. No active subscription matching filter: 
May 02, 2018 1:04:41 PM WARNING org.jenkinsci.plugins.ssegateway.sse.EventDispatcher unsubscribe
Invalid SSE unsubscribe configuration. No active subscription matching filter:  
                                                            

Could that be related?

Add Comment

docwhat@gerf.org (JIRA)

unread,

May 3, 2018, 12:39:01 PM5/3/18

to jenkinsc...@googlegroups.com

Christian Höltje commented on

JENKINS-51057

Re: EventDispatcher and ConcurrentLinkedQueue ate my JVM

And now its at:

Class	Size (Kb)	% size	Instances	% instances	Source

org.jenkinsci.plugins.ssegateway.sse.EventDispatcher$Retry	105,174	3	3,365,570	4
java.util.concurrent.ConcurrentLinkedQueue$Node	79,051	2	3,372,854	4

Add Comment

docwhat@gerf.org (JIRA)

unread,

May 3, 2018, 1:16:03 PM5/3/18

to jenkinsc...@googlegroups.com

Christian Höltje commented on

JENKINS-51057

Re: EventDispatcher and ConcurrentLinkedQueue ate my JVM

There are a lot of changes/improvements in sse-gateway (mostly by R. Tyler Croy) since 1.15 was released (back in Jan 16th, 2017).

In the logs, I see a commit titled message leaks which sounds interesting.

I tried compiling the master branch and I get errors:

 
                                                                $ docker run --rm -it -v maven-repo:/root/.m2 -v $PWD:/src:rw -w /src maven:3-jdk-8-alpine mvn verify 
...
[ERROR] Failed to execute goal com.github.eirslett:frontend-maven-plugin:1.0:npm (npm install) on project sse-gateway: Failed to run task: 'npm install' failed. java.io.IOException: Cannot run program "/src/node/node" (in directory "/src"): error=2, No such file or directory -> [Help 1]

Add Comment

docwhat@gerf.org (JIRA)

unread,

May 3, 2018, 1:30:02 PM5/3/18

to jenkinsc...@googlegroups.com

Christian Höltje commented on

JENKINS-51057

Re: EventDispatcher and ConcurrentLinkedQueue ate my JVM

I got further when using the non-alpine openjdk image. Still doesn't build, but it is due to tests failing.

 
                                                                $ docker run --rm -it -v maven-repo:/root/.m2 -v $PWD:/src:rw -w /src maven:3-jdk-8 mvn verify
...
Results :
Failed tests:
org.jenkinsci.plugins.ssegateway.EventHistoryStoreTest.test_autoDeleteOnExpire(org.jenkinsci.plugins.ssegateway.EventHistoryStoreTest)
 Run 1: EventHistoryStoreTest.test_autoDeleteOnExpire:119 expected:<100> but was:<50>
 Run 2: EventHistoryStoreTest.test_autoDeleteOnExpire:119 expected:<100> but was:<50>
 Run 3: EventHistoryStoreTest.test_autoDeleteOnExpire:119 expected:<100> but was:<50>
 Run 4: EventHistoryStoreTest.test_autoDeleteOnExpire:119 expected:<100> but was:<50>
 Run 5: EventHistoryStoreTest.test_autoDeleteOnExpire:119 expected:<100> but was:<50>
org.jenkinsci.plugins.ssegateway.EventHistoryStoreTest.test_delete_stale_events(org.jenkinsci.plugins.ssegateway.EventHistoryStoreTest)
 Run 1: EventHistoryStoreTest.test_delete_stale_events:69
 Run 2: EventHistoryStoreTest.test_delete_stale_events:69
 Run 3: EventHistoryStoreTest.test_delete_stale_events:69
 Run 4: EventHistoryStoreTest.test_delete_stale_events:69
 Run 5: EventHistoryStoreTest.test_delete_stale_events:69

Tests run: 12, Failures: 2, Errors: 0, Skipped: 0
...
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.19.1:test (default-test) on project sse-gateway: There are test failures.
...
 
                                                            

Add Comment

docwhat@gerf.org (JIRA)

unread,

May 3, 2018, 1:44:03 PM5/3/18

to jenkinsc...@googlegroups.com

Christian Höltje commented on

JENKINS-51057

Re: EventDispatcher and ConcurrentLinkedQueue ate my JVM

Some more events from our logs:

 
                                                                May 03, 2018 1:36:00 PM FINE org.jenkinsci.plugins.ssegateway.sse.EventDispatcher doDispatch
Error dispatching event to SSE channel. Write failed.
May 03, 2018 1:36:36 PM FINE org.jenkinsci.plugins.ssegateway.sse.EventDispatcher doDispatch
Error dispatching event to SSE channel. Write failed.
May 03, 2018 1:36:39 PM FINE org.jenkinsci.plugins.ssegateway.sse.EventDispatcher processRetries
Error dispatching retry event to SSE channel. Write failed. Dispatcher jenkins-blueocean-core-js-1525286577663-o9g92 (1006446612).
May 03, 2018 1:36:39 PM FINE org.jenkinsci.plugins.ssegateway.sse.EventDispatcher processRetries
Error dispatching retry event to SSE channel. Write failed. Dispatcher jenkins-blueocean-core-js-1525286577663-o9g92 (1006446612).
May 03, 2018 1:36:39 PM FINE org.jenkinsci.plugins.ssegateway.sse.EventDispatcher processRetries
Error dispatching retry event to SSE channel. Write failed. Dispatcher jenkins-blueocean-core-js-1525286577663-o9g92 (1006446612). 
                                                            

Add Comment

docwhat@gerf.org (JIRA)

unread,

May 4, 2018, 9:08:02 AM5/4/18

to jenkinsc...@googlegroups.com

Christian Höltje commented on

JENKINS-51057

Re: EventDispatcher and ConcurrentLinkedQueue ate my JVM

This morning:

Class	Size (Kb)	% size	Instances	% instances	Source

org.jenkinsci.plugins.ssegateway.sse.EventDispatcher$Retry	283,979	5	9,087,330	10
java.util.concurrent.ConcurrentLinkedQueue$Node	214,241	4	9,140,981	10

Is there a way I can find out what's in the queues to track what's causing the Retry to happen?

Add Comment

docwhat@gerf.org (JIRA)

unread,

May 4, 2018, 10:40:02 AM5/4/18

to jenkinsc...@googlegroups.com

Christian Höltje commented on

JENKINS-51057

Re: EventDispatcher and ConcurrentLinkedQueue ate my JVM

Looking in $JENKINS_HOME/logs/sse-events/ I see 16 .json files in jobs and 108 .json files in pipeline. And they change frequently (e.g. they are now, a minute later, 6 and 54).

Is there another place I can look at why there are some many Retry objects?

Add Comment

docwhat@gerf.org (JIRA)

unread,

May 7, 2018, 4:09:02 PM5/7/18

to jenkinsc...@googlegroups.com

Christian Höltje commented on

JENKINS-51057

Re: EventDispatcher and ConcurrentLinkedQueue ate my JVM

I managed to use jvirtualvm to look at the .hprof from the initial comment and it was our friend EventDispatcher$Retry and $ConcurrentLinkedQueue$Node using up 90% of the memory.

Add Comment

docwhat@gerf.org (JIRA)

unread,

May 11, 2018, 2:28:02 PM5/11/18

to jenkinsc...@googlegroups.com

Christian Höltje commented on

JENKINS-51057

Re: EventDispatcher and ConcurrentLinkedQueue ate my JVM

I finally got VirtualVM working reasonably well (using lots of RAM and a RAM Disk for temp files).

I found that there are 0 instances of org.jenkinsci.plugins.ssegateway.EventHistoryStore and 0 instances of org.jenkinsci.plugins.ssegateway.sse.EventDispatcher.

Is it possible that the Retry objects are not being reclaimed by GC?

When I look at the objects in VirtualVM the Retry object has a reference (item) to a ConcurrentLinkedQueue$Node object.

The Node object has a reference (item) to the Retry object plus a reference to the next {{Node} in the queue.

Add Comment

docwhat@gerf.org (JIRA)

unread,

May 11, 2018, 2:28:02 PM5/11/18

to jenkinsc...@googlegroups.com

Christian Höltje updated an issue

Jenkins /

JENKINS-51057

EventDispatcher and ConcurrentLinkedQueue ate my JVM

Change By:	Christian Höltje
Labels:	leak memory memory-leak

Add Comment

docwhat@gerf.org (JIRA)

unread,

May 11, 2018, 3:45:02 PM5/11/18

to jenkinsc...@googlegroups.com

Christian Höltje commented on

JENKINS-51057

Re: EventDispatcher and ConcurrentLinkedQueue ate my JVM

Since I didn't see this problem in Blue Ocean 1.4.2 but I'm seeing it in 1.5.0 I went looking for changes in Blue Ocean that may have triggered this problem...

I found PR #1667 by Vivek Pandey. It was to fix ~~JENKINS-48820~~. I doubt it caused this problem, but maybe it made it worse?

Add Comment

me@wilfred.me.uk (JIRA)

unread,

Jun 1, 2018, 5:23:01 AM6/1/18

to jenkinsc...@googlegroups.com

Wilfred Hughes edited a comment on

JENKINS-51057

Re: EventDispatcher and ConcurrentLinkedQueue ate my JVM

We're seeing this problem with Blue Ocean 1.4.2 and Jenkins 2 . 89.4.

Add Comment

me@wilfred.me.uk (JIRA)

unread,

Jun 1, 2018, 5:23:01 AM6/1/18

to jenkinsc...@googlegroups.com

Wilfred Hughes commented on

JENKINS-51057

Re: EventDispatcher and ConcurrentLinkedQueue ate my JVM

We're seeing this problem with Blue Ocean 1.4.2.

Add Comment

docwhat@gerf.org (JIRA)

unread,

Jun 25, 2018, 2:04:02 PM6/25/18

to jenkinsc...@googlegroups.com

Christian Höltje commented on

JENKINS-51057

Re: EventDispatcher and ConcurrentLinkedQueue ate my JVM

The only workaround I found was to completely uninstall Blue Ocean and the pubsub plugins.

Add Comment

This message was sent by Atlassian JIRA (v7.10.1#710002-sha1:6efc396)

sternr@windowslive.com (JIRA)

unread,

Jul 5, 2018, 7:08:02 AM7/5/18

to jenkinsc...@googlegroups.com

rami stern commented on

JENKINS-51057

Re: EventDispatcher and ConcurrentLinkedQueue ate my JVM

We think we might've found the root cause, we're checking it now on prod.

https://github.com/jenkinsci/sse-gateway-plugin/pull/27

Add Comment

gustaf.lundh@gmail.com (JIRA)

unread,

Aug 30, 2018, 4:53:02 AM8/30/18

to jenkinsc...@googlegroups.com

Gustaf Lundh commented on

JENKINS-51057

Re: EventDispatcher and ConcurrentLinkedQueue ate my JVM

We just had to restart a critical Jenkins master due to this memory leak, where org.jenkinsci.plugins.ssegateway.sse.EventDispatcher$Retry and java.util.concurrent.ConcurrentLinkedQueue$Node consumed 20GB of our 40GB heap. We need to get rid of blueocean due to this issue. The priority should be raised to critical IMHO.

Add Comment

This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)

jon.sten@gmail.com (JIRA)

unread,

Aug 31, 2018, 8:09:03 AM8/31/18

to jenkinsc...@googlegroups.com

Jon Sten commented on

JENKINS-51057

Re: EventDispatcher and ConcurrentLinkedQueue ate my JVM

As a temporary workaround, here's a script which will go through and look for old event dispatchers:

 
                                                                // First of, due to groovy and private field in super classes, we need to
// change visibility of retryQueue so that we can use reflection instead...
def retryQueueField = org.jenkinsci.plugins.ssegateway.sse.EventDispatcher.getDeclaredField('retryQueue')
retryQueueField.setAccessible(true)

def sessions = Jenkins.instance.servletContext.this$0._sessionHandler._sessionCache._sessions
println("There are ${sessions.size()} sessions...")
def numEventsRemoved = 0
def numEventDispatchersPurged = 0
sessions.each{id, session->
  def eventDispatchers = session.sessionData._attributes['org.jenkinsci.plugins.ssegateway.sse.EventDispatcher']
  if (eventDispatchers) {
    println("Sessions ${id} has a ssegateway EventDispatcher map...")
    eventDispatchers.findAll{k, v -> k.startsWith('jenkins-blueocean-core-js')}.each { dispatcherId, dispatcher ->
      def retryQueue = retryQueueField.get(dispatcher) // Need to use reflection since retryQueue is private in super class...
      if (retryQueue.isEmpty()) {
        println("  Found one EventDispatcher, '${dispatcherId}', it has no retry events.")
      } else {
        def oldestAge = (System.currentTimeMillis() - retryQueue.peek().timestamp)/1000
        if (oldestAge > 300) {
          println("  Found one EventDispatcher, '${dispatcherId}', its oldest retry event is ${oldestAge} seconds old, it contains ${retryQueue.size()} retry events, removing events and unsubscribing.")
          numEventsRemoved += retryQueue.size()
          numEventDispatchersPurged++
          dispatcher.unsubscribeAll()
          retryQueue.clear()
     	 } else {
        println("  Found one EventDispatcher, its oldest retry event is ${oldestAge} seconds old, so sparing it for now...")
      	}
      }
    }
  }
}

println("Removed ${numEventsRemoved} retry events from ${numEventDispatchersPurged} EventDispatchers!")
 
                                                            

Btw, I've verified this bug in a fresh Jenkins install (2.121.3) and with latest version of Blue Ocean (1.8.2) which pulls SSE-gateway (1.15).

Add Comment

mstewart@riotgames.com (JIRA)

unread,

Sep 5, 2018, 10:00:01 PM9/5/18

to jenkinsc...@googlegroups.com

Maxfield Stewart updated an issue

Jenkins /

JENKINS-51057

EventDispatcher and ConcurrentLinkedQueue ate my JVM

Change By:	Maxfield Stewart
Attachment:	image-2018-09-05-18-59-08-587.png

Add Comment

mstewart@riotgames.com (JIRA)

unread,

Sep 5, 2018, 10:00:04 PM9/5/18

to jenkinsc...@googlegroups.com

Maxfield Stewart commented on

JENKINS-51057

Re: EventDispatcher and ConcurrentLinkedQueue ate my JVM

We're having this problem with blue ocean 1.4.2 and Jenkins 2.112. Out of a 18 GB heap, 16GB are the GuavaPubSub object which containes the concurrent queue and EventDispatchers. We ran Jon Sten 's script and it cleaned up only 2.5GB's of retry objects but still had 14GB's of references to EventDispatchers. It takes' out our Jenkins Server in about 14-15 days.

I took a heap dump and am adding a screenshot of a MAT analyzer view of the memory consumption. Was hoping someone would accept the PR for the memory leak fix which seems to be outstanding for some time now.

Add Comment

g.sironi@elifesciences.org (JIRA)

unread,

Oct 18, 2018, 8:30:03 AM10/18/18

to jenkinsc...@googlegroups.com

Giorgio Sironi commented on

JENKINS-51057

Re: EventDispatcher and ConcurrentLinkedQueue ate my JVM

Experiencing the same problem, it takes ~30 days for Jenkins to build up to its VM memory limit. We are on SSE Gateway 1.15 too.

Heap Classes: 10,648, Instances: 57,541,468, Kilo-Bytes: 1,715,402

Class	Size (Kb)	% size	Instances	% instances	Source

org.jenkinsci.plugins.ssegateway.sse.EventDispatcher$Retry	800,661	46	25,621,179	44
java.util.concurrent.ConcurrentLinkedQueue$Node	600,538	35	25,622,966	44
char[]	125,236	7	1,298,894	2
java.lang.String	30,420	1	1,297,943	2
short[]	20,271	1	9,119	0

Add Comment

msa@rtx.dk (JIRA)

unread,

Nov 6, 2018, 5:21:03 AM11/6/18

to jenkinsc...@googlegroups.com

Mikkel S. Andersen commented on

JENKINS-51057

Re: EventDispatcher and ConcurrentLinkedQueue ate my JVM

Bump. Same issue here on SSE Gateway 1.16 it seems.

Any chance for a fix in the near future?

Heap Classes: 20,087, Instances: 280,652,984, Kilo-Bytes: 9,119,689

Class	Size (Kb)	% size	Instances	% instances	Source

org.jenkinsci.plugins.ssegateway.sse.EventDispatcher$Retry	3,383,758	37	108,280,267	38
java.util.concurrent.ConcurrentLinkedQueue$Node	2,538,166	27	108,295,118	38
char[]	1,075,470	11	9,044,530	3
java.lang.Object[]	205,241	2	5,771,490	2
int[]	182,430	2	1,633,983	0
byte[]	165,738	1	558,281	0
java.lang.String	157,701	1	6,728,615	2
java.util.HashMap$Node	99,299	1	3,177,591	1

Add Comment

batmat@batmat.net (JIRA)

unread,

Jan 22, 2019, 8:10:03 AM1/22/19

to jenkinsc...@googlegroups.com

Baptiste Mathus assigned an issue to Unassigned

Jenkins /

JENKINS-51057

EventDispatcher and ConcurrentLinkedQueue ate my JVM

Change By:	Baptiste Mathus
Assignee:	Tom FENNELLY

Add Comment

mstewart@riotgames.com (JIRA)

unread,

Mar 5, 2019, 10:15:01 PM3/5/19

to jenkinsc...@googlegroups.com

Maxfield Stewart commented on

JENKINS-51057

Re: EventDispatcher and ConcurrentLinkedQueue ate my JVM

This problem continues. It continues to take production Jenkins offline forcing restarts every 15-25 days. We're beginning to see if there's a way to deploy jenkins with Zero Blue Ocean elements as it's not production ready. Though the plugin seems to install by default. it's a shame there's been zero ownership of this for almost a year now

Add Comment

airadier@gmail.com (JIRA)

unread,

Mar 7, 2019, 5:45:02 AM3/7/19

to jenkinsc...@googlegroups.com

Álvaro Iradier commented on

JENKINS-51057

Re: EventDispatcher and ConcurrentLinkedQueue ate my JVM

Just my 2 cents...

Jon Sten, I tried your cleanup script with mixed results. It was able to empty retryQueues but we still had a big leak. After isolating the problem, I noticed that if the HTTP session from the user expires, that script is not able to find the leaking EventDispatchers. They are, however, still in the GuavaPubSubBus subscribers map, and can be seen with:

 
                                                                import org.jenkinsci.plugins.pubsub.PubsubBus;
this.bus = PubsubBus.getBus();
this.bus.subscribers.each{ channelSubscriber, guavaSubscriber -> println channelSubscriber }
println "Done"

This outputs some lines containing:

org.jenkinsci.plugins.ssegateway.sse.EventDispatcher$SSEChannelSubscriber@3472875b

which is the Inner class that implements the ChannelSubscriber interface. I suspect that the memory is still leaking as it is being referenced through this subscriber.

I saw there is already a PR https://github.com/jenkinsci/sse-gateway-plugin/pull/27 and a fork of the plugin https://github.com/taboola/sse-gateway-plugin

The approach of this fix is changing the handling of retries so they are aborted and the queue cleared after some amount of time or retries. Howevery, I wonder if the problem is the cleanup should be done when the HttpSession is destroyed. Currently, I can see in https://github.com/jenkinsci/sse-gateway-plugin/blob/master/src/main/java/org/jenkinsci/plugins/ssegateway/sse/EventDispatcher.java the cleanup code for sessionDestroyed is:

 
                                                                    /**
     * Http session listener.
     */
    @Extension
    public static final class SSEHttpSessionListener extends HttpSessionListener {
        @Override
        public void sessionDestroyed(HttpSessionEvent httpSessionEvent) {
            try {
                Map<String, EventDispatcher> dispatchers = EventDispatcherFactory.getDispatchers(httpSessionEvent.getSession());
                try {
                    for (EventDispatcher dispatcher : dispatchers.values()) {
                        try {
                            dispatcher.unsubscribeAll();
                        } catch (Exception e) {
                            LOGGER.log(Level.FINE, "Error during unsubscribeAll() for dispatcher " + dispatcher.getId() + ".", e);
                        }
                    }
                } finally {
                    dispatchers.clear();
                }
            } catch (Exception e) {
                LOGGER.log(Level.FINE, "Error during session cleanup. The session has probably timed out.", e);
            }
        }
    }
 
                                                            

but although I can see that for every dispatcher there is a call to dispatcher.unsubscribeAll(), I am missing a call to retryQueue.clear(); But I am just guessing here...

Anyone knowing the internals of the plugin can confirm this? I might try forking the plugin and testing a modified version.

Regards.

Add Comment

airadier@gmail.com (JIRA)

unread,

Mar 8, 2019, 2:41:03 PM3/8/19

to jenkinsc...@googlegroups.com

Álvaro Iradier commented on

JENKINS-51057

Re: EventDispatcher and ConcurrentLinkedQueue ate my JVM

After many tests and cleanup scripts, we still we able to find some EventDispatcher$Retry elements in the heap but no subscribers in the GuavaPubSubBus or EventDispatchers in the HttpSessions. Some additional verifications in the plugin made me notice that as AsyncEventDispatcher was being used, it is possible that the EventDispatchers were still being referenced by some asyncContexts or threads that were not completed or released. Finally I also added a dispatcher.stop() call in my cleanup script, and it looks like there are no traces of leaked EventDispatcher$Retry clases in the heap anymore, but we still keep observing our instance and analyzing heap dumps, it is very soon to confirm.

Just in case it can help, this is the cleanup script we are running daily:

 
                                                                import org.jenkinsci.plugins.pubsub.PubsubBus;
import org.jenkinsci.plugins.ssegateway.sse.*;

def dryRun = false
this.bus = PubsubBus.getBus();

// change visibility of retryQueue so that we can use reflection instead...
def retryQueueField = org.jenkinsci.plugins.ssegateway.sse.EventDispatcher.getDeclaredField('retryQueue')
retryQueueField.setAccessible(true

)

def dispatcherCount = 0
def dispatchersList = []

//Build a list of EventDispatchers in all existing HTTP sessions
println "DISPATCHERS IN HTTP SESSIONS"
println "----------------------------"
def sessions = Jenkins.instance.servletContext.this$0._sessionHandler._sessionCache._sessions
sessions.each{id, session->
  def eventDispatchers = EventDispatcherFactory.getDispatchers(session)
  if (eventDispatchers) {
    eventDispatchers.each { dispatcherId, dispatcher ->
      dispatchersList.add(dispatcherId)
      def retryQueue = retryQueueField.get(dispatcher) // Need to use reflection since retryQueue is private in super class...
      if (retryQueue.peek() != null) {
        def oldestAge = (System.currentTimeMillis() - retryQueue.peek().timestamp)/1000
        println "Dispatcher: " + dispatcher.getClass().getName() + " - " + dispatcher.id + " with " + retryQueue.size() + " events, oldest is " + oldestAge + " seconds."      
      } else {
        println "Dispatcher: " + dispatcher.getClass().getName() + " - " + dispatcher.id + " with no retryEvents"
      }      
    }
  }
}

println "There are " + dispatchersList.size() + " dispatchers in HTTP sessions"
println ""

//Find all subscribers in bus
println "DISPATCHERS IN PUBSUBBUS"
println "------------------------"
this.bus.subscribers.each{ channelSubscriber, guavaSubscriber ->
  if (channelSubscriber.getClass().getName().equals('org.jenkinsci.plugins.ssegateway.sse.EventDispatcher$SSEChannelSubscriber')) {
    dispatcherCount++
    def dispatcher = channelSubscriber.this$0

    def retryQueue = retryQueueField.get(dispatcher) // Need to use reflection since retryQueue is private in super class...
    if (retryQueue.peek() != null) {
      def oldestAge = (System.currentTimeMillis() - retryQueue.peek().timestamp)/1000
	  println "Dispatcher: " + dispatcher.id + " with " + retryQueue.size() + " events, oldest is " + oldestAge + " seconds."      
      if (oldestAge > 300) {
      	println "  Clearing retryQueue with " + retryQueue.size() + " events"
        if (!dryRun) {
        	retryQueue.clear()
        } else {
          println "  Ignoring, dryrun"
        }
      }
    } else {
	    println "Dispatcher: " + dispatcher.id + " with no retryEvents"
    }
    
    if (dispatchersList.indexOf(dispatcher.id) < 0) {
      println "  Dispatcher is not in HTTP Sessions. Clearing"
        if (!dryRun) {
          dispatcher.unsubscribeAll()
          dispatcher.stop()
        } else {
          println "  Ignoring, dryrun"
        }
    }
  } 
}

println ""
println "Dispatchers in PubSubBus: " + dispatcherCount
 
                                                            

Add Comment

jon.sten@gmail.com (JIRA)

unread,

Mar 11, 2019, 3:42:04 AM3/11/19

to jenkinsc...@googlegroups.com

Jon Sten commented on

JENKINS-51057

Re: EventDispatcher and ConcurrentLinkedQueue ate my JVM

Álvaro Iradier Thank you for the investigation. In our case I've only noticed EventDispatcher objects being kept alive by the HTTP sessions (for some unknown reason sessions aren't terminated, might be related to the LDAP plugin, not sure though). However, as you noted it seems like there are other places where the objects are kept referenced on the heap. So some kind of timeout of the dispatcher objects and sessions is really needed.

Add Comment

mstewart@riotgames.com (JIRA)

unread,

Mar 23, 2019, 12:23:03 AM3/23/19

to jenkinsc...@googlegroups.com

Maxfield Stewart commented on

JENKINS-51057

Re: EventDispatcher and ConcurrentLinkedQueue ate my JVM

Álvaro Iradier Thank you for the hard work, your script almost works on some of our servers. I had to modify the code to put an exception trap around "dispatcher.stop()" as in some cases that was blowing up on my server, after that it cleared memory.

However on another (older version) of Jenkins the above code explodes trying to access `def dispatcher = channelSubscriber.this$0` as the reflection property fails and doesn't return the parent class (Dispatcher). Not sure what's up there, but we're upgrading that server soon and I'll try again.

Add Comment

airadier@gmail.com (JIRA)

unread,

Mar 24, 2019, 4:46:02 PM3/24/19

to jenkinsc...@googlegroups.com

Álvaro Iradier edited a comment on

JENKINS-51057

Re: EventDispatcher and ConcurrentLinkedQueue ate my JVM

After many tests and cleanup scripts, we still we able to find some EventDispatcher$Retry elements in the heap but no subscribers in the GuavaPubSubBus or EventDispatchers in the HttpSessions. Some additional verifications in the plugin made me notice that as AsyncEventDispatcher was being used, it is possible that the EventDispatchers were still being referenced by some asyncContexts or threads that were not completed or released. Finally I also added a dispatcher.stop() call in my cleanup script, and it looks like there are no traces of leaked EventDispatcher$Retry clases in the heap anymore, but we still keep observing our instance and analyzing heap dumps, it is very soon to confirm.

Just in case it can help, this is the cleanup script we are running daily:

{code:java}

this.bus.subscribers. each any { channelSubscriber, guavaSubscriber ->

  if (channelSubscriber.getClass().getName().equals('org.jenkinsci.plugins.ssegateway.sse.EventDispatcher$SSEChannelSubscriber')) {
    dispatcherCount++
    def dispatcher = channelSubscriber.this$0

    def retryQueue = retryQueueField.get(dispatcher) // Need to use reflection since retryQueue is private in super class...
    if (retryQueue.peek() != null) {
      def oldestAge = (System.currentTimeMillis() - retryQueue.peek().timestamp)/1000
   println "Dispatcher: " + dispatcher.id + " with " + retryQueue.size() + " events, oldest is " + oldestAge + " seconds."
      if (oldestAge > 300) {
       println "  Clearing retryQueue with " + retryQueue.size() + " events"
        if (!dryRun) {
         retryQueue.clear()

            dispatcher.unsubscribeAll()
            try {
              dispatcher.stop()
        } catch (Exception ex) {
              println "  !! Exception stopping AsynDispatcher"
            }
        } else {

          println "  Ignoring, dryrun"
        }
      }
    } else {
     println "Dispatcher: " + dispatcher.id + " with no retryEvents"
    }

    if (dispatchersList.indexOf(dispatcher.id) < 0) {
      println "  Dispatcher is not in HTTP Sessions. Clearing"
        if (!dryRun) {
          dispatcher.unsubscribeAll()

          try {
          dispatcher.stop()
        } catch (Exception ex) {
            println "  !! Exception stopping AsynDispatcher"
          }
        } else {

          println "  Ignoring, dryrun"
        }
    }
  }
}

println ""
println "Dispatchers in PubSubBus: " + dispatcherCount

{code}

Add Comment

airadier@gmail.com (JIRA)

unread,

Mar 24, 2019, 4:50:03 PM3/24/19

to jenkinsc...@googlegroups.com

Álvaro Iradier commented on

JENKINS-51057

Re: EventDispatcher and ConcurrentLinkedQueue ate my JVM

Hi Maxfield Stewart, yes I also got the exception after some tests in our production environment, and had to wrap in try-catch. I just updated my comment with latest version of the script we are running. Probably the dispatcher.stop() is not necessary, as I mostly always get an exception, but it won't hurt. Also I noticed that the part in the " Dispatcher is not in HTTP Sessions. Clearing" was not executed most of the times, so I copied the cleanup part (unsubscribeAll() and the .stop()) to the "Clearing retryQueue with..." part.

After a couple of weeks running I examined the thread dumps and saw no trace of any leak regarding the EventDispatcher$Retry elements, so for us it is working. Looking forward for a fix in the official upstream version of the plugin

Add Comment

mstewart@riotgames.com (JIRA)

unread,

Mar 29, 2019, 1:08:01 PM3/29/19

to jenkinsc...@googlegroups.com

Maxfield Stewart commented on

JENKINS-51057

Re: EventDispatcher and ConcurrentLinkedQueue ate my JVM

Álvaro Iradier Confirmed by the way that your script needs a newer/newest version of plugins, not entirely sure which ones. Of 3 4 deployments I managed, 1 the script was not working for ( couldn't get the parent dispatcher via this$0 ) until I moved to the latest versions. Tested on Blue ocean 1.13 and 1.14, Jenkins v2.168. Doesn't seem to work on versions of Blue Ocean older than 1.10/Jenkins 2.147.

Add Comment

dan.mord@gmail.com (JIRA)

unread,

May 13, 2019, 8:31:03 AM5/13/19

to jenkinsc...@googlegroups.com

Dan Mordechai commented on

JENKINS-51057

Re: EventDispatcher and ConcurrentLinkedQueue ate my JVM

@Álvaro Iradier Thank you for this script! I will definitely try it as we are facing the same issue on our 2.107.2 Jenkins.

Some things to point out:

I had to change the line

 
                                                                        def dispatcher = channelSubscriber.this$0

to

 
                                                                        def dispatcher = channelSubscriber.eventDispatcher

Your script will only work for SSE gatway plugin version <= 1.15 as the SSEChannelSubscriber inner class in EventDispatcher.java will not have the eventDispatcher member any more.

Add Comment

dan.mord@gmail.com (JIRA)

unread,

May 13, 2019, 8:32:05 AM5/13/19

to jenkinsc...@googlegroups.com

Dan Mordechai edited a comment on

JENKINS-51057

Re: EventDispatcher and ConcurrentLinkedQueue ate my JVM

@[~airadier] Thank you for this script! I will definitely try it as we are facing the same issue on our 2.107.2 Jenkins.

Some things to point out:

# I had to change the line
{code:java}
def dispatcher = channelSubscriber.this$0
{code}
to
{code:java}
def dispatcher = channelSubscriber.eventDispatcher
{code}

# Your script will only work for SSE gatway plugin version <= 1.15 as the SSEChannelSubscriber inner class in EventDispatcher.java will not have the eventDispatcher member any more.

Add Comment

dan.mord@gmail.com (JIRA)

unread,

May 13, 2019, 8:33:03 AM5/13/19

to jenkinsc...@googlegroups.com

Dan Mordechai edited a comment on

JENKINS-51057

Re: EventDispatcher and ConcurrentLinkedQueue ate my JVM

@[~airadier] Thank you for this script! I will definitely try it as we are facing the same issue on our 2.107.2 Jenkins.

Some things to point out:
# I had to change the line
{code:java}
def dispatcher = channelSubscriber.this$0
{code}
to
{code:java}
def dispatcher = channelSubscriber.eventDispatcher
{code}

# 2. Your script will only work for SSE gatway plugin version <= 1.15 as the SSEChannelSubscriber inner class in EventDispatcher.java will not have the eventDispatcher member any more.

Add Comment

dan.mord@gmail.com (JIRA)

unread,

May 13, 2019, 8:36:03 AM5/13/19

to jenkinsc...@googlegroups.com

Dan Mordechai edited a comment on

JENKINS-51057

Re: EventDispatcher and ConcurrentLinkedQueue ate my JVM

@ [~airadier] Thank you for this script! I will definitely try it as we are facing the same issue on our 2.107.2 Jenkins.

Some things to point out:
# I had to change the line
{code:java}
def dispatcher = channelSubscriber.this$0
{code}
to
{code:java}
def dispatcher = channelSubscriber.eventDispatcher
{code}

2. Your script will only work for SSE gatway plugin version <= 1.15 as the SSEChannelSubscriber inner class in EventDispatcher.java will not have the eventDispatcher member any more.

Add Comment

dan.mord@gmail.com (JIRA)

unread,

May 16, 2019, 5:43:03 AM5/16/19

to jenkinsc...@googlegroups.com

Dan Mordechai edited a comment on

JENKINS-51057

Re: EventDispatcher and ConcurrentLinkedQueue ate my JVM

[~airadier] Thank you for this script! I will definitely try it as we are facing the same issue on our 2.107.2 Jenkins.

Some things to point out:
# I had to change the line
{code:java}
def dispatcher = channelSubscriber.this$0
{code}
to
{code:java}
def dispatcher = channelSubscriber.eventDispatcher
{code}

2. Your script will only work for SSE gatway plugin version <= > 1.15 as the SSEChannelSubscriber inner class in EventDispatcher.java will not have the eventDispatcher member any more.

Add Comment

raihaan.shouhell@autodesk.com (JIRA)

unread,

Jun 23, 2019, 11:13:02 PM6/23/19

to jenkinsc...@googlegroups.com

Raihaan Shouhell commented on

JENKINS-51057

Re: EventDispatcher and ConcurrentLinkedQueue ate my JVM

A fix got merged to master that definitely helped our masters

Add Comment

olamy@apache.org (JIRA)

unread,

Jun 24, 2019, 10:56:04 PM6/24/19

to jenkinsc...@googlegroups.com

Olivier Lamy assigned an issue to Olivier Lamy

Jenkins /

JENKINS-51057

EventDispatcher and ConcurrentLinkedQueue ate my JVM

Change By:	Olivier Lamy
Assignee:	Olivier Lamy

Add Comment

olamy@apache.org (JIRA)

unread,

Jun 24, 2019, 10:57:04 PM6/24/19

to jenkinsc...@googlegroups.com

Olivier Lamy closed an issue as Fixed

fixed with https://github.com/jenkinsci/sse-gateway-plugin/commit/9cfe9d9c4d9e284ee7d099d4e734b3677ee677f9

Jenkins /

JENKINS-51057

EventDispatcher and ConcurrentLinkedQueue ate my JVM

Change By:	Olivier Lamy
Status:	Open Closed
Resolution:	Fixed

Add Comment

andreas.galek@dieboldnixdorf.com (JIRA)

unread,

Aug 12, 2019, 2:43:02 AM8/12/19

to jenkinsc...@googlegroups.com

Andreas Galek commented on

JENKINS-51057

Re: EventDispatcher and ConcurrentLinkedQueue ate my JVM

Having installed the newest plugin's version 1.19 I was facing error messages like this:

Aug 09, 2019 6:36:46 PM hudson.init.impl.InstallUncaughtExceptionHandler$DefaultUncaughtExceptionHandler uncaughtException
SEVERE: A thread (EventDispatcher.retryProcessor/37063) died unexpectedly due to an uncaught exception, this may leave your Jenkins in a bad way and is usually indicative of a bug in the code.
java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Thread.java:717)
    at java.util.Timer.<init>(Timer.java:160)
    at org.jenkinsci.plugins.ssegateway.sse.EventDispatcher.scheduleRetryQueueProcessing(EventDispatcher.java:296)
    at org.jenkinsci.plugins.ssegateway.sse.EventDispatcher.processRetries(EventDispatcher.java:437)
    at org.jenkinsci.plugins.ssegateway.sse.EventDispatcher$1.run(EventDispatcher.java:299)
    at java.util.TimerThread.mainLoop(Timer.java:555)
    at java.util.TimerThread.run(Timer.java:505)

Is it the same problem as the one you fixed in this Jira ticket?

Add Comment

olamy@apache.org (JIRA)

unread,

Aug 12, 2019, 3:11:09 AM8/12/19

to jenkinsc...@googlegroups.com

Olivier Lamy commented on

JENKINS-51057

Re: EventDispatcher and ConcurrentLinkedQueue ate my JVM

Andreas Galek side effect bug of the fix please read here https://issues.jenkins-ci.org/browse/JENKINS-58684

Add Comment

Reply all

Reply to author

Forward