WebSocket memory leak 1.10.3 / 3.0.16


ja...@inaseq.com

Dec 23, 2020, 2:18:00 PM
to Quarkus Development mailing list
The following change in heap behaviour has occurred since upgrading from Quarkus 1.9.2 / Quarkus-HTTP 3.0.15 to 1.10.3 / 3.0.16.  The application in question is a WebSocket server pushing data to clients.  The application has not changed during the timeframe shown.  It has slow consumer protection via ping/pong monitoring.  Examining the G1GC behaviour I can see it trying Mixed collections but not reducing Old regions.  All of this leads me to suspect a memory leak.

Is there any chance https://github.com/quarkusio/quarkus-http/pull/44 may have caused this?

I'll get more heap detail to see what that tells us.

ja...@inaseq.com

Dec 23, 2020, 2:19:36 PM
to Quarkus Development mailing list
heap.png

ja...@inaseq.com

Dec 23, 2020, 5:30:19 PM
to Quarkus Development mailing list

Here are the retained objects and the root paths (somewhat redacted).  This suggests the issue is related to https://groups.google.com/g/quarkus-dev/c/_XyLtjEJEkQ/m/qcy7MbjdBQAJ as the SmallRyeManagedExecutor shown is the one that is being used in conjunction with the per client OrderedExecutor.
RootPath.png
MapObjects.png

ja...@inaseq.com

Dec 23, 2020, 5:46:55 PM
to Quarkus Development mailing list
Will try per https://github.com/quarkusio/quarkus/pull/9873 now that it is complete.

Stuart Douglas

Dec 23, 2020, 6:28:35 PM
to James Olsen, Quarkus Development mailing list
This is a leak in ViewExecutor, try this PR: https://github.com/quarkusio/quarkus/pull/14041

Stuart

--
You received this message because you are subscribed to the Google Groups "Quarkus Development mailing list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to quarkus-dev...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/quarkus-dev/87a571cc-218f-4c3e-bb7d-7f98ccb9f8fbn%40googlegroups.com.

ja...@inaseq.com

Dec 23, 2020, 6:40:55 PM
to Quarkus Development mailing list
Thanks, will give that a go.  Do you think I'd also get better mileage by removing the OrderedExecutor workaround?  I last looked at doing that in June with 1.5.1 but was still seeing some connections not being closed (https://groups.google.com/g/quarkus-dev/c/_XyLtjEJEkQ/m/qcy7MbjdBQAJ).

Stuart Douglas

Dec 23, 2020, 7:01:49 PM
to James Olsen, Quarkus Development mailing list
You should not need that any more now that the underlying issue is resolved.

Stuart

ja...@inaseq.com

Jan 12, 2021, 5:40:39 PM
to Quarkus Development mailing list
I can confirm that upgrading to JBoss Threads 3.2.0.Final resolves the memory issue.

I have retained the OrderedExecutor workaround as I'm still not getting reliable @OnClose callbacks without it.  Is this something we should be trying to resolve?

Stuart Douglas

Jan 13, 2021, 7:18:14 PM
to James Olsen, Quarkus Development mailing list
@OnClose not working is definitely something we should look at. Do you have a way to reproduce it?

Stuart

James Olsen

Apr 1, 2021, 4:50:05 PM
to Stuart Douglas, Quarkus Development mailing list
I don't have a reproducer yet.  I see that 1.13.0 has the new quarkus-websocket so I guess it makes sense to progress that version instead.  Not sure when I'll get time, but I'll check it out when I can.


James Olsen

Apr 28, 2022, 7:02:59 PM
to James Olsen, Stuart Douglas, Quarkus Development mailing list
Stuart,

I recently upgraded to 2.8.2 which forced the switch from quarkus-undertow-websockets to quarkus-websockets so I took some time to look at this @OnClose issue again.  The issue does still exist but the underlying cause now seems apparent.

I did the upgrade and library switch, removed the OrderedExecutor workaround and set quarkus.websocket.dispatch-to-worker=true.

The @OnClose was then not called some of the time (I don't have a formal reproducer but our nightly test suite reliably triggers this).

The following appeared in our logs:

2022-04-28 02:10:19,368 ERROR [org.jboss.threads.errors] 'executor-thread-10' Thread Thread[executor-thread-10,5,main] threw an uncaught exception: java.lang.RuntimeException: java.lang.IllegalStateException: Instance already destroyed
	at io.undertow.websockets.ServerWebSocketContainer.invokeEndpointMethod(ServerWebSocketContainer.java:534)
	at io.undertow.websockets.ServerWebSocketContainer$6.run(ServerWebSocketContainer.java:514)
	at io.quarkus.vertx.core.runtime.VertxCoreRecorder$14.runWith(VertxCoreRecorder.java:553)
	at org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2449)
	at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1478)
	at org.jboss.threads.DelegatingRunnable.run(DelegatingRunnable.java:29)
	at org.jboss.threads.ThreadLocalResettingRunnable.run(ThreadLocalResettingRunnable.java:29)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.lang.IllegalStateException: Instance already destroyed
	at io.quarkus.arc.impl.AbstractInstanceHandle.get(AbstractInstanceHandle.java:41)
	at io.quarkus.arc.runtime.BeanContainerImpl$1$1.get(BeanContainerImpl.java:40)
	at io.quarkus.websockets.client.runtime.WebsocketCoreRecorder$1$1$1.getInstance(WebsocketCoreRecorder.java:136)
	at io.undertow.websockets.annotated.AnnotatedEndpoint$5.run(AnnotatedEndpoint.java:225)
	at io.undertow.websockets.ServerWebSocketContainer$1.call(ServerWebSocketContainer.java:143)
	at io.undertow.websockets.ServerWebSocketContainer$1.call(ServerWebSocketContainer.java:140)
	at io.quarkus.websockets.client.runtime.WebsocketCoreRecorder$4$1.call(WebsocketCoreRecorder.java:181)
	at io.undertow.websockets.ServerWebSocketContainer.invokeEndpointMethod(ServerWebSocketContainer.java:532)
	... 8 more

I think that was during a shutdown.  It doesn't appear for every missed @OnClose but executors often silently swallow these things, so it got me thinking.
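
For anyone unsure what "silently swallow" looks like in practice, here is a minimal sketch using only the plain JDK executor (not JBoss Threads or the Quarkus executor, whose behaviour may differ). The class and method names are made up for illustration. A task passed to submit() has its exception stored in the returned Future, so nothing is logged unless someone calls get().

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative only: shows how a JDK executor can hide a task failure.
class SwallowedExceptionDemo {

    // Runs a failing task via submit() and returns a description of the
    // failure, which is only visible because we inspect the Future.
    static String runFailingTask() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Runnable failing = () -> {
                throw new IllegalStateException("Instance already destroyed");
            };
            // submit() captures the exception in the Future; the thread's
            // uncaught-exception handler is never invoked for it.
            Future<?> future = executor.submit(failing);
            try {
                future.get();
                return "no failure observed";
            } catch (ExecutionException e) {
                Throwable cause = e.getCause();
                return cause.getClass().getSimpleName() + ": " + cause.getMessage();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return "interrupted";
            }
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runFailingTask());
    }
}
```

If the Future were simply discarded, the IllegalStateException would never appear anywhere, which would be consistent with seeing the log entry for only some of the missed callbacks.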

Our @ServerEndpoint had the default scope so I added a @PreDestroy that effectively did the same cleanup work as the @OnClose if it hadn't already been called - problem solved.  I was also able to convert it to @ApplicationScoped and remove the @PreDestroy - problem also solved.

So it looks like a race condition between the ARC destroy of the Endpoint and the @OnClose callback.

We are proceeding with the @ApplicationScoped approach.  Anyone using the default scope and requiring some cleanup will need the @PreDestroy workaround until/unless you can resolve the underlying race condition.
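
A rough sketch of the @PreDestroy workaround for a default-scoped endpoint follows, assuming the cleanup can be made idempotent. The class name, path and counter are hypothetical, and the annotations are written as comments so the snippet compiles without the WebSocket and CDI APIs on the classpath; in a real endpoint they would be the actual annotations.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical endpoint, one instance per connection (default scope).
// @ServerEndpoint("/feed")   // hypothetical path
class FeedEndpoint {

    // Guards the cleanup so it runs once, whichever callback arrives first.
    private final AtomicBoolean cleanedUp = new AtomicBoolean(false);
    // Only here so the sketch can show how often cleanup actually ran.
    private final AtomicInteger cleanupRuns = new AtomicInteger();

    // @OnClose
    void onClose() {
        cleanUpOnce();
    }

    // @PreDestroy: fallback for when the instance is destroyed before
    // (or instead of) the @OnClose callback being delivered.
    void preDestroy() {
        cleanUpOnce();
    }

    private void cleanUpOnce() {
        if (cleanedUp.compareAndSet(false, true)) {
            cleanupRuns.incrementAndGet();
            // Release per-client resources here.
        }
    }

    int cleanupRuns() {
        return cleanupRuns.get();
    }

    public static void main(String[] args) {
        FeedEndpoint endpoint = new FeedEndpoint();
        endpoint.onClose();
        endpoint.preDestroy();
        System.out.println("cleanup runs: " + endpoint.cleanupRuns());
    }
}
```

Note that with @ApplicationScoped a single instance is shared by all sessions, so a per-instance flag like this would not fit; any per-client state would presumably need to be keyed by session instead.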

Regards, James

P.S. the memory leak referred to in the original subject line does not exist as the channel is closed even if the @OnClose isn't called.
