UNICAST3.down, docker


Miha Zoubek

Dec 22, 2024, 10:08:27 AM
to jgroups-dev
Hi

I am running our application in a Docker container using host network mode. The application utilizes JGroups for caching. It has been running without issues for several months, but now it requires a restart approximately every 14 days due to a thread exhaustion issue related to JGroups.


I need assistance with this issue as I'm unsure how to resolve it. We upgraded the library version, but that didn't help.


stack trace:


"jgroups-3659,plyp-be3-53137" #26962 prio=5 os_prio=0 cpu=927.60ms elapsed=32.13s tid=0x000074047c108800 nid=0x55acc runnable  [0x00007405a210c000]
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketOutputStream.socketWrite0(java.base@11.0.25/Native Method)
        at java.net.SocketOutputStream.socketWrite(java.base@11.0.25/Unknown Source)
        at java.net.SocketOutputStream.write(java.base@11.0.25/Unknown Source)
        at java.io.BufferedOutputStream.flushBuffer(java.base@11.0.25/Unknown Source)
        at java.io.BufferedOutputStream.write(java.base@11.0.25/Unknown Source)
        - locked <0x00000004e38f3bd0> (a java.io.BufferedOutputStream)
        at java.io.DataOutputStream.write(java.base@11.0.25/Unknown Source)
        - locked <0x00000004e38f3ba8> (a java.io.DataOutputStream)
        at org.jgroups.blocks.cs.TcpConnection.doSend(TcpConnection.java:166)
        at org.jgroups.blocks.cs.TcpConnection.send(TcpConnection.java:136)
        at org.jgroups.blocks.cs.BaseServer.send(BaseServer.java:209)
        at org.jgroups.protocols.TCP.send(TCP.java:91)
        at org.jgroups.protocols.BasicTCP.sendUnicast(BasicTCP.java:146)
        at org.jgroups.protocols.TP.sendToSingleMember(TP.java:1650)
        at org.jgroups.protocols.TP.doSend(TP.java:1638)
        at org.jgroups.protocols.NoBundler.sendSingleMessage(NoBundler.java:38)
        at org.jgroups.protocols.NoBundler.send(NoBundler.java:30)
        at org.jgroups.protocols.TP.send(TP.java:1626)
        at org.jgroups.protocols.TP._send(TP.java:1359)
        at org.jgroups.protocols.TP.down(TP.java:1268)
        at org.jgroups.stack.Protocol.down(Protocol.java:287)
        at org.jgroups.stack.Protocol.down(Protocol.java:287)
        at org.jgroups.stack.Protocol.down(Protocol.java:287)
        at org.jgroups.protocols.FailureDetection.down(FailureDetection.java:171)
        at org.jgroups.stack.Protocol.down(Protocol.java:287)
        at org.jgroups.protocols.pbcast.NAKACK2.down(NAKACK2.java:567)
        at org.jgroups.protocols.UNICAST3.down(UNICAST3.java:656)
        at org.jgroups.protocols.pbcast.STABLE.down(STABLE.java:298)
        at org.jgroups.stack.Protocol.down(Protocol.java:287)
        at org.jgroups.protocols.UFC_NB.lambda$new$0(UFC_NB.java:28)
        at org.jgroups.protocols.UFC_NB$$Lambda$744/0x0000000840c51c40.accept(Unknown Source)
        at java.util.ArrayList.forEach(java.base@11.0.25/Unknown Source)
        at org.jgroups.util.NonBlockingCredit.increment(NonBlockingCredit.java:90)
        at org.jgroups.protocols.UFC.handleCredit(UFC.java:163)
        at org.jgroups.protocols.FlowControl.handleUpEvent(FlowControl.java:380)
        at org.jgroups.protocols.FlowControl.up(FlowControl.java:358)
        at org.jgroups.protocols.pbcast.GMS.up(GMS.java:876)
        at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:254)
        at org.jgroups.protocols.UNICAST3.deliverMessage(UNICAST3.java:1055)
        at org.jgroups.protocols.UNICAST3.addMessage(UNICAST3.java:778)
        at org.jgroups.protocols.UNICAST3.handleDataReceived(UNICAST3.java:759)
        at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:412)
        at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:598)
        at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:132)
        at org.jgroups.protocols.FailureDetection.up(FailureDetection.java:186)
        at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:254)
        at org.jgroups.protocols.MERGE3.up(MERGE3.java:281)
        at org.jgroups.protocols.Discovery.up(Discovery.java:300)
        at org.jgroups.protocols.TP.passMessageUp(TP.java:1410)
        at org.jgroups.util.SubmitToThreadPool$SingleMessageHandler.run(SubmitToThreadPool.java:98)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.25/Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.25/Unknown Source)
        at java.lang.Thread.run(java.base@11.0.25/Unknown Source)

Bela Ban

Dec 22, 2024, 10:10:45 AM
to jgrou...@googlegroups.com
The info you posted is meager: no config, no JGroups version, no reproducer, etc...
Note that no-bundler is not recommended for handling a lot of traffic; use transfer-queue-bundler instead.
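A minimal sketch of that switch, assuming a typical TCP transport element (bind_addr/bind_port here are placeholders; all other TCP attributes stay as they are):

    <!-- sketch: only the bundler changes; "transfer-queue" selects the
         TransferQueueBundler, which sends message batches instead of
         one message per send -->
    <TCP bind_addr="SITE_LOCAL"
         bind_port="7800"
         bundler_type="transfer-queue"
    />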
-- 
Bela Ban | http://www.jgroups.org

Miha Zoubek

Jan 5, 2025, 11:37:23 AM
to Bela Ban, jgrou...@googlegroups.com
Hi Bela

Apologies for the late reply.

Version: 4.2.30.Final
Reproducer: none; it happens randomly and I don't know how to trigger it.

<config xmlns="urn:org:jgroups"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups-4.1.xsd">
    <TCP bind_addr="${jgroups.bind_addr,jgroups.tcp.address:SITE_LOCAL}"
         bind_port="${jgroups.bind_port,jgroups.tcp.port:7800}"
         enable_diagnostics="false"
         thread_naming_pattern="pl"
         send_buf_size="640k"
         sock_conn_timeout="300"
         bundler_type="no-bundler"
         logical_addr_cache_expiration="360000"

         thread_pool.min_threads="${jgroups.thread_pool.min_threads:0}"
         thread_pool.max_threads="${jgroups.thread_pool.max_threads:200}"
         thread_pool.keep_alive_time="60000"
    />

    <TCPPING initial_hosts="${jgroups.tcpping.initial_hosts:localhost[7800],localhost[7801]}"
             port_range="3"
             ergonomics="false"
    />

    <MERGE3 min_interval="10000"
            max_interval="30000"
    />
    <FD_SOCK/>
    <FD_ALL timeout="${jgroups.timeoutMs:60000}"
            interval="15000"
            timeout_check_interval="5000"
    />
    <VERIFY_SUSPECT timeout="5000"/>
    <pbcast.NAKACK2 use_mcast_xmit="false"
                    xmit_interval="100"
                    xmit_table_num_rows="50"
                    xmit_table_msgs_per_row="1024"
                    xmit_table_max_compaction_time="30000"
                    resend_last_seqno="true"
    />
    <UNICAST3 xmit_interval="100"
              xmit_table_num_rows="50"
              xmit_table_msgs_per_row="1024"
              xmit_table_max_compaction_time="30000"
              conn_expiry_timeout="0"
    />
    <pbcast.STABLE stability_delay="500"
                   desired_avg_gossip="5000"
                   max_bytes="1M"
    />
    <pbcast.GMS print_local_addr="false"
                join_timeout="${jgroups.join_timeout:15000}"
    />
    <UFC_NB max_credits="3m"
            min_threshold="0.40"
    />
    <MFC_NB max_credits="3m"
            min_threshold="0.40"
    />
    <FRAG3/>
</config>






--
LP, Miha

Bela Ban

Jan 6, 2025, 12:30:22 AM
to jgrou...@googlegroups.com
The stack trace looks normal; the sender thread is in the RUNNABLE state. Have you tried transfer-queue-bundler? Also try replacing UFC_NB/MFC_NB with their blocking counterparts (UFC, MFC).
When the thread exhaustion occurs, JGroups should log it: this dump would be interesting to have...
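The swap is mechanical; a sketch keeping the credit values from the posted config:

    <!-- blocking flow control in place of UFC_NB/MFC_NB; a sender now blocks
         when it runs out of credits instead of queueing the message -->
    <UFC max_credits="3m"
         min_threshold="0.40"
    />
    <MFC max_credits="3m"
         min_threshold="0.40"
    />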

Miha Zoubek

Jan 7, 2025, 3:01:54 PM
to Bela Ban, jgrou...@googlegroups.com
Thank you!

The issue with this RUNNABLE thread is that the thread dumps were taken at 20-second intervals. If you compare the first thread dump with the last one, taken about 5 minutes later, you'll notice that this thread (same thread ID) is still active, which I would say is not normal. And there are multiple threads like this when the problem occurs.
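For capturing such a series of dumps programmatically, a small standalone sketch using the standard ThreadMXBean API (the interval and count are illustrative, matching the 20-second / 5-minute window described above):

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;

    public class PeriodicDumps {
        public static void main(String[] args) throws InterruptedException {
            ThreadMXBean mx = ManagementFactory.getThreadMXBean();
            for (int i = 0; i < 15; i++) {            // ~5 minutes, 20s apart
                System.out.println("=== dump " + i + " at " + System.currentTimeMillis() + " ===");
                // true/true also reports locked monitors and ownable synchronizers
                for (ThreadInfo ti : mx.dumpAllThreads(true, true))
                    System.out.print(ti);             // thread ID, state, stack
                Thread.sleep(20_000);
            }
        }
    }

Note that ThreadInfo.toString() caps the printed stack at a handful of frames; jstack gives full traces like the one posted above.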




--
LP, Miha

Bela Ban

Jan 8, 2025, 4:04:31 AM
to Miha Zoubek, jgrou...@googlegroups.com
I only see a *single* thread dump.

What you see might be normal: the worker thread grabs message(s) and passes them up continually, so the thread ID stays the same. Only if the stack trace is identical (in UFC.handleCredit()) might we have a problem.

OTOH, the thread might be stuck in the TCP write: if the send-window is 0, then the writer is blocked on the write (even though the state is RUNNABLE). This would mean that a receiver thread is stuck somewhere delivering messages and cannot read messages from TCP.
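This RUNNABLE-but-blocked effect is easy to see in isolation; a self-contained sketch (illustrative only, not JGroups code) in which a writer blocks in the native socket write against a peer that never reads:

    import java.io.OutputStream;
    import java.net.ServerSocket;
    import java.net.Socket;

    public class StuckWriterDemo {
        public static void main(String[] args) throws Exception {
            ServerSocket server = new ServerSocket(0);       // accepts, but never reads
            Socket client = new Socket("127.0.0.1", server.getLocalPort());
            Socket accepted = server.accept();               // keep the peer open, unread

            Thread writer = new Thread(() -> {
                try {
                    OutputStream out = client.getOutputStream();
                    byte[] chunk = new byte[64 * 1024];
                    while (true)
                        out.write(chunk);  // blocks in socketWrite0 once both TCP buffers fill
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }, "stuck-writer");
            writer.setDaemon(true);        // let the JVM exit after main() prints
            writer.start();

            Thread.sleep(3_000);           // give the writer time to fill the buffers
            // Blocking native I/O is not a JVM-level wait, so the state is RUNNABLE:
            System.out.println(writer.getName() + " state = " + writer.getState());
        }
    }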

Have you changed to the transfer-queue bundler? This will reduce the load on the thread pool, as message batches rather than single messages are sent.