UNICAST3.down, docker


Miha Zoubek

Dec 22, 2024, 10:08:27 AM
to jgroups-dev
Hi

I am running our application in a Docker container using host network mode. The application utilizes JGroups for caching. It has been running without issues for several months, but now it requires a restart approximately every 14 days due to a thread exhaustion issue related to JGroups.


I need assistance with this issue as I'm unsure how to resolve it. We upgraded the library version, but that didn't help.


stack trace:


"jgroups-3659,plyp-be3-53137" #26962 prio=5 os_prio=0 cpu=927.60ms elapsed=32.13s tid=0x000074047c108800 nid=0x55acc runnable  [0x00007405a210c000]
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketOutputStream.socketWrite0(java.base@11.0.25/Native Method)
        at java.net.SocketOutputStream.socketWrite(java.base@11.0.25/Unknown Source)
        at java.net.SocketOutputStream.write(java.base@11.0.25/Unknown Source)
        at java.io.BufferedOutputStream.flushBuffer(java.base@11.0.25/Unknown Source)
        at java.io.BufferedOutputStream.write(java.base@11.0.25/Unknown Source)
        - locked <0x00000004e38f3bd0> (a java.io.BufferedOutputStream)
        at java.io.DataOutputStream.write(java.base@11.0.25/Unknown Source)
        - locked <0x00000004e38f3ba8> (a java.io.DataOutputStream)
        at org.jgroups.blocks.cs.TcpConnection.doSend(TcpConnection.java:166)
        at org.jgroups.blocks.cs.TcpConnection.send(TcpConnection.java:136)
        at org.jgroups.blocks.cs.BaseServer.send(BaseServer.java:209)
        at org.jgroups.protocols.TCP.send(TCP.java:91)
        at org.jgroups.protocols.BasicTCP.sendUnicast(BasicTCP.java:146)
        at org.jgroups.protocols.TP.sendToSingleMember(TP.java:1650)
        at org.jgroups.protocols.TP.doSend(TP.java:1638)
        at org.jgroups.protocols.NoBundler.sendSingleMessage(NoBundler.java:38)
        at org.jgroups.protocols.NoBundler.send(NoBundler.java:30)
        at org.jgroups.protocols.TP.send(TP.java:1626)
        at org.jgroups.protocols.TP._send(TP.java:1359)
        at org.jgroups.protocols.TP.down(TP.java:1268)
        at org.jgroups.stack.Protocol.down(Protocol.java:287)
        at org.jgroups.stack.Protocol.down(Protocol.java:287)
        at org.jgroups.stack.Protocol.down(Protocol.java:287)
        at org.jgroups.protocols.FailureDetection.down(FailureDetection.java:171)
        at org.jgroups.stack.Protocol.down(Protocol.java:287)
        at org.jgroups.protocols.pbcast.NAKACK2.down(NAKACK2.java:567)
        at org.jgroups.protocols.UNICAST3.down(UNICAST3.java:656)
        at org.jgroups.protocols.pbcast.STABLE.down(STABLE.java:298)
        at org.jgroups.stack.Protocol.down(Protocol.java:287)
        at org.jgroups.protocols.UFC_NB.lambda$new$0(UFC_NB.java:28)
        at org.jgroups.protocols.UFC_NB$$Lambda$744/0x0000000840c51c40.accept(Unknown Source)
        at java.util.ArrayList.forEach(java.base@11.0.25/Unknown Source)
        at org.jgroups.util.NonBlockingCredit.increment(NonBlockingCredit.java:90)
        at org.jgroups.protocols.UFC.handleCredit(UFC.java:163)
        at org.jgroups.protocols.FlowControl.handleUpEvent(FlowControl.java:380)
        at org.jgroups.protocols.FlowControl.up(FlowControl.java:358)
        at org.jgroups.protocols.pbcast.GMS.up(GMS.java:876)
        at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:254)
        at org.jgroups.protocols.UNICAST3.deliverMessage(UNICAST3.java:1055)
        at org.jgroups.protocols.UNICAST3.addMessage(UNICAST3.java:778)
        at org.jgroups.protocols.UNICAST3.handleDataReceived(UNICAST3.java:759)
        at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:412)
        at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:598)
        at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:132)
        at org.jgroups.protocols.FailureDetection.up(FailureDetection.java:186)
        at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:254)
        at org.jgroups.protocols.MERGE3.up(MERGE3.java:281)
        at org.jgroups.protocols.Discovery.up(Discovery.java:300)
        at org.jgroups.protocols.TP.passMessageUp(TP.java:1410)
        at org.jgroups.util.SubmitToThreadPool$SingleMessageHandler.run(SubmitToThreadPool.java:98)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.25/Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.25/Unknown Source)
        at java.lang.Thread.run(java.base@11.0.25/Unknown Source)

Bela Ban

Dec 22, 2024, 10:10:45 AM
to jgrou...@googlegroups.com
The info you posted is meager: no config, no JGroups version, no reproducer, etc...
Note that no-bundler is not recommended for handling a lot of traffic; use transfer-queue-bundler instead.
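A minimal sketch of that switch, assuming a typical TCP transport element (bind_addr/bind_port here are placeholders; all other TCP attributes stay as they are):

    <!-- sketch: only the bundler changes; "transfer-queue" selects the
         TransferQueueBundler, which sends message batches instead of
         one message per send -->
    <TCP bind_addr="SITE_LOCAL"
         bind_port="7800"
         bundler_type="transfer-queue"
    />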
-- 
Bela Ban | http://www.jgroups.org

Miha Zoubek

Jan 5, 2025, 11:37:23 AM
to Bela Ban, jgrou...@googlegroups.com
Hi Bela

Apologies for the late reply.

Version: 4.2.30.Final
Reproducer: none; it happens randomly and I don't know how to trigger it.

<config xmlns="urn:org:jgroups"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups-4.1.xsd">
    <TCP bind_addr="${jgroups.bind_addr,jgroups.tcp.address:SITE_LOCAL}"
         bind_port="${jgroups.bind_port,jgroups.tcp.port:7800}"
         enable_diagnostics="false"
         thread_naming_pattern="pl"
         send_buf_size="640k"
         sock_conn_timeout="300"
         bundler_type="no-bundler"
         logical_addr_cache_expiration="360000"

         thread_pool.min_threads="${jgroups.thread_pool.min_threads:0}"
         thread_pool.max_threads="${jgroups.thread_pool.max_threads:200}"
         thread_pool.keep_alive_time="60000"
    />

    <TCPPING initial_hosts="${jgroups.tcpping.initial_hosts:localhost[7800],localhost[7801]}"
             port_range="3"
             ergonomics="false"
    />

    <MERGE3 min_interval="10000"
            max_interval="30000"
    />
    <FD_SOCK/>
    <FD_ALL timeout="${jgroups.timeoutMs:60000}"
            interval="15000"
            timeout_check_interval="5000"
    />
    <VERIFY_SUSPECT timeout="5000"/>
    <pbcast.NAKACK2 use_mcast_xmit="false"
                    xmit_interval="100"
                    xmit_table_num_rows="50"
                    xmit_table_msgs_per_row="1024"
                    xmit_table_max_compaction_time="30000"
                    resend_last_seqno="true"
    />
    <UNICAST3 xmit_interval="100"
              xmit_table_num_rows="50"
              xmit_table_msgs_per_row="1024"
              xmit_table_max_compaction_time="30000"
              conn_expiry_timeout="0"
    />
    <pbcast.STABLE stability_delay="500"
                   desired_avg_gossip="5000"
                   max_bytes="1M"
    />
    <pbcast.GMS print_local_addr="false"
                join_timeout="${jgroups.join_timeout:15000}"
    />
    <UFC_NB max_credits="3m"
            min_threshold="0.40"
    />
    <MFC_NB max_credits="3m"
            min_threshold="0.40"
    />
    <FRAG3/>
</config>






--
LP, Miha

Bela Ban

Jan 6, 2025, 12:30:22 AM
to jgrou...@googlegroups.com
The stack trace looks normal; the sender thread is in the RUNNABLE state. Have you tried transfer-queue-bundler? Also try replacing UFC_NB/MFC_NB with their blocking counterparts (UFC, MFC).
When the thread exhaustion occurs, JGroups should log it: this dump would be interesting to have...
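The swap is mechanical; a sketch keeping the credit values from the posted config:

    <!-- blocking flow control in place of UFC_NB/MFC_NB; a sender now blocks
         when it runs out of credits instead of queueing the message -->
    <UFC max_credits="3m"
         min_threshold="0.40"
    />
    <MFC max_credits="3m"
         min_threshold="0.40"
    />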

Miha Zoubek

Jan 7, 2025, 3:01:54 PM
to Bela Ban, jgrou...@googlegroups.com
Thank you!

The issue with this RUNNABLE thread is that the thread dumps were taken at 20-second intervals. If you compare the first thread dump with the last one, taken about 5 minutes later, you'll notice that this thread (same thread ID) is still active, which I would say is not normal. And there are multiple threads like this when the problem occurs.
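For capturing such a series of dumps programmatically, a small standalone sketch using the standard ThreadMXBean API (the interval and count are illustrative, matching the 20-second / 5-minute window described above):

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;

    public class PeriodicDumps {
        public static void main(String[] args) throws InterruptedException {
            ThreadMXBean mx = ManagementFactory.getThreadMXBean();
            for (int i = 0; i < 15; i++) {            // ~5 minutes, 20s apart
                System.out.println("=== dump " + i + " at " + System.currentTimeMillis() + " ===");
                // true/true also reports locked monitors and ownable synchronizers
                for (ThreadInfo ti : mx.dumpAllThreads(true, true))
                    System.out.print(ti);             // thread ID, state, stack
                Thread.sleep(20_000);
            }
        }
    }

Note that ThreadInfo.toString() caps the printed stack at a handful of frames; jstack gives full traces like the one posted above.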




--
LP, Miha

Bela Ban

Jan 8, 2025, 4:04:31 AM
to Miha Zoubek, jgrou...@googlegroups.com
I only see a *single* thread dump.

What you see might be normal: the worker thread grabs message(s) and passes them up continually, so the thread ID stays the same. Only if the stack trace is identical (in UFC.handleCredit()) might we have a problem.

OTOH, the thread might be stuck in the TCP write: if the send-window is 0, then the writer is blocked on the write (even though the state is RUNNABLE). This would mean that a receiver thread is stuck somewhere delivering messages and cannot read messages from TCP.
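This RUNNABLE-but-blocked effect is easy to see in isolation; a self-contained sketch (illustrative only, not JGroups code) in which a writer blocks in the native socket write against a peer that never reads:

    import java.io.OutputStream;
    import java.net.ServerSocket;
    import java.net.Socket;

    public class StuckWriterDemo {
        public static void main(String[] args) throws Exception {
            ServerSocket server = new ServerSocket(0);       // accepts, but never reads
            Socket client = new Socket("127.0.0.1", server.getLocalPort());
            Socket accepted = server.accept();               // keep the peer open, unread

            Thread writer = new Thread(() -> {
                try {
                    OutputStream out = client.getOutputStream();
                    byte[] chunk = new byte[64 * 1024];
                    while (true)
                        out.write(chunk);  // blocks in socketWrite0 once both TCP buffers fill
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }, "stuck-writer");
            writer.setDaemon(true);        // let the JVM exit after main() prints
            writer.start();

            Thread.sleep(3_000);           // give the writer time to fill the buffers
            // Blocking native I/O is not a JVM-level wait, so the state is RUNNABLE:
            System.out.println(writer.getName() + " state = " + writer.getState());
        }
    }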

Have you changed to the transfer-queue bundler? This will reduce the load on the thread pool, as message batches rather than single messages are sent.