All threads are stuck when calling JChannel.send


Raju Panwar, Feb 8, 2024, 5:42:03 AM, to jgroups-dev
Hi there,

We are running JGroups 4.2.22.

The following are the two protocol stacks we have tried:

<config
                xmlns="urn:org:jgroups"
                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups.xsd">
                <TCP
                                bind_addr="ip"
                                bind_port="7901"
                                max_bundle_size="264k"/>
                <TCPPING
                                initial_hosts="ip1[7901],ip2[7901]"
                                port_range="20"/>
                <FD_SOCK/>
                <pbcast.NAKACK2
                                use_mcast_xmit="false"
                                discard_delivered_msgs="true"/>
                <pbcast.STABLE
                                desired_avg_gossip="20000"
                                stability_delay="2000"/>
                <pbcast.GMS
                                join_timeout="5000"
                                print_local_addr="true"/>
                <FRAG2
                                frag_size="200000"/>
</config>


OR


<config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns="urn:org:jgroups"
        xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups-4.2.xsd">
    <TCP bind_port="7901"
         bind_addr="ip"
         max_bundle_size="256K"
         sock_conn_timeout="300"
         thread_pool.min_threads="0"
         thread_pool.max_threads="20"
         thread_pool.keep_alive_time="30000"/>
    <RED/>

    <TCPPING async_discovery="true"
             initial_hosts="ip1[7901],ip2[7901]"
             port_range="20"/>
    <MERGE3  min_interval="10000"
             max_interval="30000"/>
    <FD_SOCK/>
    <FD_ALL timeout="9000" interval="3000" />
    <VERIFY_SUSPECT timeout="1500"  />
    <BARRIER />
    <pbcast.NAKACK2 use_mcast_xmit="false"
                   discard_delivered_msgs="true"/>
    <UNICAST3 />
    <pbcast.STABLE desired_avg_gossip="50000"
                   max_bytes="4M"/>
    <pbcast.GMS print_local_addr="true" join_timeout="5000"/>
    <UFC max_credits="2M"
         min_threshold="0.4"/>
    <MFC max_credits="2M"
         min_threshold="0.4"/>
    <FRAG2 frag_size="200K"  />
    <!--RSVP resend_interval="2000" timeout="10000"/-->
    <pbcast.STATE_TRANSFER/>
</config>
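Both stacks leave the transport's bundler at its defaults. If the bundler queue itself turns out to be the bottleneck, TP exposes attributes to resize or replace it; the fragment below is a sketch based on our reading of the 4.x TP attribute list, so the names should be double-checked against the manual for the exact release:

```xml
    <!-- Sketch only: bundler_type and bundler_capacity are TP attributes
         per the 4.x docs; verify names/defaults for your exact version. -->
    <TCP bind_port="7901"
         bind_addr="ip"
         bundler_type="transfer-queue"
         bundler_capacity="16384"/>
```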

We tried with 1 channel and with 10 channels, and with both protocol stacks.


The packet details are: packet length = 69457 bytes, header length = 78 bytes, body length = 69367 bytes, type = 100. The packet length varies from send to send.


We are running with 400 worker threads, and each thread is stuck with the following dump:


[1] sun.misc.Unsafe.park (native method)
  [2] java.util.concurrent.locks.LockSupport.park (LockSupport.java:175)
  [3] java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await (AbstractQueuedSynchronizer.java:2,039)
  [4] java.util.concurrent.ArrayBlockingQueue.put (ArrayBlockingQueue.java:353)
  [5] org.jgroups.protocols.TransferQueueBundler.send (TransferQueueBundler.java:101)
  [6] org.jgroups.protocols.TP.send (TP.java:1,620)
  [7] org.jgroups.protocols.TP._send (TP.java:1,353)
  [8] org.jgroups.protocols.TP.down (TP.java:1,262)
  [9] org.jgroups.stack.Protocol.down (Protocol.java:287)
  [10] org.jgroups.stack.Protocol.down (Protocol.java:287)
  [11] org.jgroups.protocols.FRAG2.down (FRAG2.java:148)
  [12] org.jgroups.protocols.pbcast.NAKACK2.send (NAKACK2.java:787)
  [13] org.jgroups.protocols.pbcast.NAKACK2.down (NAKACK2.java:568)
  [14] org.jgroups.protocols.pbcast.STABLE.down (STABLE.java:298)
  [15] org.jgroups.stack.Protocol.down (Protocol.java:287)
  [16] org.jgroups.stack.ProtocolStack.down (ProtocolStack.java:927)
  [17] org.jgroups.JChannel.down (JChannel.java:645)
  [18] org.jgroups.JChannel.send (JChannel.java:484)

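As a side note, the top frames of that trace are generic producer backpressure: every sender is parked in ArrayBlockingQueue.put because the bundler's queue is full and nothing is draining it. A stdlib-only sketch (no JGroups involved; class and thread names are ours) reproduces the same parked state:

```java
import java.util.concurrent.ArrayBlockingQueue;

// Stdlib-only illustration: once the single consumer of a bounded queue
// (in JGroups, the TransferQueueBundler thread) stops draining it, every
// producer parks inside put() exactly like the senders in the dump above.
public class BundlerQueueStall {
    public static void main(String[] args) throws InterruptedException {
        ArrayBlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(2);
        queue.put(new byte[64]); // fill to capacity...
        queue.put(new byte[64]); // ...and no consumer is draining

        Thread producer = new Thread(() -> {
            try {
                queue.put(new byte[64]); // blocks: queue is full
            } catch (InterruptedException expected) {
                // unblocked by the interrupt below
            }
        }, "stuck-producer");
        producer.start();

        // Wait until the producer has actually parked inside put().
        while (producer.getState() != Thread.State.WAITING) {
            Thread.sleep(10);
        }
        System.out.println("producer state: " + producer.getState());
        producer.interrupt();
        producer.join();
    }
}
```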

We never faced this issue with the older version (3.x).

Has anyone else faced this type of issue, or can anyone suggest what we could do better?


Regards
Raju Panwar

Bela Ban, Feb 8, 2024, 10:08:41 AM, to jgrou...@googlegroups.com
Can you post the entire stack trace? I need to see what the sender
thread which successfully acquires the send lock is stuck on. If it is
stuck on a TCP send, then I'd need the full stack dumps of all members.
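For completeness: full dumps can be taken with jstack &lt;pid&gt; or kill -3 &lt;pid&gt;. If neither is convenient on the members, the same information is available in-process via ThreadMXBean (stdlib sketch; the class name is illustrative):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Builds a jstack-like dump of all live threads from inside the JVM.
// (ThreadInfo.toString() truncates stacks to a few frames, so the frames
// are formatted manually here to keep the full depth.)
public class FullThreadDump {
    public static String dumpAllThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        for (ThreadInfo ti : mx.dumpAllThreads(false, false)) {
            sb.append('"').append(ti.getThreadName()).append("\" ")
              .append(ti.getThreadState()).append('\n');
            for (StackTraceElement frame : ti.getStackTrace()) {
                sb.append("\tat ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The dumping thread itself ("main") must appear in the dump.
        System.out.println(dumpAllThreads().contains("\"main\""));
    }
}
```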

--
Bela Ban | http://www.jgroups.org

Raju Panwar, Feb 8, 2024, 11:32:52 AM, to jgroups-dev
Hi Bela Ban,

Thanks for replying. At present I have the stack dump below, in which the worker threads appear to be stuck. Let me know if this is enough to reach a conclusion.

All threads suspended.
process reaper:
  [1] java.lang.UNIXProcess.waitForProcessExit (native method)
  [2] java.lang.UNIXProcess.lambda$initStreams$3 (UNIXProcess.java:289)
  [3] java.lang.UNIXProcess$$Lambda$114.939391749.run (null)
  [4] java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1,149)
  [5] java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:624)
  [6] java.lang.Thread.run (Thread.java:748)
Worker[0]:
  [1] com.xyz.abc.slee.threadmonitor.ThreadMonitorThread.registerThread (ThreadMonitorThread.java:275)
  [2] com.xyz.abc.slee.threadmonitor.ThreadMonitor.registerThread (ThreadMonitor.java:59)
  [3] com.xyz.ase.util.threadpool.WorkerThread.run (WorkerThread.java:59)
jgroups-54,dataChannel_5SAS_CLUSTER,tb6cas3-datachannel_5:
  [1] sun.misc.Unsafe.park (native method)
  [2] java.util.concurrent.locks.LockSupport.parkNanos (LockSupport.java:215)
  [3] java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill (SynchronousQueue.java:460)
  [4] java.util.concurrent.SynchronousQueue$TransferStack.transfer (SynchronousQueue.java:362)
  [5] java.util.concurrent.SynchronousQueue.poll (SynchronousQueue.java:941)
  [6] java.util.concurrent.ThreadPoolExecutor.getTask (ThreadPoolExecutor.java:1,073)
  [7] java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1,134)
  [8] java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:624)
  [9] java.lang.Thread.run (Thread.java:748)
jgroups-54,dataChannel_2SAS_CLUSTER,tb6cas3-datachannel_2:
  [1] sun.misc.Unsafe.park (native method)
  [2] java.util.concurrent.locks.LockSupport.parkNanos (LockSupport.java:215)
  [3] java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill (SynchronousQueue.java:460)
  [4] java.util.concurrent.SynchronousQueue$TransferStack.transfer (SynchronousQueue.java:362)
  [5] java.util.concurrent.SynchronousQueue.poll (SynchronousQueue.java:941)
  [6] java.util.concurrent.ThreadPoolExecutor.getTask (ThreadPoolExecutor.java:1,073)
  [7] java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1,134)
  [8] java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:624)
  [9] java.lang.Thread.run (Thread.java:748)
jgroups-58,dataChannel_6SAS_CLUSTER,tb6cas3-datachannel_6:
  [1] sun.misc.Unsafe.park (native method)
  [2] java.util.concurrent.locks.LockSupport.parkNanos (LockSupport.java:215)
  [3] java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill (SynchronousQueue.java:460)
  [4] java.util.concurrent.SynchronousQueue$TransferStack.transfer (SynchronousQueue.java:362)
  [5] java.util.concurrent.SynchronousQueue.poll (SynchronousQueue.java:941)
  [6] java.util.concurrent.ThreadPoolExecutor.getTask (ThreadPoolExecutor.java:1,073)
  [7] java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1,134)
  [8] java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:624)
  [9] java.lang.Thread.run (Thread.java:748)
jgroups-58,dataChannel_8SAS_CLUSTER,tb6cas3-datachannel_8:
  [1] sun.misc.Unsafe.park (native method)
  [2] java.util.concurrent.locks.LockSupport.parkNanos (LockSupport.java:215)
  [3] java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill (SynchronousQueue.java:460)
  [4] java.util.concurrent.SynchronousQueue$TransferStack.transfer (SynchronousQueue.java:362)
  [5] java.util.concurrent.SynchronousQueue.poll (SynchronousQueue.java:941)
  [6] java.util.concurrent.ThreadPoolExecutor.getTask (ThreadPoolExecutor.java:1,073)
  [7] java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1,134)
  [8] java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:624)
  [9] java.lang.Thread.run (Thread.java:748)
jgroups-57,dataChannel_6SAS_CLUSTER,tb6cas3-datachannel_6:

  [1] sun.misc.Unsafe.park (native method)
  [2] java.util.concurrent.locks.LockSupport.park (LockSupport.java:175)
  [3] java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await (AbstractQueuedSynchronizer.java:2,039)
  [4] java.util.concurrent.ArrayBlockingQueue.put (ArrayBlockingQueue.java:353)
  [5] org.jgroups.protocols.TransferQueueBundler.send (TransferQueueBundler.java:101)
  [6] org.jgroups.protocols.TP.send (TP.java:1,620)
  [7] org.jgroups.protocols.TP._send (TP.java:1,353)
  [8] org.jgroups.protocols.TP.down (TP.java:1,262)
  [9] org.jgroups.stack.Protocol.down (Protocol.java:287)
  [10] org.jgroups.stack.Protocol.down (Protocol.java:287)
  [11] org.jgroups.protocols.FRAG2.down (FRAG2.java:148)
  [12] org.jgroups.protocols.pbcast.NAKACK2.down (NAKACK2.java:567)
  [13] org.jgroups.protocols.pbcast.STABLE.sendStabilityMessage (STABLE.java:698)
  [14] org.jgroups.protocols.pbcast.STABLE.handleStableMessage (STABLE.java:561)
  [15] org.jgroups.protocols.pbcast.STABLE.sendStableMessage (STABLE.java:631)
  [16] org.jgroups.protocols.pbcast.STABLE$StableTask.run (STABLE.java:791)
  [17] org.jgroups.util.TimeScheduler3$Task.run (TimeScheduler3.java:324)
  [18] org.jgroups.util.TimeScheduler3$RecurringTask.run (TimeScheduler3.java:358)
  [19] java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1,149)
  [20] java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:624)
  [21] java.lang.Thread.run (Thread.java:748)
jgroups-56,dataChannel_6SAS_CLUSTER,tb6cas3-datachannel_6:

  [1] sun.misc.Unsafe.park (native method)
  [2] java.util.concurrent.locks.LockSupport.park (LockSupport.java:175)
  [3] java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await (AbstractQueuedSynchronizer.java:2,039)
  [4] java.util.concurrent.ArrayBlockingQueue.put (ArrayBlockingQueue.java:353)
  [5] org.jgroups.protocols.TransferQueueBundler.send (TransferQueueBundler.java:101)
  [6] org.jgroups.protocols.TP.send (TP.java:1,620)
  [7] org.jgroups.protocols.TP._send (TP.java:1,353)
  [8] org.jgroups.protocols.TP.down (TP.java:1,262)
  [9] org.jgroups.stack.Protocol.down (Protocol.java:287)
  [10] org.jgroups.stack.Protocol.down (Protocol.java:287)
  [11] org.jgroups.protocols.FRAG2.down (FRAG2.java:148)
  [12] org.jgroups.protocols.pbcast.NAKACK2.down (NAKACK2.java:567)
  [13] org.jgroups.protocols.pbcast.STABLE.sendStabilityMessage (STABLE.java:698)
  [14] org.jgroups.protocols.pbcast.STABLE.handleStableMessage (STABLE.java:561)
  [15] org.jgroups.protocols.pbcast.STABLE.sendStableMessage (STABLE.java:631)
  [16] org.jgroups.protocols.pbcast.STABLE.up (STABLE.java:288)
  [17] org.jgroups.protocols.pbcast.NAKACK2.deliverBatch (NAKACK2.java:953)
  [18] org.jgroups.protocols.pbcast.NAKACK2.removeAndDeliver (NAKACK2.java:887)
  [19] org.jgroups.protocols.pbcast.NAKACK2.handleMessages (NAKACK2.java:861)
  [20] org.jgroups.protocols.pbcast.NAKACK2.up (NAKACK2.java:688)
  [21] org.jgroups.protocols.FRAG2.up (FRAG2.java:196)
  [22] org.jgroups.stack.Protocol.up (Protocol.java:341)
  [23] org.jgroups.stack.Protocol.up (Protocol.java:341)
  [24] org.jgroups.protocols.TP.passBatchUp (TP.java:1,430)
  [25] org.jgroups.util.MaxOneThreadPerSender$BatchHandlerLoop.passBatchUp (MaxOneThreadPerSender.java:284)
  [26] org.jgroups.util.SubmitToThreadPool$BatchHandler.run (SubmitToThreadPool.java:147)
  [27] org.jgroups.util.MaxOneThreadPerSender$BatchHandlerLoop.run (MaxOneThreadPerSender.java:273)
  [28] java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1,149)
  [29] java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:624)
  [30] java.lang.Thread.run (Thread.java:748)
jgroups-55,dataChannel_6SAS_CLUSTER,tb6cas3-datachannel_6:

  [1] sun.misc.Unsafe.park (native method)
  [2] java.util.concurrent.locks.LockSupport.park (LockSupport.java:175)
  [3] java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await (AbstractQueuedSynchronizer.java:2,039)
  [4] java.util.concurrent.ArrayBlockingQueue.put (ArrayBlockingQueue.java:353)
  [5] org.jgroups.protocols.TransferQueueBundler.send (TransferQueueBundler.java:101)
  [6] org.jgroups.protocols.TP.send (TP.java:1,620)
  [7] org.jgroups.protocols.TP._send (TP.java:1,353)
  [8] org.jgroups.protocols.TP.down (TP.java:1,262)
  [9] org.jgroups.stack.Protocol.down (Protocol.java:287)
  [10] org.jgroups.stack.Protocol.down (Protocol.java:287)
  [11] org.jgroups.protocols.FRAG2.down (FRAG2.java:148)
  [12] org.jgroups.protocols.pbcast.NAKACK2.down (NAKACK2.java:567)
  [13] org.jgroups.protocols.pbcast.STABLE.sendStabilityMessage (STABLE.java:698)
  [14] org.jgroups.protocols.pbcast.STABLE.handleStableMessage (STABLE.java:561)
  [15] org.jgroups.protocols.pbcast.STABLE.handleUpEvent (STABLE.java:264)
  [16] org.jgroups.protocols.pbcast.STABLE.up (STABLE.java:257)
  [17] org.jgroups.protocols.pbcast.NAKACK2.up (NAKACK2.java:595)
  [18] org.jgroups.protocols.FRAG2.up (FRAG2.java:174)
  [19] org.jgroups.protocols.FD_SOCK.up (FD_SOCK.java:254)
  [20] org.jgroups.protocols.Discovery.up (Discovery.java:300)
  [21] org.jgroups.protocols.TP.passMessageUp (TP.java:1,404)
  [22] org.jgroups.util.SubmitToThreadPool$SingleMessageHandler.run (SubmitToThreadPool.java:98)
  [23] java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1,149)
  [24] java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:624)
  [25] java.lang.Thread.run (Thread.java:748)
jgroups-54,dataChannel_6SAS_CLUSTER,tb6cas3-datachannel_6:

  [1] sun.misc.Unsafe.park (native method)
  [2] java.util.concurrent.locks.LockSupport.park (LockSupport.java:175)
  [3] java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await (AbstractQueuedSynchronizer.java:2,039)
  [4] java.util.concurrent.ArrayBlockingQueue.put (ArrayBlockingQueue.java:353)
  [5] org.jgroups.protocols.TransferQueueBundler.send (TransferQueueBundler.java:101)
  [6] org.jgroups.protocols.TP.send (TP.java:1,620)
  [7] org.jgroups.protocols.TP._send (TP.java:1,353)
  [8] org.jgroups.protocols.TP.down (TP.java:1,262)
  [9] org.jgroups.stack.Protocol.down (Protocol.java:287)
  [10] org.jgroups.stack.Protocol.down (Protocol.java:287)
  [11] org.jgroups.protocols.FRAG2.down (FRAG2.java:148)
  [12] org.jgroups.protocols.pbcast.NAKACK2.down (NAKACK2.java:567)
  [13] org.jgroups.protocols.pbcast.STABLE.sendStabilityMessage (STABLE.java:698)
  [14] org.jgroups.protocols.pbcast.STABLE.handleStableMessage (STABLE.java:561)
  [15] org.jgroups.protocols.pbcast.STABLE.handleUpEvent (STABLE.java:264)
  [16] org.jgroups.protocols.pbcast.STABLE.up (STABLE.java:257)
  [17] org.jgroups.protocols.pbcast.NAKACK2.up (NAKACK2.java:595)
  [18] org.jgroups.protocols.FRAG2.up (FRAG2.java:174)
  [19] org.jgroups.protocols.FD_SOCK.up (FD_SOCK.java:254)
  [20] org.jgroups.protocols.Discovery.up (Discovery.java:300)
  [21] org.jgroups.protocols.TP.passMessageUp (TP.java:1,404)
  [22] org.jgroups.util.SubmitToThreadPool$SingleMessageHandler.run (SubmitToThreadPool.java:98)
  [23] java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1,149)
  [24] java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:624)
  [25] java.lang.Thread.run (Thread.java:748)
Worker[5]:

  [1] sun.misc.Unsafe.park (native method)
  [2] java.util.concurrent.locks.LockSupport.park (LockSupport.java:175)
  [3] java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await (AbstractQueuedSynchronizer.java:2,039)
  [4] java.util.concurrent.ArrayBlockingQueue.put (ArrayBlockingQueue.java:353)
  [5] org.jgroups.protocols.TransferQueueBundler.send (TransferQueueBundler.java:101)
  [6] org.jgroups.protocols.TP.send (TP.java:1,620)
  [7] org.jgroups.protocols.TP._send (TP.java:1,353)
  [8] org.jgroups.protocols.TP.down (TP.java:1,262)
  [9] org.jgroups.stack.Protocol.down (Protocol.java:287)
  [10] org.jgroups.stack.Protocol.down (Protocol.java:287)
  [11] org.jgroups.protocols.FRAG2.down (FRAG2.java:148)
  [12] org.jgroups.protocols.pbcast.NAKACK2.send (NAKACK2.java:787)
  [13] org.jgroups.protocols.pbcast.NAKACK2.down (NAKACK2.java:568)
  [14] org.jgroups.protocols.pbcast.STABLE.down (STABLE.java:298)
  [15] org.jgroups.stack.Protocol.down (Protocol.java:287)
  [16] org.jgroups.stack.ProtocolStack.down (ProtocolStack.java:927)
  [17] org.jgroups.JChannel.down (JChannel.java:645)
  [18] org.jgroups.JChannel.send (JChannel.java:484)
  [19] xyz.internalcode.channel.jgroups.JGroupsChannelProvider.send (JGroupsChannelProvider.java:776)
  [20] xyz.internalcode.channel.jgroups.JGroupsChannelProvider.send (JGroupsChannelProvider.java:790)
  [21] xyz.internalcode.channel.DataChannelPool.send (DataChannelPool.java:852)
  [22] xyz.internalcode.replication.RepMgrImpl._replicate (RepMgrImpl.java:785)
  [23] xyz.internalcode.replication.RepMgrImpl._replicate (RepMgrImpl.java:484)
  [24] xyz.internalcode.replication.RepMgrImpl.replicate (RepMgrImpl.java:460)
  [25] xyz.internalcode.replication.RepMgrImpl.replicate (RepMgrImpl.java:453)
  [26] xyz.internalcode.spi.replication.ReplicationContextImpl.handleReplicationEvent (ReplicationContextImpl.java:140)
  [27] xyz.internalcode.container.internalcodeIc.handleReplicationEvent (internalcodeIc.java:505)
  [28] xyz.internalcode.container.internalcodeProtocolSession.sendReplicationEvent (internalcodeProtocolSession.java:84)
  [29] xyz.internalcode.spi.replication.appDataRep.AppDataReplicator.doReplicate (AppDataReplicator.java:40)
  [30] com.xyz.ph.sip.SPH.doProvisionalResponse (SPH.java:1,401)
  [31] com.xyz.ph.common.PHS.doProvisionalResponse (PHS.java:784)
  [32] com.xyz.ph.common.PHS.doResponse (PHS.java:572)
  [33] javax.servlet.sip.SipServlet.service (null)
  [34] xyz.internalcode.container.internalcodeWrapper.invokeServlet (internalcodeWrapper.java:428)
  [35] xyz.internalcode.container.internalcodeWrapper.processMessage (internalcodeWrapper.java:163)
  [36] xyz.internalcode.container.internalcodeContext.processMessage (internalcodeContext.java:535)
  [37] xyz.internalcode.spi.container.AbstractProtocolSession.handleMessage (AbstractProtocolSession.java:476)
  [38] xyz.internalcode.container.internalcodeProtocolSession.handleResponse (internalcodeProtocolSession.java:269)
  [39] xyz.internalcode.sipconnector.internalcodeSipSession.sendResponseToServlet (internalcodeSipSession.java:2,047)
  [40] xyz.internalcode.sipconnector.internalcodeSipSession.handleResponse (internalcodeSipSession.java:2,231)
  [41] xyz.internalcode.container.internalcodeProtocolSession.handleMessage (internalcodeProtocolSession.java:214)
  [42] xyz.internalcode.container.internalcodeHost.processMessage (internalcodeHost.java:335)
  [43] xyz.internalcode.container.internalcodeEngine.processMessage (internalcodeEngine.java:137)
  [44] xyz.internalcode.container.internalcodeEngine.execute (internalcodeEngine.java:241)
  [45] xyz.internalcode.util.threadpool.WorkerThread.execute (WorkerThread.java:192)
  [46] xyz.internalcode.util.threadpool.WorkerThread.run (WorkerThread.java:111)
Worker[3]:

  [1] sun.misc.Unsafe.park (native method)
  [2] java.util.concurrent.locks.LockSupport.park (LockSupport.java:175)
  [3] java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await (AbstractQueuedSynchronizer.java:2,039)
  [4] java.util.concurrent.ArrayBlockingQueue.put (ArrayBlockingQueue.java:353)
  [5] org.jgroups.protocols.TransferQueueBundler.send (TransferQueueBundler.java:101)
  [6] org.jgroups.protocols.TP.send (TP.java:1,620)
  [7] org.jgroups.protocols.TP._send (TP.java:1,353)
  [8] org.jgroups.protocols.TP.down (TP.java:1,262)
  [9] org.jgroups.stack.Protocol.down (Protocol.java:287)
  [10] org.jgroups.stack.Protocol.down (Protocol.java:287)
  [11] org.jgroups.protocols.FRAG2.down (FRAG2.java:148)
  [12] org.jgroups.protocols.pbcast.NAKACK2.send (NAKACK2.java:787)
  [13] org.jgroups.protocols.pbcast.NAKACK2.down (NAKACK2.java:568)
  [14] org.jgroups.protocols.pbcast.STABLE.down (STABLE.java:298)
  [15] org.jgroups.stack.Protocol.down (Protocol.java:287)
  [16] org.jgroups.stack.ProtocolStack.down (ProtocolStack.java:927)
  [17] org.jgroups.JChannel.down (JChannel.java:645)
  [18] org.jgroups.JChannel.send (JChannel.java:484)
  [19] xyz.internalcode.channel.jgroups.JGroupsChannelProvider.send (JGroupsChannelProvider.java:776)
  [20] xyz.internalcode.channel.jgroups.JGroupsChannelProvider.send (JGroupsChannelProvider.java:790)
  [21] xyz.internalcode.channel.DataChannelPool.send (DataChannelPool.java:852)
  [22] xyz.internalcode.replication.RepMgrImpl._replicate (RepMgrImpl.java:785)
  [23] xyz.internalcode.replication.RepMgrImpl._replicate (RepMgrImpl.java:484)
  [24] xyz.internalcode.replication.RepMgrImpl.replicate (RepMgrImpl.java:460)
  [25] xyz.internalcode.replication.RepMgrImpl.replicate (RepMgrImpl.java:453)
  [26] xyz.internalcode.spi.replication.ReplicationContextImpl.cleanup (ReplicationContextImpl.java:75)
  [27] xyz.internalcode.container.internalcodeIc.cleanup (internalcodeIc.java:268)
  [28] xyz.internalcode.container.internalcodeIc.appSessionInvalidated (internalcodeIc.java:249)
  [29] xyz.internalcode.container.internalcodeApplicationSession.invalidate (internalcodeApplicationSession.java:1,074)
  [30] xyz.internalcode.container.sip.SIPAppSImpl.invalidate (SIPAppSImpl.java:397)
  [31] xyz.internalcode.container.sip.SIPAppSImpl.objectExpired (SIPAppSImpl.java:637)
  [32] xyz.internalcode.container.internalcodeApplicationSession.handleEvent (internalcodeApplicationSession.java:1,435)
  [33] xyz.internalcode.container.sip.SIPAppSImpl.handleEvent (SIPAppSImpl.java:533)
  [34] xyz.internalcode.container.internalcodeEngine.handleEvent (internalcodeEngine.java:154)
  [35] xyz.internalcode.container.internalcodeEngine.execute (internalcodeEngine.java:254)
  [36] xyz.internalcode.util.threadpool.WorkerThread.execute (WorkerThread.java:192)
  [37] xyz.internalcode.util.threadpool.WorkerThread.run (WorkerThread.java:111)
Worker[2]:

  [1] sun.misc.Unsafe.park (native method)
  [2] java.util.concurrent.locks.LockSupport.park (LockSupport.java:175)
  [3] java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await (AbstractQueuedSynchronizer.java:2,039)
  [4] java.util.concurrent.ArrayBlockingQueue.put (ArrayBlockingQueue.java:353)
  [5] org.jgroups.protocols.TransferQueueBundler.send (TransferQueueBundler.java:101)
  [6] org.jgroups.protocols.TP.send (TP.java:1,620)
  [7] org.jgroups.protocols.TP._send (TP.java:1,353)
  [8] org.jgroups.protocols.TP.down (TP.java:1,262)
  [9] org.jgroups.stack.Protocol.down (Protocol.java:287)
  [10] org.jgroups.stack.Protocol.down (Protocol.java:287)
  [11] org.jgroups.protocols.FRAG2.down (FRAG2.java:148)
  [12] org.jgroups.protocols.pbcast.NAKACK2.send (NAKACK2.java:787)
  [13] org.jgroups.protocols.pbcast.NAKACK2.down (NAKACK2.java:568)
  [14] org.jgroups.protocols.pbcast.STABLE.down (STABLE.java:298)
  [15] org.jgroups.stack.Protocol.down (Protocol.java:287)
  [16] org.jgroups.stack.ProtocolStack.down (ProtocolStack.java:927)
  [17] org.jgroups.JChannel.down (JChannel.java:645)
  [18] org.jgroups.JChannel.send (JChannel.java:484)
  [19] xyz.internalcode.channel.jgroups.JGroupsChannelProvider.send (JGroupsChannelProvider.java:776)
  [20] xyz.internalcode.channel.jgroups.JGroupsChannelProvider.send (JGroupsChannelProvider.java:790)
  [21] xyz.internalcode.channel.DataChannelPool.send (DataChannelPool.java:852)
  [22] xyz.internalcode.replication.RepMgrImpl._replicate (RepMgrImpl.java:785)
  [23] xyz.internalcode.replication.RepMgrImpl._replicate (RepMgrImpl.java:484)
  [24] xyz.internalcode.replication.RepMgrImpl.replicate (RepMgrImpl.java:460)
  [25] xyz.internalcode.replication.RepMgrImpl.replicate (RepMgrImpl.java:453)
  [26] xyz.internalcode.spi.replication.ReplicationContextImpl.handleReplicationEvent (ReplicationContextImpl.java:140)
  [27] xyz.internalcode.container.internalcodeIc.handleReplicationEvent (internalcodeIc.java:505)
  [28] xyz.internalcode.container.internalcodeProtocolSession.sendReplicationEvent (internalcodeProtocolSession.java:84)
  [29] xyz.internalcode.spi.replication.appDataRep.AppDataReplicator.doReplicate (AppDataReplicator.java:40)
  [30] com.xyz.ph.sip.SPH.doProvisionalResponse (SPH.java:1,401)
  [31] com.xyz.ph.common.PHS.doProvisionalResponse (PHS.java:784)
  [32] com.xyz.ph.common.PHS.doResponse (PHS.java:572)
  [33] javax.servlet.sip.SipServlet.service (null)
  [34] xyz.internalcode.container.internalcodeWrapper.invokeServlet (internalcodeWrapper.java:428)
  [35] xyz.internalcode.container.internalcodeWrapper.processMessage (internalcodeWrapper.java:163)
  [36] xyz.internalcode.container.internalcodeContext.processMessage (internalcodeContext.java:535)
  [37] xyz.internalcode.spi.container.AbstractProtocolSession.handleMessage (AbstractProtocolSession.java:476)
  [38] xyz.internalcode.container.internalcodeProtocolSession.handleResponse (internalcodeProtocolSession.java:269)
  [39] xyz.internalcode.sipconnector.internalcodeSipSession.sendResponseToServlet (internalcodeSipSession.java:2,047)
  [40] xyz.internalcode.sipconnector.internalcodeSipSession.handleResponse (internalcodeSipSession.java:2,231)
  [41] xyz.internalcode.container.internalcodeProtocolSession.handleMessage (internalcodeProtocolSession.java:214)
  [42] xyz.internalcode.container.internalcodeHost.processMessage (internalcodeHost.java:335)
  [43] xyz.internalcode.container.internalcodeEngine.processMessage (internalcodeEngine.java:137)
  [44] xyz.internalcode.container.internalcodeEngine.execute (internalcodeEngine.java:241)
  [45] xyz.internalcode.util.threadpool.WorkerThread.execute (WorkerThread.java:192)
  [46] xyz.internalcode.util.threadpool.WorkerThread.run (WorkerThread.java:111)
Worker[1]:

  [1] sun.misc.Unsafe.park (native method)
  [2] java.util.concurrent.locks.LockSupport.park (LockSupport.java:175)
  [3] java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await (AbstractQueuedSynchronizer.java:2,039)
  [4] java.util.concurrent.ArrayBlockingQueue.put (ArrayBlockingQueue.java:353)
  [5] org.jgroups.protocols.TransferQueueBundler.send (TransferQueueBundler.java:101)
  [6] org.jgroups.protocols.TP.send (TP.java:1,620)
  [7] org.jgroups.protocols.TP._send (TP.java:1,353)
  [8] org.jgroups.protocols.TP.down (TP.java:1,262)
  [9] org.jgroups.stack.Protocol.down (Protocol.java:287)
  [10] org.jgroups.stack.Protocol.down (Protocol.java:287)
  [11] org.jgroups.protocols.FRAG2.down (FRAG2.java:148)
  [12] org.jgroups.protocols.pbcast.NAKACK2.send (NAKACK2.java:787)
  [13] org.jgroups.protocols.pbcast.NAKACK2.down (NAKACK2.java:568)
  [14] org.jgroups.protocols.pbcast.STABLE.down (STABLE.java:298)
  [15] org.jgroups.stack.Protocol.down (Protocol.java:287)
  [16] org.jgroups.stack.ProtocolStack.down (ProtocolStack.java:927)
  [17] org.jgroups.JChannel.down (JChannel.java:645)
  [18] org.jgroups.JChannel.send (JChannel.java:484)
  [19] xyz.internalcode.channel.jgroups.JGroupsChannelProvider.send (JGroupsChannelProvider.java:776)
  [20] xyz.internalcode.channel.jgroups.JGroupsChannelProvider.send (JGroupsChannelProvider.java:790)
  [21] xyz.internalcode.channel.DataChannelPool.send (DataChannelPool.java:852)
  [22] xyz.internalcode.replication.RepMgrImpl._replicate (RepMgrImpl.java:785)
  [23] xyz.internalcode.replication.RepMgrImpl._replicate (RepMgrImpl.java:484)
  [24] xyz.internalcode.replication.RepMgrImpl.replicate (RepMgrImpl.java:460)
  [25] xyz.internalcode.replication.RepMgrImpl.replicate (RepMgrImpl.java:453)
  [26] xyz.internalcode.spi.replication.ReplicationContextImpl.handleReplicationEvent (ReplicationContextImpl.java:140)
  [27] xyz.internalcode.container.internalcodeIc.handleReplicationEvent (internalcodeIc.java:505)
  [28] xyz.internalcode.container.internalcodeProtocolSession.sendReplicationEvent (internalcodeProtocolSession.java:84)
  [29] xyz.internalcode.spi.replication.appDataRep.AppDataReplicator.doReplicate (AppDataReplicator.java:40)
  [30] com.xyz.ph.sip.SPH.doProvisionalResponse (SPH.java:1,401)
  [31] com.xyz.ph.common.PHS.doProvisionalResponse (PHS.java:784)
  [32] com.xyz.ph.common.PHS.doResponse (PHS.java:572)
  [33] javax.servlet.sip.SipServlet.service (null)
  [34] xyz.internalcode.container.internalcodeWrapper.invokeServlet (internalcodeWrapper.java:428)
  [35] xyz.internalcode.container.internalcodeWrapper.processMessage (internalcodeWrapper.java:163)
  [36] xyz.internalcode.container.internalcodeContext.processMessage (internalcodeContext.java:535)
  [37] xyz.internalcode.spi.container.AbstractProtocolSession.handleMessage (AbstractProtocolSession.java:476)
  [38] xyz.internalcode.container.internalcodeProtocolSession.handleResponse (internalcodeProtocolSession.java:269)
  [39] xyz.internalcode.sipconnector.internalcodeSipSession.sendResponseToServlet (internalcodeSipSession.java:2,047)
  [40] xyz.internalcode.sipconnector.internalcodeSipSession.handleResponse (internalcodeSipSession.java:2,231)
  [41] xyz.internalcode.container.internalcodeProtocolSession.handleMessage (internalcodeProtocolSession.java:214)
  [42] xyz.internalcode.container.internalcodeHost.processMessage (internalcodeHost.java:335)
  [43] xyz.internalcode.container.internalcodeEngine.processMessage (internalcodeEngine.java:137)
  [44] xyz.internalcode.container.internalcodeEngine.execute (internalcodeEngine.java:241)
  [45] xyz.internalcode.util.threadpool.WorkerThread.execute (WorkerThread.java:192)
  [46] xyz.internalcode.util.threadpool.WorkerThread.run (WorkerThread.java:111)
Worker[0]:

  [1] sun.misc.Unsafe.park (native method)
  [2] java.util.concurrent.locks.LockSupport.park (LockSupport.java:175)
  [3] java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await (AbstractQueuedSynchronizer.java:2,039)
  [4] java.util.concurrent.ArrayBlockingQueue.put (ArrayBlockingQueue.java:353)
  [5] org.jgroups.protocols.TransferQueueBundler.send (TransferQueueBundler.java:101)
  [6] org.jgroups.protocols.TP.send (TP.java:1,620)
  [7] org.jgroups.protocols.TP._send (TP.java:1,353)
  [8] org.jgroups.protocols.TP.down (TP.java:1,262)
  [9] org.jgroups.stack.Protocol.down (Protocol.java:287)
  [10] org.jgroups.stack.Protocol.down (Protocol.java:287)
  [11] org.jgroups.protocols.FRAG2.down (FRAG2.java:148)
  [12] org.jgroups.protocols.pbcast.NAKACK2.send (NAKACK2.java:787)
  [13] org.jgroups.protocols.pbcast.NAKACK2.down (NAKACK2.java:568)
  [14] org.jgroups.protocols.pbcast.STABLE.down (STABLE.java:298)
  [15] org.jgroups.stack.Protocol.down (Protocol.java:287)
  [16] org.jgroups.stack.ProtocolStack.down (ProtocolStack.java:927)
  [17] org.jgroups.JChannel.down (JChannel.java:645)
  [18] org.jgroups.JChannel.send (JChannel.java:484)
  [19] xyz.internalcode.channel.jgroups.JGroupsChannelProvider.send (JGroupsChannelProvider.java:776)
  [20] xyz.internalcode.channel.jgroups.JGroupsChannelProvider.send (JGroupsChannelProvider.java:790)
  [21] xyz.internalcode.channel.DataChannelPool.send (DataChannelPool.java:852)
  [22] xyz.internalcode.replication.RepMgrImpl._replicate (RepMgrImpl.java:785)
  [23] xyz.internalcode.replication.RepMgrImpl._replicate (RepMgrImpl.java:484)
  [24] xyz.internalcode.replication.RepMgrImpl.replicate (RepMgrImpl.java:460)
  [25] xyz.internalcode.replication.RepMgrImpl.replicate (RepMgrImpl.java:453)
  [26] xyz.internalcode.spi.replication.ReplicationContextImpl.handleReplicationEvent (ReplicationContextImpl.java:140)
  [27] xyz.internalcode.container.internalcodeIc.handleReplicationEvent (internalcodeIc.java:505)
  [28] xyz.internalcode.container.internalcodeProtocolSession.sendReplicationEvent (internalcodeProtocolSession.java:84)
  [29] xyz.internalcode.spi.replication.appDataRep.AppDataReplicator.doReplicate (AppDataReplicator.java:40)
  [30] com.xyz.ph.sip.SPH.doProvisionalResponse (SPH.java:1,401)
  [31] com.xyz.ph.common.PHS.doProvisionalResponse (PHS.java:784)
  [32] com.xyz.ph.common.PHS.doResponse (PHS.java:572)
  [33] javax.servlet.sip.SipServlet.service (null)
  [34] xyz.internalcode.container.internalcodeWrapper.invokeServlet (internalcodeWrapper.java:428)
  [35] xyz.internalcode.container.internalcodeWrapper.processMessage (internalcodeWrapper.java:163)
  [36] xyz.internalcode.container.internalcodeContext.processMessage (internalcodeContext.java:535)
  [37] xyz.internalcode.spi.container.AbstractProtocolSession.handleMessage (AbstractProtocolSession.java:476)
  [38] xyz.internalcode.container.internalcodeProtocolSession.handleResponse (internalcodeProtocolSession.java:269)
  [39] xyz.internalcode.sipconnector.internalcodeSipSession.sendResponseToServlet (internalcodeSipSession.java:2,047)
  [40] xyz.internalcode.sipconnector.internalcodeSipSession.handleResponse (internalcodeSipSession.java:2,231)
  [41] xyz.internalcode.container.internalcodeProtocolSession.handleMessage (internalcodeProtocolSession.java:214)
  [42] xyz.internalcode.container.internalcodeHost.processMessage (internalcodeHost.java:335)
  [43] xyz.internalcode.container.internalcodeEngine.processMessage (internalcodeEngine.java:137)
  [44] xyz.internalcode.container.internalcodeEngine.execute (internalcodeEngine.java:241)
  [45] xyz.internalcode.util.threadpool.WorkerThread.execute (WorkerThread.java:192)
  [46] xyz.internalcode.util.threadpool.WorkerThread.run (WorkerThread.java:111)


Thanks,
Raju

Bela Ban

unread,
Feb 8, 2024, 12:17:39 PM2/8/24
to jgrou...@googlegroups.com
Is the stack dump complete? I'm asking because I don't see a thread
starting with "TQ-Bundler".
When you have a freshly started system, do you see this thread in the
stack dump?

The TransferQueue (TQ) bundler thread is tasked with sending messages
that all other threads add to the queue. When the queue is full, threads
block trying to add their messages, and that's what you see in a couple
of sender threads below (ArrayBlockingQueue.put). If no TQ-bundler is
removing messages, then that's what'll happen.
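
The blocking behavior can be reproduced in isolation with a plain ArrayBlockingQueue — a minimal sketch, not JGroups code; the capacity of 2 is an arbitrary stand-in for the bundler queue's configured capacity:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.TimeUnit;

public class FullQueueDemo {
    public static void main(String[] args) throws InterruptedException {
        // Stand-in for the TQ-bundler's bounded queue; no consumer
        // thread is draining it, just like a missing TQ-Bundler thread.
        ArrayBlockingQueue<String> queue = new ArrayBlockingQueue<>(2);
        queue.put("msg-1");
        queue.put("msg-2");

        // A real sender calls put(), which parks indefinitely here
        // (the Unsafe.park frames in the dump). offer() with a timeout
        // shows the same condition without hanging this demo.
        boolean accepted = queue.offer("msg-3", 100, TimeUnit.MILLISECONDS);
        System.out.println("third message accepted: " + accepted);
    }
}
```

With the queue full and no consumer, the final offer() returns false — the same state in which put() would block forever.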

Can you reproduce this behavior? If yes, can you run this with the
latest JGroups 5.3.2?

On 08.02.2024 17:32, 'Raju Panwar' via jgroups-dev wrote:
> Hi Bela Ban,
>
> Thanks for replying. At present, I have the below stack dump, in which
> the worker threads look stuck. Let me know if it is helpful
> for reaching a conclusion.
>
> All threads suspended.
> *jgroups-54,dataChannel_2SAS_CLUSTER,tb6cas3-datachannel_2:*
>   [1] sun.misc.Unsafe.park (native method)
>   [2] java.util.concurrent.locks.LockSupport.parkNanos
> (LockSupport.java:215)
>   [3] java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill
> (SynchronousQueue.java:460)
>   [4] java.util.concurrent.SynchronousQueue$TransferStack.transfer
> (SynchronousQueue.java:362)
>   [5] java.util.concurrent.SynchronousQueue.poll
> (SynchronousQueue.java:941)
>   [6] java.util.concurrent.ThreadPoolExecutor.getTask
> (ThreadPoolExecutor.java:1,073)
>   [7] java.util.concurrent.ThreadPoolExecutor.runWorker
> (ThreadPoolExecutor.java:1,134)
>   [8] java.util.concurrent.ThreadPoolExecutor$Worker.run
> (ThreadPoolExecutor.java:624)
>   [9] java.lang.Thread.run (Thread.java:748)
> *jgroups-58,dataChannel_6SAS_CLUSTER,tb6cas3-datachannel_6:*
>   [1] sun.misc.Unsafe.park (native method)
>   [2] java.util.concurrent.locks.LockSupport.parkNanos
> (LockSupport.java:215)
>   [3] java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill
> (SynchronousQueue.java:460)
>   [4] java.util.concurrent.SynchronousQueue$TransferStack.transfer
> (SynchronousQueue.java:362)
>   [5] java.util.concurrent.SynchronousQueue.poll
> (SynchronousQueue.java:941)
>   [6] java.util.concurrent.ThreadPoolExecutor.getTask
> (ThreadPoolExecutor.java:1,073)
>   [7] java.util.concurrent.ThreadPoolExecutor.runWorker
> (ThreadPoolExecutor.java:1,134)
>   [8] java.util.concurrent.ThreadPoolExecutor$Worker.run
> (ThreadPoolExecutor.java:624)
>   [9] java.lang.Thread.run (Thread.java:748)
> *jgroups-58,dataChannel_8SAS_CLUSTER,tb6cas3-datachannel_8:*
>   [1] sun.misc.Unsafe.park (native method)
>   [2] java.util.concurrent.locks.LockSupport.parkNanos
> (LockSupport.java:215)
>   [3] java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill
> (SynchronousQueue.java:460)
>   [4] java.util.concurrent.SynchronousQueue$TransferStack.transfer
> (SynchronousQueue.java:362)
>   [5] java.util.concurrent.SynchronousQueue.poll
> (SynchronousQueue.java:941)
>   [6] java.util.concurrent.ThreadPoolExecutor.getTask
> (ThreadPoolExecutor.java:1,073)
>   [7] java.util.concurrent.ThreadPoolExecutor.runWorker
> (ThreadPoolExecutor.java:1,134)
>   [8] java.util.concurrent.ThreadPoolExecutor$Worker.run
> (ThreadPoolExecutor.java:624)
>   [9] java.lang.Thread.run (Thread.java:748)
> *jgroups-57,dataChannel_6SAS_CLUSTER,tb6cas3-datachannel_6:*
> *jgroups-56,dataChannel_6SAS_CLUSTER,tb6cas3-datachannel_6:*
> *jgroups-55,dataChannel_6SAS_CLUSTER,tb6cas3-datachannel_6:*
> *jgroups-54,dataChannel_6SAS_CLUSTER,tb6cas3-datachannel_6:*
> (WorkerThread.java:111)

Raju Panwar

unread,
Feb 20, 2024, 1:23:41 AM2/20/24
to jgroups-dev
Hi Bela,
Thanks for your reply, but we need your further support in resolving this issue, as it is a blocker for us and is causing load failures.

Why are we not seeing the TransferQueue (TQ) bundler thread? Is there any configuration required for it?
We have not seen any such issue with JGroups 3.x; under high load it works perfectly fine.
But we are facing this issue with 4.2.22.
We cannot move to 5.x currently. Please help us resolve the issue.

Thanks
Raju

reeta

unread,
Mar 11, 2024, 8:44:48 AM3/11/24
to jgroups-dev
Hi Bela ban,

Can you please help us? All of our threads get stuck sending messages in the JGroups TransferQueueBundler.
Is there a configuration issue?

Worker[3]:
  [1] sun.misc.Unsafe.park (native method)
  [2] java.util.concurrent.locks.LockSupport.park (LockSupport.java:175)
  [3] java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await (AbstractQueuedSynchronizer.java:2,039)
  [4] java.util.concurrent.ArrayBlockingQueue.put (ArrayBlockingQueue.java:353)
  [5] org.jgroups.protocols.TransferQueueBundler.send (TransferQueueBundler.java:101)
  [6] org.jgroups.protocols.TP.send (TP.java:1,620)
  [7] org.jgroups.protocols.TP._send (TP.java:1,353)
  [8] org.jgroups.protocols.TP.down (TP.java:1,262)
  [9] org.jgroups.stack.Protocol.down (Protocol.java:287)
  [10] org.jgroups.stack.Protocol.down (Protocol.java:287)
  [11] org.jgroups.protocols.pbcast.NAKACK2.send (NAKACK2.java:787)
  [12] org.jgroups.protocols.pbcast.NAKACK2.down (NAKACK2.java:568)
  [13] org.jgroups.protocols.pbcast.STABLE.down (STABLE.java:298)
  [14] org.jgroups.stack.Protocol.down (Protocol.java:287)
  [15] org.jgroups.protocols.FRAG2.down (FRAG2.java:148)
  [16] org.jgroups.stack.ProtocolStack.down (ProtocolStack.java:927)
  [17] org.jgroups.JChannel.down (JChannel.java:645)
  [18] org.jgroups.JChannel.send (JChannel.java:484)

Thanks
Reeta

Bela Ban

unread,
Mar 11, 2024, 11:36:41 AM3/11/24
to jgrou...@googlegroups.com
The sender threads are blocked on a full queue. I suspect the thread
which holds the lock is blocked on TCP write. The receivers might be
slow (or stuck) in processing the messages. This requires stack dumps
from all nodes.

Alternatively, you could use [1] (bundler.drop_when_full="true"), but
this would not reveal the root cause.

[1] https://issues.redhat.com/browse/JGRP-2765

reeta

unread,
Mar 12, 2024, 8:40:25 AM3/12/24
to jgroups-dev
Hi bela,
Thanks for your reply.

Can we enable a thread pool in JGroups so that it does not use our application threads and does not block them?

Bela Ban

unread,
Mar 12, 2024, 8:41:50 AM3/12/24
to jgrou...@googlegroups.com

reeta

unread,
Mar 12, 2024, 9:08:07 AM3/12/24
to jgroups-dev

Hi Bela,
Thank you for your reply.

We are badly stuck. We are using JGroups version 4.2.22 and are not able to run the call load required by the customer. We do data replication for each call from the active node to the standby node so that, after a node switchover, the call can be recovered, CDRs can be written, and a hang-up by the user can be handled gracefully.

We need your help in finding the root cause of the issue. I will be sharing thread dumps from both nodes. Can you also please let me know whether the config below has any major significance? Currently it is not enabled, but in initial_hosts we have given IPv4 IPs only. Could the size of the data sent over the JGroups channel also cause this issue?

reeta

unread,
Mar 12, 2024, 9:18:37 AM3/12/24
to jgroups-dev
Please find the dump files from both nodes:
jgroup-channel-threaddump.7z

Bela Ban

unread,
Mar 13, 2024, 3:33:15 AM3/13/24
to jgrou...@googlegroups.com
Hi Reeta

ok, so here's what I assume happened:

* The TransferQueueBundler has a blocking queue to which senders add
their messages; and from which a single bundler thread continually
dequeues messages and sends them.
* The TQB's bundler thread is blocked on a TCP-write as the TCP
send-window is full (this means that the bundler won't dequeue more
messages from the queue, and it becomes full)
* This happens because the peer's Connection.Receiver is blocked: the
reason is that it processed a received message, but then tried to send a
message on the same thread. This blocked because the TQB's queue was
full, and the send blocks until there's space.
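
The circular wait described above can be sketched with two bounded queues standing in for the two peers' bundler queues — illustrative Java only, not JGroups internals, and the capacities of 1 are arbitrary:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.TimeUnit;

public class DistributedDeadlockSketch {
    public static void main(String[] args) throws InterruptedException {
        // One bounded queue per direction, both already full because
        // the senders have outpaced the bundler threads.
        ArrayBlockingQueue<String> aToB = new ArrayBlockingQueue<>(1);
        ArrayBlockingQueue<String> bToA = new ArrayBlockingQueue<>(1);
        aToB.put("backlog");
        bToA.put("backlog");

        // A's receiver wants to reply to B, and B's receiver wants to
        // reply to A. With put() both would park forever, so neither
        // queue is ever drained: a distributed deadlock. offer() with
        // a timeout shows the stalemate without hanging the demo.
        boolean aSent = aToB.offer("reply", 100, TimeUnit.MILLISECONDS);
        boolean bSent = bToA.offer("reply", 100, TimeUnit.MILLISECONDS);
        System.out.println("A sent: " + aSent + ", B sent: " + bSent);
    }
}
```

Both sends fail: each side waits for the other to consume, and neither ever does.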

This is kind of like a distributed deadlock. There are a number of
solutions described in [1]. Solution #2 is only available in 5.x. #3 and
#4 have not yet been implemented, leaving you with #1 (RED) in 4.2.22.

RED [2] starts dropping messages when the TQB's queue becomes full. This
is not an issue as messages will be retransmitted. This will allow the
stack dump shown in [1] (from your email) to be resolved, and
Connection.Receiver will continue to receive messages, avoiding the
full TCP send-window on the sender side. Place RED above the transport:

...
<RED/>
<UDP.../>
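
Applied to the TCP stack posted at the start of this thread, that placement would look like the sketch below; the attribute values are the ones already shown, and the rest of the stack stays as it was:

```xml
<config xmlns="urn:org:jgroups"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups.xsd">
    <TCP bind_addr="ip"
         bind_port="7901"
         max_bundle_size="264k"/>
    <RED/>
    <TCPPING initial_hosts="ip1[7901],ip2[7901]"
             port_range="20"/>
    <!-- remaining protocols (FD_SOCK, NAKACK2, STABLE, GMS, FRAG2) unchanged -->
</config>
```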

Let me know if this works. If not, we need to investigate further and/or
backport #2.
Cheers


[1] https://issues.redhat.com/browse/JGRP-2724
[2] http://www.jgroups.org/manual5/index.html#_random_early_drop_red

On 12.03.2024 14:18, reeta wrote:
>
>
> On Tuesday, March 12, 2024 at 6:38:07 PM UTC+5:30 reeta wrote:
>
>
> Hi Bela,
> Thank you for your reply.
>
> we are badly stuck . we are using jgroups version 4.2.22 . we are
> not able to run  call load as per customer requirement  we do data
> replication for each call from active node to standby node to
> recover call after node switchover to write CDRs and handle the
> call hang by user gracefully.
>
> Needed your help in finding root cause of issue. i will be sharing
> thread dump from both nodes. can you also please let me know if
> below config has any major significance . currently its not
> enabled. but in initial hosts we have given ipv4 ips only. if data
> size sent over jgroups channel can also cause issue?
>
> -Djava.net.preferIPv4Stack=true
> On Tuesday, March 12, 2024 at 6:11:50 PM UTC+5:30 Bela Ban wrote:
>
> No. But you can process the receive(Message) or
> receive(MessageBatch) in
> your own application thread --
>
Bela Ban | http://www.jgroups.org

reeta

unread,
Mar 13, 2024, 6:05:08 AM3/13/24
to jgroups-dev
Hi Bela,

Thanks for your response.
We tried with RED, but it does not help; we are still facing the same issue.
Please suggest what we can do further.

Thanks
Reeta

Bela Ban

unread,
Mar 13, 2024, 8:43:36 AM3/13/24
to jgrou...@googlegroups.com
Try with RED.max_threshold="0.8" min_threshold="0.3"

I'll backport https://issues.redhat.com/browse/JGRP-2765 this afternoon,
and will send you the modified JAR to test

Bela Ban

unread,
Mar 13, 2024, 12:00:24 PM3/13/24
to jgrou...@googlegroups.com
I backported JGRP-2765, and released 4.2.27.Final.

To try it out, use <TCP drop_when_full="true" ... />

Let me know whether this fixed the issue.
Cheers

bel...@gmail.com

unread,
Mar 14, 2024, 5:20:41 AM3/14/24
to jgroups-dev
Another thing you could try out is to set `weight_factor="1"` in `RED`. This makes RED start dropping messages sooner.

reeta

unread,
Mar 15, 2024, 2:38:09 AM3/15/24
to jgroups-dev
Hi Bela ,

Thanks for your response. We have identified that the issue occurs only when the packet size we are sending is big.

We send 5 messages per call at 125 calls per second (cps), and these 5 messages total about 205 KB of data transfer per call. We did a POC
and reduced the size to 70 KB, i.e. one third, and the load then works fine for us. Is there any way in JGroups to send large data, and is there any limitation on the size of data that can be sent?

For another application we have 50 KB of data transfer per call, and at 500 cps it works fine.
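
A rough back-of-the-envelope on the figures above (205 KB/call at 125 cps vs. 50 KB/call at 500 cps, as quoted in this thread) shows that the two workloads move almost the same aggregate volume, so the difference correlates with per-message size rather than total throughput:

```java
public class ReplicationThroughput {
    public static void main(String[] args) {
        // Figures from this thread: ~205 KB per call at 125 cps
        // (failing load) vs. ~50 KB per call at 500 cps (working load).
        System.out.println(mbitPerSecond(205, 125)); // 209.92
        System.out.println(mbitPerSecond(50, 500));  // 204.8
    }

    // Converts (KB per call, calls per second) to Mbit/s on the wire.
    static double mbitPerSecond(int kbPerCall, int cps) {
        long bytesPerSecond = (long) kbPerCall * 1024 * cps;
        return bytesPerSecond * 8 / 1_000_000.0;
    }
}
```

Both workloads are close to 205 Mbit/s, which points at large individual messages (and the resulting TCP send-window pressure) rather than aggregate bandwidth as the variable that changed.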

Please suggest.

Thanks
Reeta

belaban

unread,
Mar 15, 2024, 4:24:44 AM3/15/24
to reeta, jgroups-dev
The message size should not be a problem, as fragmentation takes care of this.
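
With the FRAG2 settings from the original stack (frag_size="200000"), a ~205 KB payload splits into just two fragments — a quick sketch of the ceiling division involved, illustrative only and not JGroups code:

```java
public class FragmentCount {
    public static void main(String[] args) {
        int fragSize = 200_000;      // FRAG2 frag_size from the original config
        int msgSize  = 205 * 1024;   // ~205 KB payload per call
        // Ceiling division: how many fragments the payload splits into.
        int fragments = (msgSize + fragSize - 1) / fragSize;
        System.out.println(fragments); // 2
    }
}
```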
Have you tried my other suggestions?
Cheers



Sent from my Galaxy


-------- Original message --------
From: reeta <ree...@gmail.com>
Date: 15.03.24 07:38 (GMT+01:00)
To: jgroups-dev <jgrou...@googlegroups.com>
Subject: Re: All threads are stuck when call jchannel.send

reeta

unread,
Mar 15, 2024, 7:10:23 AM3/15/24
to jgroups-dev
Hi Bela,

Thanks for your response. How can we get release 4.2.27.Final?
Reeta

Bela Ban

unread,
Mar 15, 2024, 7:17:29 AM3/15/24
to jgrou...@googlegroups.com