thread pool is full


kumar bava

Jan 27, 2023, 11:23:19 AM1/27/23
to jgroups-dev
Hi Bela!

Randomly in our cluster(s) we see the log below, followed by a thread dump:
FATAL Log4J2LogImpl.java:126: xx: thread pool is full (max=200, active=200); thread dump (dumped once, until thread_dump is reset):
==========thread dump================
"Reference Handler" #2 prio=0 tid=0x2 nid=NA runnable
   java.lang.Thread.State: RUNNABLE
        at java...@11.0.15/java.lang.ref.Reference.waitForReferencePendingList(Native Method)
        at java...@11.0.15/java.lang.ref.Reference.processPendingReferences(Reference.java:241)
        at java...@11.0.15/java.lang.ref.Reference$ReferenceHandler.run(Reference.java:213)


"Finalizer" #3 prio=0 tid=0x3 nid=NA waiting
   java.lang.Thread.State: WAITING
        at java...@11.0.15/java.lang.Object.wait(Native Method)
        - waiting on <0x5682507c> (a java.lang.ref.ReferenceQueue$Lock)
        at java...@11.0.15/java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:155)
        at java...@11.0.15/java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:176)
        at java...@11.0.15/java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:170)


"Signal Dispatcher" #4 prio=0 tid=0x4 nid=NA runnable
   java.lang.Thread.State: RUNNABLE


"Common-Cleaner" #10 prio=0 tid=0xa nid=NA timed_waiting
   java.lang.Thread.State: TIMED_WAITING
        at java...@11.0.15/java.lang.Object.wait(Native Method)
        - waiting on <0x25ae7778> (a java.lang.ref.ReferenceQueue$Lock)
        at java...@11.0.15/java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:155)
        at java...@11.0.15/jdk.internal.ref.CleanerImpl.run(CleanerImpl.java:148)
        at java...@11.0.15/java.lang.Thread.run(Thread.java:829)
        at java...@11.0.15/jdk.internal.misc.InnocuousThread.run(InnocuousThread.java:161)


"RxSchedulerPurge-1" #16 prio=0 tid=0x10 nid=NA timed_waiting
   java.lang.Thread.State: TIMED_WAITING
        at java...@11.0.15/jdk.internal.misc.Unsafe.park(Native Method)
        - parking to wait for <0x13724708> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java...@11.0.15/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:234)
        at java...@11.0.15/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2123)
        at java...@11.0.15/java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1182)
        at java...@11.0.15/java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:899)
        at java...@11.0.15/java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1054)
        at java...@11.0.15/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1114)
        at java...@11.0.15/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java...@11.0.15/java.lang.Thread.run(Thread.java:829)


"RxCachedWorkerPoolEvictor-1" #17 prio=0 tid=0x11 nid=NA timed_waiting
   java.lang.Thread.State: TIMED_WAITING
        at java...@11.0.15/jdk.internal.misc.Unsafe.park(Native Method)
        - parking to wait for <0x42f62aca> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java...@11.0.15/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:234)
        at java...@11.0.15/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2123)
        at java...@11.0.15/java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1182)
        at java...@11.0.15/java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:899)
        at java...@11.0.15/java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1054)
        at java...@11.0.15/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1114)
        at java...@11.0.15/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java...@11.0.15/java.lang.Thread.run(Thread.java:829)


"Timer runner-2,mss-web,stg-snd-1-t14-37499" #19 prio=0 tid=0x13 nid=NA timed_waiting
   java.lang.Thread.State: TIMED_WAITING
        at java...@11.0.15/jdk.internal.misc.Unsafe.park(Native Method)
        - parking to wait for <0x6b39358> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java...@11.0.15/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:234)
        at java...@11.0.15/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2123)
        at java...@11.0.15/java.util.concurrent.DelayQueue.take(DelayQueue.java:229)
        at java...@11.0.15/java.util.concurrent.DelayQueue.take(DelayQueue.java:77)
        at app//org.jgroups.util.TimeScheduler3.run(TimeScheduler3.java:194)
==========thread dump================

When this is seen, the member fails to perform its operations (e.g., Infinispan locking; we are using JGroups as part of infinispan-13.0.5).
Looking at the logs, it happens when a majority of the members are leaving the cluster.
Our JGroups config is:
========jgroup config=======
<config xmlns="urn:org:jgroups"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups.xsd">
    <UDP
         bind_addr="match-interface:{{ mssweb_bind_addr }}"
         mcast_port="${jgroups.udp.mcast_port:{{ mssweb_mcast_port }}}"
         mcast_addr="{{ mssweb_mcast_group_addr }}"
         enable_diagnostics="false"
         thread_pool.min_threads="0"
         thread_pool.max_threads="200"
         thread_pool.keep_alive_time="30000"/>

    <PING num_discovery_runs="5"/>
    <MERGE3 max_interval="30000"
            min_interval="10000"/>
    <FD_SOCK/>
    <FD_ALL3/>
    <VERIFY_SUSPECT timeout="1500"  />
    <BARRIER />
    <pbcast.NAKACK2 xmit_interval="500"/>
    <UNICAST3 xmit_interval="500" />
    <pbcast.STABLE desired_avg_gossip="50000"
                   max_bytes="4M"/>
    <pbcast.GMS print_local_addr="true" join_timeout="5000"/>
    <UFC max_credits="10M"
         min_threshold="0.4"/>
    <MFC max_credits="10M"
         min_threshold="0.4"/>
    <FRAG2 frag_size="60K"  />
    <pbcast.STATE_TRANSFER />
    <CENTRAL_LOCK />
</config>
========jgroup config=======

Number of members in the cluster: 8

We would need help understanding the probable root cause, and whether there is any way to avoid this problem by optimizing the configuration (before we simply go ahead and increase the max_threads value).
Btw, we have another cluster (over GossipRouter) with this config:
===========jgroup========
<?xml version="1.0"?>
<config xmlns="urn:org:jgroups"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups.xsd">
    <TUNNEL bind_addr="match-interface:{{ msssite_bind_addr }}"
        bind_port="0"
        use_nio="false"
        gossip_router_hosts="{{ sndHostList }}"
        port_range="0" />
    <PING />
    <MERGE3 />
    <FD_ALL3 />
    <VERIFY_SUSPECT />
    <pbcast.NAKACK2 use_mcast_xmit="false" />
    <UNICAST3 />
    <pbcast.STABLE />
    <pbcast.GMS />
    <UFC />
    <FRAG2 />
    <pbcast.STATE_TRANSFER />
    <CENTRAL_LOCK />
</config>
======jgroup================
where we have the same problem, except that the "max_threads" value here is 100.
What would the configuration look like if we want to provide a customized value for "max_threads"?
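For example, we assume (by analogy with the thread_pool.* attributes on UDP in our first config) that it would be set on the TUNNEL transport itself, something like the sketch below; the values are just placeholders, please correct us if this is wrong:

    <TUNNEL bind_addr="match-interface:{{ msssite_bind_addr }}"
        bind_port="0"
        use_nio="false"
        gossip_router_hosts="{{ sndHostList }}"
        port_range="0"
        thread_pool.min_threads="0"
        thread_pool.max_threads="200"
        thread_pool.keep_alive_time="30000" />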

Bela Ban

Jan 30, 2023, 6:18:12 AM1/30/23
to jgrou...@googlegroups.com
This happens when the thread pool is exhausted (200); then all threads
are dumped. However, in your thread dump below I don't see 200 threads;
as a matter of fact, I don't see a *single JGroups thread*!

Can you post the full thread dump?

--
Bela Ban | http://www.jgroups.org

kumar bava

Mar 7, 2023, 1:05:43 PM3/7/23
to jgroups-dev
Hi Bela,
Thanks for the answer. Here is the full thread dump attached.
thread-dump.txt

Bela Ban

Mar 7, 2023, 3:06:05 PM3/7/23
to jgrou...@googlegroups.com
Your thread pool's max size is 100, and the thread dump shows that you
have 58 threads in BLOCKED state waiting to lock <0x6f4dcd31>, e.g.:

"jgroups-11402,mss-site,sero-nfsorch-1-f13-97" #12477 prio=0 tid=0x30bd
nid=NA blocked
java.lang.Thread.State: BLOCKED
#011at
app//org.infinispan.remoting.transport.impl.SingleTargetRequest.onNewView(SingleTargetRequest.java:56)
#011- waiting to lock <0x6f4dcd31> (a
org.infinispan.remoting.transport.impl.SingleResponseCollector)

That lock is held by thread jgroups-11397:

"jgroups-11397,mss-site,sero-nfsorch-1-f13-97" #12471 prio=0 tid=0x30b7
nid=NA runnable
java.lang.Thread.State: RUNNABLE#011at
java...@11.0.12/sun.util.locale.BaseLocale.hashCode(BaseLocale.java:148)
#011at java...@11.0.12/java.util.Locale.hashCode(Locale.java:2103)
#011at
java...@11.0.12/java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936)
#011at
java...@11.0.12/sun.util.locale.provider.LocaleProviderAdapter.getAdapter(LocaleProviderAdapter.java:241)
#011at
java...@11.0.12/java.text.DecimalFormatSymbols.getInstance(DecimalFormatSymbols.java:178)
#011at java...@11.0.12/java.util.Formatter.getZero(Formatter.java:2437)
#011at java...@11.0.12/java.util.Formatter.<init>(Formatter.java:1956)
#011at java...@11.0.12/java.util.Formatter.<init>(Formatter.java:2011)
#011at java...@11.0.12/java.lang.String.format(String.java:2938)
#011at
app//org.infinispan.util.logging.Log_$logger.remoteNodeSuspected(Log_$logger.java:1970)
#011at
app//org.infinispan.remoting.transport.ResponseCollectors.remoteNodeSuspected(ResponseCollectors.java:31)
#011at
app//org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:31)
#011at
app//org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:17)
#011at
app//org.infinispan.remoting.transport.ValidSingleResponseCollector.addResponse(ValidSingleResponseCollector.java:23)
#011at
app//org.infinispan.remoting.transport.impl.SingleTargetRequest.addResponse(SingleTargetRequest.java:74)
#011at
app//org.infinispan.remoting.transport.impl.SingleTargetRequest.onResponse(SingleTargetRequest.java:44)
#011- locked <0x6f4dcd31> (a
org.infinispan.remoting.transport.impl.SingleResponseCollector)
#011at
app//org.infinispan.remoting.transport.impl.RequestRepository.addResponse(RequestRepository.java:51)


This thread is in RUNNABLE state, but I suspect that the Log.remoteNodeSuspected() statement is taking quite a while, so the thread pool grows to 100 and JGroups dumps the threads (as a warning).

So this might be a temporary spike, but I recommend looking into why the above log method takes so long that other threads queue up for the <0x6f4dcd31> lock.