We're running WildFly 27 with a couple of EAR and WAR modules deployed. Every couple of weeks or so the WildFly JVM hits 100% CPU for no apparent reason, and it won't recover until WildFly is restarted.
Since we have JFR creating a continuous recording, I managed to create a JFR dump the last time the JVM was at 100% load. I can see that there are 5824 threads, of which 4714 are named 'blocking-thread--p8-t1' (with different numbers) and 1610 are named 'non-blocking-thread--p76-t1' (also with different numbers). That doesn't seem right.
The stack traces of all those threads seem to be more or less the same:
"non-blocking-thread--p76-t1" #766 daemon prio=5 os_prio=0 cpu=6410.39ms elapsed=177251.53s tid=0x000056067d1503b0 nid=0x19d0be waiting on condition [0x00007f6c47f7e000]
java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@17.0.8/Native Method)
- parking to wait for <0x00000004ba320d30> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(java.base@17.0.8/LockSupport.java:341)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.block(java.base@17.0.8/AbstractQueuedSynchronizer.java:506)
at java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base@17.0.8/ForkJoinPool.java:3465)
at java.util.concurrent.ForkJoinPool.managedBlock(java.base@17.0.8/ForkJoinPool.java:3436)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@17.0.8/AbstractQueuedSynchronizer.java:1623)
at java.util.concurrent.LinkedBlockingQueue.take(java.base@17.0.8/LinkedBlockingQueue.java:435)
at java.util.concurrent.ThreadPoolExecutor.getTask(java.base@17.0.8/ThreadPoolExecutor.java:1062)
at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@17.0.8/ThreadPoolExecutor.java:1122)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@17.0.8/ThreadPoolExecutor.java:635)
at java.lang.Thread.run(java.base@17.0.8/Thread.java:833)
From the names 'blocking-thread' and 'non-blocking-thread' I assume they're related to Infinispan, which our applications make heavy use of (https://docs.wildfly.org/25/wildscribe/subsystem/infinispan/cache-container/thread-pool/non-blocking/index.html).
If these threads are indeed related to Infinispan, I assume the cause is that our application creates its cache containers from an external XML file instead of injecting managed cache containers configured in standalone.xml. That way, parameters like keepalive-time, max-threads, etc. are never applied, so some thread pools inside Infinispan just keep growing.
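From the Infinispan configuration schema I gather that the pools could perhaps be bounded directly in our external XML instead, roughly like the sketch below. The pool names and values here are just placeholders I made up, and I'm not sure this is the right approach, which is part of why I'm asking:

```
<infinispan>
    <threads>
        <!-- placeholder values: bound the blocking pool instead of letting it grow -->
        <blocking-bounded-queue-thread-pool name="my-blocking-pool"
                                            max-threads="150"
                                            queue-length="5000"
                                            keepalive-time="60000"/>
        <non-blocking-bounded-queue-thread-pool name="my-non-blocking-pool"
                                                core-threads="4"
                                                max-threads="8"
                                                queue-length="1000"
                                                keepalive-time="60000"/>
    </threads>
    <cache-container name="my-container"
                     blocking-executor="my-blocking-pool"
                     non-blocking-executor="my-non-blocking-pool">
        <!-- our cache definitions -->
    </cache-container>
</infinispan>
```

Is something along these lines the intended way to do it when the cache container is not managed by WildFly?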
I'd like to know whether this assumption is correct and, if so, whether there's a way to configure those thread pool parameters differently.
Thank you very much in advance for any advice!
Michael