We are running a JBoss/WildFly application cluster using ActiveMQ Artemis JMS with persistent messaging.
We are facing an intermittent but severe production issue with the following behavior:
Observed symptoms:

- Application becomes unresponsive without high CPU or GC pressure.
- Thread dumps show:
  - multiple ActiveMQ Artemis IO / server threads in WAITING / TIMED_WAITING;
  - application threads blocked inside JMS operations (send, receive, or `ActiveMQRASessionFactoryImpl.allocateConnection`).
- At the same time, we observe database lock contention caused by long-running transactions.
- Restarting or shutting down one application node releases the DB locks held by that node.
- The same DB lock contention then appears on another node in the cluster.
- This continues until the JMS store is cleared/flushed, after which the issue stops completely.
- No JVM deadlocks are detected.
- No CPU or GC saturation.
- Clearing the JMS persistent store permanently resolves the issue.
This strongly suggests JMS persistent store contention or backlog rather than a DB or JVM issue.
Under what conditions can ActiveMQ Artemis persistent store / journal contention cause:

- JMS operations to block indefinitely?
- application threads to hold DB transactions open? (A simplified sketch of the coupling we suspect follows this list.)
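For illustration, here is a simplified sketch of the kind of coupling we are worried about: a container-managed XA transaction in which the DB update and the JMS send share one transaction, so a blocked send keeps the row locks open. The class, queue, and table names are illustrative, not our actual code.

```java
// Hypothetical EJB showing the pattern we suspect: DB update and JMS send in
// the same container-managed (XA) transaction. If the send blocks (for example
// because the Artemis journal / paging store is under pressure), the DB row
// locks stay held until the JMS call returns or times out.
import javax.annotation.Resource;
import javax.ejb.Stateless;
import javax.ejb.TransactionAttribute;
import javax.ejb.TransactionAttributeType;
import javax.inject.Inject;
import javax.jms.JMSContext;
import javax.jms.Queue;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;

@Stateless
public class OrderService {

    @PersistenceContext
    private EntityManager em;                       // XA datasource

    @Inject
    private JMSContext jmsContext;                  // injected from the Artemis RA (XA)

    @Resource(lookup = "java:/jms/queue/orders")    // illustrative JNDI name
    private Queue ordersQueue;

    @TransactionAttribute(TransactionAttributeType.REQUIRED)
    public void placeOrder(String orderId) {
        // Takes a row lock that is only released when the whole XA transaction ends.
        em.createNativeQuery("UPDATE orders SET status = 'PLACED' WHERE id = ?")
          .setParameter(1, orderId)
          .executeUpdate();

        // If this send blocks on broker-side flow control or journal I/O,
        // the transaction above (and its DB locks) stays open for the duration.
        jmsContext.createProducer().send(ordersQueue, orderId);
    }
}
```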
Can poison messages, redelivery storms, or store recovery/compaction cause this behavior?
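For illustration, here is a minimal standalone consumer showing how a single poison message can turn into a redelivery storm when the session is transacted and no max-delivery-attempts / DLQ limit takes effect. In our deployment consumption actually goes through the resource adapter / MDBs; the connection URL and queue name below are placeholders.

```java
// Hypothetical consumer illustrating a redelivery storm: every delivery fails,
// and the rollback puts the message straight back on the queue.
import javax.jms.Connection;
import javax.jms.Message;
import javax.jms.MessageConsumer;
import javax.jms.Session;

import org.apache.activemq.artemis.jms.client.ActiveMQConnectionFactory;

public class PoisonMessageLoop {

    public static void main(String[] args) throws Exception {
        // Placeholder URL; in our setup the factory comes from the RA / JNDI.
        ActiveMQConnectionFactory cf = new ActiveMQConnectionFactory("tcp://localhost:61616");
        try (Connection connection = cf.createConnection()) {
            connection.start();
            Session session = connection.createSession(true, Session.SESSION_TRANSACTED);
            MessageConsumer consumer = session.createConsumer(session.createQueue("orders"));

            while (true) {
                Message message = consumer.receive(1000);
                if (message == null) {
                    continue;
                }
                try {
                    process(message);      // throws for the malformed message
                    session.commit();
                } catch (Exception e) {
                    // Without max-delivery-attempts / a dead-letter address, the broker
                    // redelivers the same message again, producing a tight loop.
                    session.rollback();
                }
            }
        }
    }

    private static void process(Message message) throws Exception {
        throw new IllegalStateException("malformed payload"); // simulates the poison message
    }
}
```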
Are there known scenarios where store backlog causes contention that “moves” between cluster nodes as nodes are restarted?
What Artemis configuration or metrics should be monitored to detect this early? For example:

- journal configuration,
- store I/O latency,
- redelivery / DLQ thresholds (see the monitoring sketch below).
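Our current idea for early detection is to poll the queue MBeans over JMX, roughly as sketched below. The ObjectName pattern, attribute names (MessageCount, DeliveringCount from QueueControl), and the JMX URL are assumptions; the exact names depend on the Artemis version and on whether the broker is standalone or embedded in WildFly.

```java
// Hedged monitoring sketch: poll queue depth and in-delivery counts over JMX.
import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ArtemisQueueMonitor {

    public static void main(String[] args) throws Exception {
        // Illustrative JMX URL for a standalone broker; a WildFly-embedded broker
        // is reached through the WildFly management layer instead.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:1099/jmxrmi");

        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();

            // Wildcard over the queue MBeans exported by the broker (pattern assumed).
            ObjectName pattern = new ObjectName(
                    "org.apache.activemq.artemis:broker=*,component=addresses,"
                    + "address=*,subcomponent=queues,*");

            Set<ObjectName> queues = mbs.queryNames(pattern, null);
            for (ObjectName queue : queues) {
                long messageCount = ((Number) mbs.getAttribute(queue, "MessageCount")).longValue();
                long delivering   = ((Number) mbs.getAttribute(queue, "DeliveringCount")).longValue();

                // A steadily growing MessageCount with a flat DeliveringCount is the
                // kind of backlog signal we would want to alert on.
                System.out.printf("%s messages=%d delivering=%d%n",
                        queue.getKeyProperty("queue"), messageCount, delivering);
            }
        }
    }
}
```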
What are the recommended best practices to avoid DB lock cascades caused by JMS operations? In particular:

- transaction boundaries,
- XA vs non-XA,
- outbox or async patterns (see the outbox sketch below).
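For the outbox option, this is roughly the shape we have in mind: the business transaction only inserts into an outbox table (so DB locks are held just for a local commit), and a separate relay publishes to Artemis outside any business transaction. Table names, queue names, and the polling SQL are illustrative assumptions.

```java
// Hedged sketch of a transactional-outbox relay: reads pending rows, publishes
// them to Artemis, and marks them as sent. Only the relay stalls if the broker
// is slow; request threads never block on JMS.
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import javax.jms.ConnectionFactory;
import javax.jms.JMSContext;
import javax.jms.Queue;
import javax.sql.DataSource;

public class OutboxRelay {

    private final DataSource dataSource;          // plain (non-XA) datasource
    private final ConnectionFactory jmsFactory;   // Artemis connection factory
    private final Queue ordersQueue;

    public OutboxRelay(DataSource dataSource, ConnectionFactory jmsFactory, Queue ordersQueue) {
        this.dataSource = dataSource;
        this.jmsFactory = jmsFactory;
        this.ordersQueue = ordersQueue;
    }

    /** Called periodically, e.g. from an EJB @Schedule timer. */
    public void relayPendingMessages() throws Exception {
        try (Connection db = dataSource.getConnection();
             JMSContext jms = jmsFactory.createContext()) {
            db.setAutoCommit(false);

            try (PreparedStatement select = db.prepareStatement(
                     "SELECT id, payload FROM outbox WHERE sent = 0 ORDER BY id FETCH FIRST 100 ROWS ONLY");
                 PreparedStatement markSent = db.prepareStatement(
                     "UPDATE outbox SET sent = 1 WHERE id = ?");
                 ResultSet rs = select.executeQuery()) {

                while (rs.next()) {
                    long id = rs.getLong("id");
                    String payload = rs.getString("payload");

                    // JMS send happens outside the business transaction.
                    jms.createProducer().send(ordersQueue, payload);

                    markSent.setLong(1, id);
                    markSent.executeUpdate();
                }
            }
            // Commit after the sends; a crash between send and commit means
            // at-least-once delivery, so consumers must be idempotent.
            db.commit();
        }
    }
}
```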