Message Driven Bean stops working under high volume - requires Wildfly restart


John Strecker

unread,
Oct 19, 2023, 12:22:37 PM10/19/23
to WildFly
In production, we've been having an issue with a particular long-running message-driven bean that randomly stops consuming messages arriving from our external web service to a client. Messages continue to be placed on its queue, but nothing is processed, causing a queue backlog. An application restart is required to flush the backlog.

Currently running WildFly 26 with JDK 11, using the ActiveMQ 11.0 packaged with WildFly 26. We've added extensive logging and have not seen any output that points us in the proper direction.

1) Thinking it was a thread-pool issue, we added the following to standalone.bat. No luck:
-Dactivemq.artemis.client.global.thread.pool.max.size=-1
2) Thinking it was a pool issue, we upped the MDB pool count to 10,000. No luck.
3) To address the backlog, we added retry parameters to standalone-full.xml. No luck:
<address-setting name="#" dead-letter-address="jms.queue.DLQ" expiry-address="jms.queue.ExpiryQueue"
max-size-bytes="10485760" page-size-bytes="2097152" message-counter-history-day-limit="10"
redelivery-delay="5000" redelivery-multiplier="2.0" max-redelivery-delay="1800000"/>
4) Added throttling logic on our side to manage thread access to the bean. No luck.

So we're at a loss as to why an MDB would just stop processing messages with no exception output. Any insight or guidance would be appreciated.

Thanks


Marvin Geitner

unread,
Oct 20, 2023, 2:50:07 PM10/20/23
to WildFly
Possibly not the solution, but I had a similar problem a while back, with WildFly 14.0.1.Final and the default Artemis (ActiveMQ) version of that server version.

The problem for me was that within Artemis an AtomicInteger (or AtomicLong, I don't remember which) overflowed, causing its value to become negative. This in turn broke Artemis's flow control: the internally calculated value was no longer correct, and from that point on no messages were processed. I found this out through explicit logging of the Artemis class responsible for flow control; after some time you could see the calculated flow-control value go negative.
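For what it's worth, the wraparound Marvin describes is easy to reproduce in isolation. A minimal sketch follows; the class and method names here are invented for illustration, and Artemis's actual flow-control code is more involved:

```java
// Hypothetical model of a flow-control credit counter held in a plain int.
// Java int arithmetic wraps silently on overflow: no exception is thrown.
public class CreditCounterDemo {

    // Pure helper: adding credits can wrap past Integer.MAX_VALUE.
    static int addCredits(int current, int amount) {
        return current + amount; // no overflow check
    }

    public static void main(String[] args) {
        int credits = Integer.MAX_VALUE;
        credits = addCredits(credits, 1); // wraps to Integer.MIN_VALUE
        // A negative credit balance makes the consumer look permanently
        // "out of credits", so no further messages are dispatched to it.
        System.out.println(credits); // prints -2147483648
    }
}
```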

I temporarily solved it by disabling flow control via the following:
<message-driven>
    <ejb-name>[...]</ejb-name>
    <ejb-class>[...]</ejb-class>
    <activation-config>
        [...]
        <activation-config-property>
            <activation-config-property-name>consumerWindowSize</activation-config-property-name>
            <activation-config-property-value>-1</activation-config-property-value>
        </activation-config-property>
    </activation-config>
</message-driven>

streck...@gmail.com

unread,
Oct 20, 2023, 3:31:59 PM10/20/23
to Marvin Geitner, WildFly

Thanks for the response.

 

Yes – by luck I just found that property today while searching online, although I set ours to 0 to accommodate slow consumers (per the article below), which ours are: each delivered message runs workflow checks for that order item (the message). Some workflow business checks are quick; some use external I/O (slow). So overall message time varies (1 second to xx seconds). Giving 0 a try now. If that doesn't work I'll try -1, but I'm not sure that satisfies our scenario.

  • -1 for an unbounded buffer (fast consumers)
  • 0 to not buffer any messages (slow consumers)
  • >0 for a buffer with the given maximum size in bytes (defaults to 1 MB)

https://activemq.apache.org/components/artemis/documentation/1.0.0/flow-control.html
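The three regimes in that list can be sketched as a tiny decision function. This is a simplification for illustration only; the real Artemis client tracks credits in bytes and replenishes them asynchronously:

```java
// Sketch of the consumerWindowSize semantics listed above.
public class WindowSizeDemo {

    // How many bytes of messages the client may buffer ahead of the consumer.
    static long bufferCapacityBytes(int consumerWindowSize) {
        if (consumerWindowSize == -1) return Long.MAX_VALUE; // unbounded buffer (fast consumers)
        if (consumerWindowSize == 0)  return 0L;             // no client-side buffering (slow consumers)
        return consumerWindowSize;                           // bounded buffer, default 1 MiB
    }

    public static void main(String[] args) {
        System.out.println(bufferCapacityBytes(0));       // prints 0
        System.out.println(bufferCapacityBytes(1048576)); // prints 1048576
    }
}
```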

john

Sent from Mail for Windows

John Strecker

unread,
Oct 21, 2023, 10:04:53 AM10/21/23
to WildFly
Well, setting it to 0 did not work. Will try the -1 setting for tonight's burst.

Marvin Geitner

unread,
Oct 22, 2023, 2:18:57 AM10/22/23
to WildFly

Maybe this helps you with the troubleshooting; at least it gives good insight into the behavior of Artemis and how it processes the messages. I had activated logging for the following classes and caught the error. (The precondition, of course, is that this class structure is still present in WildFly 26 / ActiveMQ.)

            <periodic-rotating-file-handler name="artemis_sci">
                <level name="ALL"/>
                <formatter>
                    <named-formatter name="PATTERN"/>
                </formatter>
                <file relative-to="jboss.server.log.dir" path="artemis_sci.log"/>
                <suffix value=".yyyy-MM-dd"/>
            </periodic-rotating-file-handler>
            <periodic-rotating-file-handler name="artemis_cci">
                <level name="TRACE"/>
                <formatter>
                    <named-formatter name="PATTERN"/>
                </formatter>
                <file relative-to="jboss.server.log.dir" path="artemis_cci.log"/>
                <suffix value=".yyyy-MM-dd"/>
            </periodic-rotating-file-handler>

            <logger category="org.apache.activemq.artemis.core.server.impl.ServerConsumerImpl" use-parent-handlers="true">
                <handlers>
                    <handler name="artemis_sci"/>
                </handlers>
            </logger>
            <logger category="org.apache.activemq.artemis.core.client.impl.ClientConsumerImpl" use-parent-handlers="true">
                <level name="TRACE"/>
                <handlers>
                    <handler name="artemis_cci"/>
                </handlers>
            </logger>
            <logger category="org.apache.activemq.artemis.core.server.impl.QueueImpl" use-parent-handlers="true">
                <level name="TRACE"/>
                <handlers>
                    <handler name="artemis_cci"/>
                </handlers>
            </logger>


streck...@gmail.com

unread,
Oct 22, 2023, 9:32:41 AM10/22/23
to Marvin Geitner, WildFly

Thank you. I'll let you know my findings.

streck...@gmail.com

unread,
Oct 22, 2023, 6:22:03 PM10/22/23
to Marvin Geitner, WildFly

Marvin,

 

This helped. I do see various outputs around our workflow rule queue. They all center on that 'rule' queue, as suspected. So I assume this is because the workflow rules are taking too long to run given the number of inbound requests, i.e. flow control, which I thought consumerWindowSize would remedy. Is there anywhere in the log to see whether the consumerWindowSize is set correctly (0 in my case)? I set it to -1 as previously stated and still got the same result, sooner. My last resort is to wrap a synchronized block upstream in the web service call, which in turn calls the JMS code, to throttle user message requests.

 

Again appreciate any  help.

 

 

Relevant log excerpts:

 

:All the consumers were busy, giving up now

 

queueName=jms.queue.orderrx}::Sending 1 credit to start delivering of one message to slow consumer

 

:FlowControl::delivery standard taking 12095 from credits, available now is -12094

 

 

 

2023-10-22 13:41:30,071 DEBUG [org.apache.activemq.artemis.core.server.impl.ServerConsumerImpl] (Thread-65 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@124511b6)) ServerConsumerImpl [id=0, filter=null, binding=LocalQueueBinding [address=jms.queue.rule, queue=QueueImpl[name=jms.queue.rule, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::name=default], temp=false]@3b72dca4, filter=null, name=jms.queue.rule, clusterName=jms.queue.rulef927c486-54da-11ed-944b-f4a4754f2743]] is busy for the lack of credits. Current credits = 0 Can't receive reference Reference[350039937925]:RELIABLE:CoreMessage[messageID=350039937925,durable=true,userID=null,priority=4, timestamp=Sun Oct 22 13:41:30 EDT 2023,expiration=0, durable=true, address=jms.queue.rule,size=1433,properties=TypedProperties[__AMQ_CID=3bb5d6a5-7102-11ee-a34b-000d3a9ae932,_AMQ_ROUTING_TYPE=1]]@873651483
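The arithmetic in those TRACE lines can be replayed directly: with consumerWindowSize=0 the server grants the slow consumer a single credit, and a 12095-byte delivery then debits the balance to -12094. A minimal sketch of that bookkeeping:

```java
// Replays the credit math visible in the log above:
// "Sending 1 credit ... taking 12095 from credits, available now is -12094".
public class CreditMathDemo {

    static int creditsAfterDelivery(int credits, int deliverySizeBytes) {
        return credits - deliverySizeBytes;
    }

    public static void main(String[] args) {
        System.out.println(creditsAfterDelivery(1, 12095)); // prints -12094
    }
}
```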

 

 

 


streck...@gmail.com

unread,
Oct 22, 2023, 6:32:17 PM10/22/23
to Marvin Geitner, WildFly

One last thing – if busy, the redelivery attempts should kick in as well. Below are the settings from our production server.

 

Thanks

 

 

<address-setting name="#" dead-letter-address="jms.queue.DLQ" expiry-address="jms.queue.ExpiryQueue" redelivery-delay="5000" redelivery-multiplier="2.0"
max-delivery-attempts="35" max-redelivery-delay="259200000" max-size-bytes="10485760" page-size-bytes="2097152" message-counter-history-day-limit="10"/>
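Assuming the delay for attempt n is redelivery-delay × multiplier^(n−1), capped at max-redelivery-delay (an approximation of the documented Artemis behavior, not a copy of its code), those settings produce the following backoff schedule:

```java
// Sketch of the exponential redelivery backoff implied by the
// address-setting above: 5000 ms base delay, doubling per attempt,
// capped at 259,200,000 ms (3 days).
public class RedeliveryScheduleDemo {

    static long delayForAttempt(long delayMs, double multiplier, long maxDelayMs, int attempt) {
        double d = delayMs * Math.pow(multiplier, attempt - 1);
        return (long) Math.min(d, (double) maxDelayMs);
    }

    public static void main(String[] args) {
        // Prints 5000, 10000, 20000, 40000, 80000; the cap is only
        // reached around attempt 17.
        for (int attempt = 1; attempt <= 5; attempt++) {
            System.out.println(delayForAttempt(5000L, 2.0, 259_200_000L, attempt));
        }
    }
}
```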

 

Emmanuel Hugonnet

unread,
Oct 23, 2023, 3:44:46 AM10/23/23
to Marvin Geitner, WildFly
You can set it at the (pooled-)connection-factory level instead of on the MDB. The same goes for consumer-max-rate.
Also, maybe you can reduce the number of requests on the producer side of things?
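For reference, setting it at the factory level would look roughly like the following in the messaging-activemq subsystem of standalone-full.xml. The factory name and JNDI entries shown are the WildFly defaults; treat this as a sketch and verify the attribute against your server's schema:

```xml
<pooled-connection-factory name="activemq-ra"
                           entries="java:/JmsXA java:jboss/DefaultJMSConnectionFactory"
                           connectors="in-vm"
                           consumer-window-size="0"/>
```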

Emmanuel

streck...@gmail.com

unread,
Oct 23, 2023, 8:21:38 AM10/23/23
to Emmanuel Hugonnet, Marvin Geitner, WildFly

Yea – we’ve been pushing our external client  to slow down injects. To date they have not. Not a good partner. Which is why I was going to  synchronize the threads related to the external calls – so maybe they will balk at the added response delay to get their attention.

 

Question: Is there a way to increase the number of queue consumer threads from the WildFly default of 15?

 

Thanks

 

 

Marvin Geitner

unread,
Oct 23, 2023, 8:30:48 AM10/23/23
to WildFly

By '# of queue consumers' do you mean the number of threads that process the messages? That can be set via the following. You should then also see a corresponding number of ActiveMQ-related threads in a profiler.

<activation-config-property>
    <activation-config-property-name>maxSession</activation-config-property-name>
    <activation-config-property-value>32</activation-config-property-value>
</activation-config-property>

Regarding profilers: if processing is blocked/stopped, have you ever made a recording with e.g. JFR or similar tools? I would be interested to see what the threads are doing. Are they blocked, timed-waiting, or parked?
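Short of a full JFR recording, a quick in-process look at thread states can already show whether the consumer threads are blocked, waiting, or parked. A self-contained sketch using the standard ThreadMXBean API (the "looks stuck" heuristic is my own shorthand, not a JDK concept):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;

// Dumps every live thread's state and flags the ones worth a closer look
// when message processing has stalled.
public class ThreadStateDump {

    static boolean looksStuck(Thread.State state) {
        return state == Thread.State.BLOCKED
            || state == Thread.State.WAITING
            || state == Thread.State.TIMED_WAITING;
    }

    public static void main(String[] args) {
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(false, false)) {
            System.out.printf("%-50s %s%s%n",
                    info.getThreadName(),
                    info.getThreadState(),
                    looksStuck(info.getThreadState()) ? "  <-- check this one" : "");
        }
    }
}
```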

Emmanuel Hugonnet

unread,
Oct 23, 2023, 11:39:44 AM10/23/23
to streck...@gmail.com, Marvin Geitner, WildFly
Sorry, I meant using flow control on the producer side of things:
https://activemq.apache.org/components/artemis/documentation/latest/flow-control.html#producer-flow-control

When configuring the number of MDBs concurrently processing messages, two parameters need to be considered:

* mdb-strict-max-pool - the number of MDB instances available to concurrently receive messages from the JCA RA's sessions; configured in the EJB subsystem
* maxSession - the number of sessions the JCA RA can use concurrently to consume messages; configured via an MDB activation-config property

The maximum number of concurrently processed messages is the minimum of these two values.
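That sizing rule boils down to a simple minimum; a trivial sketch (parameter names mirror the two settings described above):

```java
// Effective MDB concurrency is the smaller of the EJB pool size
// (mdb-strict-max-pool) and the resource adapter's maxSession.
public class MdbConcurrency {

    static int effectiveConcurrency(int mdbStrictMaxPool, int maxSession) {
        return Math.min(mdbStrictMaxPool, maxSession);
    }

    public static void main(String[] args) {
        // Raising the pool to 10,000 while maxSession stays at the default
        // of 15 (as mentioned earlier in the thread) still yields only 15
        // concurrent consumers.
        System.out.println(effectiveConcurrency(10_000, 15)); // prints 15
    }
}
```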

Emmanuel


streck...@gmail.com

unread,
Oct 23, 2023, 11:55:58 AM10/23/23
to Emmanuel Hugonnet, Marvin Geitner, WildFly

Yes - mdb-strict-max-pool I already adjusted a while back. I had not touched maxSession until your suggestion. I have now updated it on a few higher-volume MDBs, along with adding consumerWindowSize=0.

 

Monitoring now.

 

Thanks

 


 

streck...@gmail.com

unread,
Oct 23, 2023, 11:59:24 AM10/23/23
to Emmanuel Hugonnet, Marvin Geitner, WildFly

Pardon another dumb question – is there a way to see, in the WildFly admin console (or elsewhere), the 'active' sessions relative to the max sessions? I see the adjusted consumer count on the bean matching what I increased it to. I'm just curious to see what is currently in use – kind of like JVM heap: what is free and what is in use, etc.

 

Thanks

 

 
