Help with Payara HA Clustered JMS losing messages

Nick Hecht

Mar 10, 2017, 11:05:58 AM

to Payara Forum

Hey All,

I have an odd issue and was hoping someone could point me in the right direction. I'm usually good at researching and figuring things out (with Google's help, of course :) ), but I'm really lost on this one.

I'm using Payara (4.1.1.162 build 116) in an HA clustered environment and it's working fairly well. Recently, however, employees using my application have started complaining that jobs are getting stuck.

We use JMS queues to trigger work on the back end. The work is incremental: one process (a message-driven bean) picks up a message, does its work, and, as the last thing it does, sends a new message to a different queue. That starts the next process, which works the data in a different way, so together the processes form a workflow.
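A minimal stdlib-only model of the chain described above, assuming one worker thread per stage and an in-memory `BlockingQueue` in place of each JMS queue. The class and method names are hypothetical and nothing here uses the JMS or Payara APIs; it only shows the receive, work, then send-next-message pattern and where the "received" and "sent" log lines sit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical in-memory model of the workflow: each worker thread stands in for
// one message-driven bean and each BlockingQueue for one JMS queue.
class WorkflowChainSketch {

    /** Pushes one batch through `stageCount` stages; true if it reached the end. */
    static boolean runBatch(String batchId, int stageCount) {
        // queues.get(i) feeds stage i; the extra final queue collects finished batches.
        List<BlockingQueue<String>> queues = new ArrayList<>();
        for (int i = 0; i <= stageCount; i++) {
            queues.add(new LinkedBlockingQueue<>());
        }

        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < stageCount; i++) {
            final int stage = i;
            Thread worker = new Thread(() -> {
                try {
                    // Like onMessage(): block until a message arrives, then log it.
                    String id = queues.get(stage).take();
                    System.out.println("stage " + stage + " received " + id);
                    // ... this stage's real work would go here ...
                    // Last action: send the batch on to the next stage's queue.
                    queues.get(stage + 1).put(id);
                    System.out.println("stage " + stage + " sent " + id);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            worker.setDaemon(true); // don't keep the JVM alive if a stage never gets a message
            worker.start();
            workers.add(worker);
        }

        try {
            queues.get(0).put(batchId); // kick off the workflow
            String finished = queues.get(stageCount).poll(5, TimeUnit.SECONDS);
            for (Thread worker : workers) {
                worker.join(1000);
            }
            return batchId.equals(finished);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("batch-42 completed: " + runBatch("batch-42", 3));
    }
}
```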

We have two servers in the cluster, and I'm using MS SQL Server as the backing store for message persistence. Each node is set up as a LOCAL broker. I have a ton of logging: I log when each process gets a message (at the start of onMessage), when the process completes, and after I call send for the next message. I can see the cluster working because the same batch ID bounces back and forth between the servers as it progresses through the workflow.

But very rarely I get a report that a batch is not advancing as it should. When I look at the logs, I can see that at some point my "message sent" log line appears, but the next process never logs that it received the message. We have ways of deleting a batch and starting over, and most of the time restarting the batch makes it run through the whole process without a hitch.
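One way to pinpoint where a batch stalls is to pair each "sent" log line with a "received" line for the same batch and queue. The sketch below assumes a hypothetical log format (`SENT` and `RECEIVED` marker lines with `batch=` and `queue=` fields); real log lines would need their own parsing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical log format (an assumption, not from the original post):
//   "SENT batch=<id> queue=<name>"      logged right after send()
//   "RECEIVED batch=<id> queue=<name>"  logged at the start of onMessage()
class StuckHopFinder {

    /** Returns "batch@queue" keys that were sent more times than they were received. */
    static List<String> findUnreceived(List<String> logLines) {
        Map<String, Integer> balance = new HashMap<>();
        List<String> order = new ArrayList<>(); // first-seen order, for stable output
        for (String line : logLines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length != 3) {
                continue; // not one of the marker lines
            }
            int delta;
            if (parts[0].equals("SENT")) {
                delta = 1;
            } else if (parts[0].equals("RECEIVED")) {
                delta = -1;
            } else {
                continue;
            }
            String key = value(parts[1], "batch=") + "@" + value(parts[2], "queue=");
            if (!balance.containsKey(key)) {
                order.add(key);
            }
            balance.merge(key, delta, Integer::sum);
        }
        List<String> stuck = new ArrayList<>();
        for (String key : order) {
            if (balance.get(key) > 0) {
                stuck.add(key); // a send with no matching receive
            }
        }
        return stuck;
    }

    private static String value(String token, String prefix) {
        return token.startsWith(prefix) ? token.substring(prefix.length()) : token;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
                "SENT batch=17 queue=stage2",
                "RECEIVED batch=17 queue=stage2",
                "SENT batch=17 queue=stage3",
                "SENT batch=18 queue=stage2",
                "RECEIVED batch=18 queue=stage2");
        System.out.println(findUnreceived(log)); // [17@stage3]
    }
}
```

Because it only counts sends against receives per batch and queue, the logs from both cluster nodes can be concatenated before scanning; their relative order does not matter.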

I have spent days digging, logging, and debugging. I'm fairly confident the code and server configuration are good, because clearing and restarting a stuck batch works. My processes are not hanging midway either, because I always see the "message sent" log right before the bean exits.

Where should I look? I would love to resolve this issue, but I'm really at a loss and have no idea where to start. I'm not getting exceptions or errors in my logs, and stopping one instance so that only a single instance runs in the cluster resolves the issue. It's not a single message-driven bean that is the problem; I have seen batches get stuck at many points in the chain. I don't see any messages in the DMQ (dead message queue).

Has anyone had similar issues and might know where i could start poking around? Your help is greatly appreciated.

Nick Hecht

Apr 5, 2017, 1:09:17 PM

to Payara Forum

So I ended up finding something that fixed my initial issue: I changed Transaction Support on the connection factory from NoTransaction to XATransaction, and that seemed to work for a while. But today we had a good deal of traffic on the application, and now I can actually see messages getting stuck in queues; I'm assuming this is load-related. Using a tool called JMS Browser, I can see many messages on various queues just sitting there without being picked up. So the symptoms are the same as before, but after looking into it I believe the cause is different.
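For context on why the connection factory's transaction support can matter: in a transacted JMS session, messages sent inside a transaction are not delivered until that transaction commits, and they are discarded on rollback. With XA, an MDB's outgoing send can be enlisted in the same container transaction as the receipt of the incoming message. The stdlib-only model below illustrates only the buffer-until-commit idea; `TransactedSender` is a hypothetical class, not the OpenMQ or Payara implementation, and it is not a diagnosis of this particular problem.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical model of transacted-send semantics: messages "sent" inside a
// unit of work are buffered and reach the destination only on commit.
class TransactedSendSketch {

    static class TransactedSender {
        private final BlockingQueue<String> destination;
        private final List<String> pending = new ArrayList<>();

        TransactedSender(BlockingQueue<String> destination) {
            this.destination = destination;
        }

        void send(String message) {
            pending.add(message); // buffered; consumers cannot see it yet
        }

        void commit() {
            destination.addAll(pending); // all buffered messages become visible together
            pending.clear();
        }

        void rollback() {
            pending.clear(); // nothing from this unit of work is delivered
        }
    }

    public static void main(String[] args) {
        BlockingQueue<String> nextStage = new LinkedBlockingQueue<>();
        TransactedSender sender = new TransactedSender(nextStage);

        sender.send("batch-1");
        System.out.println("before commit: " + nextStage.size()); // 0
        sender.commit();
        System.out.println("after commit: " + nextStage.size()); // 1

        sender.send("batch-2");
        sender.rollback();
        System.out.println("after rollback: " + nextStage.size()); // 1
    }
}
```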

I have been googling around and found some old posts about concurrency issues when JMS runs as a LOCAL or EMBEDDED broker. Is this still an issue in Payara 4.1.1.162 #badassfish (build 116)?

Do I need to use a separate JMS provider outside of Payara?

Christoph John

Apr 6, 2017, 3:42:46 AM

to Payara Forum

Sorry if this gets duplicated. I've also answered by mail to payara...@googlegroups.com, but the reply did not appear; maybe it takes some time.

Hi Nick,

I don't have much experience with clustering, but IMHO your MDBs are either in some kind of deadlock, waiting for a timeout, or short of resources, e.g. all the threads in the thread pool are busy doing other work, so no free thread is available to service the messages in the queue.

The next time the problem appears, I would take some thread dumps to check why the MDBs are not doing anything. You will probably see them hanging in a method, waiting for something to happen (maybe for an external resource, e.g. a DB connection or a QueueSession to send the message to another queue).
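A thread dump is usually taken externally with the JDK's `jstack <pid>` against the server JVM, ideally several dumps a few seconds apart so that threads stuck in the same frame stand out. The sketch below does the equivalent in-process with the standard `java.lang.management` API and prints the top frames of threads in the BLOCKED, WAITING or TIMED_WAITING states; the class name and the five-frame cutoff are arbitrary choices.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

// In-process thread dump via the standard java.lang.management API.
class ThreadDumpSketch {

    /** Counts live threads by state and prints the top frames of waiting/blocked ones. */
    static Map<Thread.State, Integer> dumpAndSummarize() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null) {
                continue; // defensive: skip entries for threads that are no longer available
            }
            Thread.State state = info.getThreadState();
            counts.merge(state, 1, Integer::sum);
            boolean stalled = state == Thread.State.BLOCKED
                    || state == Thread.State.WAITING
                    || state == Thread.State.TIMED_WAITING;
            if (stalled) {
                // Name, state and (if any) the lock the thread is waiting on.
                String lock = info.getLockName() == null ? "" : " on " + info.getLockName();
                System.out.println("\"" + info.getThreadName() + "\" " + state + lock);
                StackTraceElement[] frames = info.getStackTrace();
                for (int i = 0; i < Math.min(5, frames.length); i++) {
                    System.out.println("    at " + frames[i]);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println("threads by state: " + dumpAndSummarize());
    }
}
```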

Hope that helps.
Cheers,
Chris.

Ondrej Mihályi

Apr 6, 2017, 4:25:52 AM

to Payara Forum

Hi Nick,

Christoph's advice is very reasonable. This is exactly what happened with one of our support customers, where many threads were blocked on a remote call until a timeout (actually without a timeout, which was even worse, because it blocked the threads indefinitely). We were able to solve the problem with many small improvements in the server configuration, the application code, and the broker configuration.
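The failure mode described above, a thread blocked on a remote call with no timeout, can be bounded by never waiting indefinitely. A minimal sketch, assuming a generic `Callable` stands in for the remote call: it waits with `Future.get(timeout, unit)` and cancels the call on timeout. Inside a Java EE container it is usually better to set the client library's own timeouts (or use a container-provided `ManagedExecutorService`) rather than create threads as this stdlib-only example does; `BoundedCallSketch` is a hypothetical name.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: bounds how long the caller waits on a blocking call.
// The Callable stands in for any remote dependency; no real client API is used.
class BoundedCallSketch {

    // Daemon threads so a hung call can never keep the JVM alive.
    private static final ExecutorService CALLER = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "bounded-call");
        t.setDaemon(true);
        return t;
    });

    /** Returns the call's result, or Optional.empty() if it failed or exceeded the timeout. */
    static Optional<String> callWithTimeout(Callable<String> remoteCall, long timeoutMillis) {
        Future<String> future = CALLER.submit(remoteCall);
        try {
            return Optional.ofNullable(future.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the call; one that ignores interrupts stays busy
            return Optional.empty();
        } catch (ExecutionException e) {
            return Optional.empty(); // the call itself threw; a real caller would log e.getCause()
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        Callable<String> fast = () -> "ok";
        Callable<String> hung = () -> {
            Thread.sleep(60_000); // simulates a remote call with no timeout of its own
            return "late";
        };
        System.out.println(callWithTimeout(fast, 500)); // Optional[ok]
        System.out.println(callWithTimeout(hung, 200)); // Optional.empty
    }
}
```

`cancel(true)` only interrupts the worker thread, so a client-side timeout on the remote call itself remains the more reliable fix.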

Increasing the appropriate thread pool size could help; you just need to figure out which thread pool has to be adjusted (I think for embedded OpenMQ it's the EJB thread pool, but I'm not sure). Most thread pools default to a size of 10, which is usually enough if threads are not blocked for long, but it's insufficient under high load or when threads get stuck often.
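A small pool combined with blocked threads produces exactly the "messages sitting in the queue" symptom. The sketch below fills a fixed pool (its size is a parameter, not Payara's actual EJB or MDB pool) with tasks that block until released, then submits one more task standing in for the next message. It assumes nothing about Payara's internals; `PoolStarvationSketch` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demonstration of thread-pool starvation.
class PoolStarvationSketch {

    /** Returns {ranWhileBlocked, ranAfterRelease} for one extra task. */
    static boolean[] demo(int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        CountDownLatch release = new CountDownLatch(1);
        CountDownLatch nextTaskRan = new CountDownLatch(1);
        try {
            // Occupy every worker, like onMessage() calls stuck on a call with no timeout.
            for (int i = 0; i < poolSize; i++) {
                pool.submit(() -> {
                    release.await();
                    return null;
                });
            }
            // The "next message": it waits in the pool's queue behind the blocked workers.
            pool.submit(nextTaskRan::countDown);
            boolean ranWhileBlocked = nextTaskRan.await(300, TimeUnit.MILLISECONDS);

            release.countDown(); // unblock the stuck workers
            boolean ranAfterRelease = nextTaskRan.await(5, TimeUnit.SECONDS);
            return new boolean[] {ranWhileBlocked, ranAfterRelease};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return new boolean[] {false, false};
        } finally {
            pool.shutdownNow(); // interrupts any workers still waiting
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo(2))); // [false, true]
    }
}
```

The extra task does not run while every worker is blocked and runs as soon as one is released. Raising the pool size delays this starvation, but if threads block indefinitely any size is eventually exhausted.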

Ondrej

On Thursday, April 6, 2017 9:42:46 UTC+2, Christoph John wrote: