Hi,
is there a recovery mechanism for jobs that were running during a hardware or JVM crash in a clustered environment?
Hazelcast is already running, but does JBatch take advantage of it to check whether the DB-persisted job execution status matches the actual job execution state within the cluster?
For example: a job is started on an arbitrary node in the Payara cluster, and due to a hardware failure or a heap exhaustion the node is killed before it can update the job's execution state in the database. When the Payara node is restarted, will it detect that the persisted execution status (e.g. STARTED) is actually stale?
In our environment we need to serialize the execution of some jobs, so we only schedule the next job once the previous one has completed. If JBatch reports a stale status, we will never schedule the next job, and we are not aware of any proper way to detect such a stale job execution status in a clustered environment.
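To make the problem concrete, here is a minimal sketch of the kind of gate we mean (the class and method names are illustrative, not our actual code, and the enum mirrors the standard `javax.batch.runtime.BatchStatus` values): the next job is only scheduled once the previous execution has reached a terminal status, so a STARTED status left behind by a crashed node blocks scheduling indefinitely.

```java
import java.util.EnumSet;
import java.util.Set;

// Illustrative stand-in mirroring the javax.batch.runtime.BatchStatus values.
enum BatchStatus { STARTING, STARTED, STOPPING, STOPPED, FAILED, COMPLETED, ABANDONED }

public class SerialScheduler {
    // Terminal states after which the next job in the chain may be scheduled.
    private static final Set<BatchStatus> TERMINAL =
        EnumSet.of(BatchStatus.COMPLETED, BatchStatus.FAILED,
                   BatchStatus.STOPPED, BatchStatus.ABANDONED);

    // Returns true only once the previous execution has finished.
    // If a crashed node leaves the persisted status stuck at STARTED,
    // this keeps returning false and the next job is never scheduled.
    public static boolean mayScheduleNext(BatchStatus previous) {
        return TERMINAL.contains(previous);
    }

    public static void main(String[] args) {
        System.out.println(mayScheduleNext(BatchStatus.COMPLETED)); // true
        System.out.println(mayScheduleNext(BatchStatus.STARTED));   // false: stale status blocks us
    }
}
```

This is exactly why a stale STARTED record is fatal for us: there is no event that ever moves it to a terminal state after the node that owned it is gone.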
Thanks