Hi Guys,
I've noticed a slight disconnect between the docs and the job retry behaviour. The snippet from the user guide below reads as if, by default, when a job fails the retry counter is decremented but the job will not be retried until the job acquisition lock has expired. Hence, using the defaults and no explicit retry strategy, I expected a failing task to be retried 3 times, once every 5 minutes or so.
Upon failure of job execution, e.g. if a service task invocation throws an exception, a job will be retried a number of times (by default 3). It is not immediately retried and added back to the acquisition queue, but the value of the RETRIES_ column is decreased. The process engine thus performs bookkeeping for failed jobs. After updating the RETRIES_ column, the executor moves on to the next job. This means that the failed job will automatically be retried once the LOCK_EXP_TIME_ date is expired.
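To make the documented behaviour concrete, here is a small Java simulation of it. It is not engine code. DocumentedRetryDemo, JobRow, isAcquirable and recordFailureKeepingLock are hypothetical names. The only things taken from the guide are the RETRIES_ and LOCK_EXP_TIME_ columns and the rule that a failed job is picked up again once its lock has expired. The 5-minute lock duration is an assumption matching the "once every 5 minutes or so" expectation.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Declared first so `java DocumentedRetryDemo.java` finds main.
final class DocumentedRetryDemo {
    static final Duration LOCK_TIME = Duration.ofMinutes(5); // assumed lock duration

    // Runs an always-failing job and returns the minute offset of each attempt.
    static List<Long> attemptMinutes(int initialRetries) {
        List<Long> attempts = new ArrayList<>();
        JobRow job = new JobRow(initialRetries);
        // Poll once per simulated minute for up to an hour.
        for (long minute = 0; minute <= 60; minute++) {
            Instant now = Instant.EPOCH.plus(Duration.ofMinutes(minute));
            if (job.isAcquirable(now)) {
                job.acquire(now, LOCK_TIME);
                attempts.add(minute);
                job.recordFailureKeepingLock(); // the execution fails
            }
        }
        return attempts;
    }

    public static void main(String[] args) {
        System.out.println(attemptMinutes(3));
    }
}

// Hypothetical model of a job row; fields mirror RETRIES_ and LOCK_EXP_TIME_.
final class JobRow {
    int retries;
    Instant lockExpTime; // null means "not locked"

    JobRow(int retries) {
        this.retries = retries;
    }

    // Acquirable if retries remain and there is no unexpired lock.
    boolean isAcquirable(Instant now) {
        return retries > 0 && (lockExpTime == null || !now.isBefore(lockExpTime));
    }

    // Acquisition locks the job until now + lockTime.
    void acquire(Instant now, Duration lockTime) {
        lockExpTime = now.plus(lockTime);
    }

    // Documented behaviour: decrement RETRIES_ and leave the lock in place.
    void recordFailureKeepingLock() {
        retries--;
    }
}
```

With 3 retries, the simulated always-failing job runs at minutes 0, 5 and 10 and is then no longer acquirable because RETRIES_ has reached 0.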
However, what I have observed is that on job failure the job's retries are decremented, but the job is available for acquisition immediately, so the retries occur in very quick succession (e.g. milliseconds apart). If I configure an explicit retry strategy, e.g. R3/PT5M, then I get the expected behaviour. Looking at the engine code, it seems the lock is cleared on exception, and thus the job can be acquired again immediately if there is no explicit retry strategy.
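For comparison, here is a similar sketch of the behaviour I am observing. Again it is a simulation rather than engine code, and ObservedRetryDemo, Job, repetitions, interval and attemptSeconds are hypothetical names. It assumes the lock is cleared on failure, and it models the explicit R3/PT5M cycle as pushing the job's due date forward by the PT5M interval. That is my assumption about the effect, not a description of how the engine implements it. The cycle parsing only handles the simple R<n>/<duration> form.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

final class ObservedRetryDemo {
    // Hypothetical job model: retries, lock expiry and due date (null = none).
    static final class Job {
        int retries;
        Instant lockExpTime;
        Instant dueDate;

        Job(int retries) {
            this.retries = retries;
        }

        boolean isAcquirable(Instant now) {
            boolean unlocked = lockExpTime == null || !now.isBefore(lockExpTime);
            boolean due = dueDate == null || !now.isBefore(dueDate);
            return retries > 0 && unlocked && due;
        }
    }

    // "R3/PT5M" -> 3 (simplified: expects the R<n>/<duration> form).
    static int repetitions(String cycle) {
        return Integer.parseInt(cycle.substring(1, cycle.indexOf('/')));
    }

    // "R3/PT5M" -> 5 minutes, via the ISO-8601 parser in java.time.
    static Duration interval(String cycle) {
        return Duration.parse(cycle.substring(cycle.indexOf('/') + 1));
    }

    // Always-failing job, polled once per second for 20 minutes.
    // retryInterval == null models the observed default: lock cleared, no delay.
    static List<Long> attemptSeconds(int retries, Duration retryInterval) {
        List<Long> attempts = new ArrayList<>();
        Job job = new Job(retries);
        Duration lockTime = Duration.ofMinutes(5); // assumed lock duration
        for (long second = 0; second <= 20 * 60; second++) {
            Instant now = Instant.EPOCH.plusSeconds(second);
            if (!job.isAcquirable(now)) {
                continue;
            }
            job.lockExpTime = now.plus(lockTime); // acquired and locked
            attempts.add(second);
            job.retries--;                        // the execution fails
            job.lockExpTime = null;               // lock cleared on exception
            job.dueDate = retryInterval == null ? null : now.plus(retryInterval);
        }
        return attempts;
    }

    public static void main(String[] args) {
        String cycle = "R3/PT5M";
        System.out.println("default: " + attemptSeconds(3, null));
        System.out.println(cycle + ": "
                + attemptSeconds(repetitions(cycle), interval(cycle)));
    }
}
```

Because the simulation polls once per second, the default case shows attempts at seconds 0, 1 and 2, while the R3/PT5M case shows them at 0, 300 and 600. In the engine the default gap depends only on how quickly the job executor reacquires the job, which is consistent with the millisecond spacing I am seeing.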
So, does the documentation need to be updated, or is this an engine bug? Note that I am using 7.2-Final.
regards
Rob