Avoiding re-run of serviceTask when suspend/activate process instance


galen...@gmail.com

Feb 24, 2014, 3:05:05 PM
to camunda-...@googlegroups.com
Hi,

I have a question about the behavior I'm seeing when I suspend a process while a serviceTask is executing (code in a JavaDelegate).

Here's an example flow:

http://camunda.org/share/#/process/fa6dcbf8-6a0d-4272-b0fe-959ae44efdfa

If you:

1) start this process
2) suspend via Cockpit while serviceTask is running
3) wait until the serviceTask finishes executing (it will throw a stack trace -- see below)
4) re-activate process instance

I understand that at this point the process will continue when the job retry fires (about five minutes later). However, in my case the serviceTask actually finished executing successfully and didn't really "fail" (well, it only "failed" in the sense that the instance wasn't active when it completed).

Therefore, I would rather avoid the situation where it retries. I would rather it just retry the "complete the task" portion of the code, not the entire

public void execute(final DelegateExecution execution)

body of code (which already ran).


Are there some recommendations and/or best practices about how this can be accomplished? Can the JavaDelegate check (with perhaps retries) whether the process instance is active before falling out of the execute method?

I hope my question makes sense.

Thanks,
Galen

Roman Smirnov

Feb 26, 2014, 7:28:57 AM
to camunda-...@googlegroups.com, galen...@gmail.com
Hi Galen,

Do you get an OptimisticLockingException?

The OptimisticLockingException is the expected behavior. In your case you have two transactions:

One transaction executes the service task. The second transaction is triggered by Cockpit to suspend the process instance. The second transaction increments the revision of the process instance (i.e. of the process instance and its child executions) and commits the suspension. Then the first transaction (which executed the service task in the meantime) also tries to increment the revision of the process instance, so the concurrent update fails with an OptimisticLockingException.
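The revision mechanism Roman describes can be illustrated with a small stand-alone sketch. This is not engine code -- the class and method names are illustrative -- it just models the revision column as a compare-and-set: both "transactions" read revision 1, but only the first commit can win.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of the revision column kept on each process instance row.
// A commit succeeds only if the revision is still the one read at the
// start of the transaction -- otherwise it is the equivalent of an
// OptimisticLockingException.
class RevisionedRow {
    final AtomicInteger revision = new AtomicInteger(1);

    /** Returns true if the commit succeeded, false on an optimistic-lock clash. */
    boolean commit(int revisionReadAtStart) {
        return revision.compareAndSet(revisionReadAtStart, revisionReadAtStart + 1);
    }
}

public class OptimisticLockDemo {
    public static void main(String[] args) {
        RevisionedRow processInstance = new RevisionedRow();

        int readByServiceTask = processInstance.revision.get(); // t1 reads rev 1
        int readBySuspension  = processInstance.revision.get(); // t2 reads rev 1

        // t2 (the suspension) commits first and bumps the revision to 2;
        // t1 (the service task) then fails because it still holds rev 1.
        System.out.println("t2 (suspend) committed: "
                + processInstance.commit(readBySuspension));   // true
        System.out.println("t1 (service task) committed: "
                + processInstance.commit(readByServiceTask));  // false
    }
}
```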

Therefore we implemented a test case:


So, you do not really have an opportunity to avoid this behavior!

Cheers,
Roman

webcyberrob

Feb 26, 2014, 3:15:51 PM
to camunda-...@googlegroups.com, galen...@gmail.com
Hi Roman & Galen

I expected this to be the case, as I've run into quite a few unexpected behaviours due to the combination of race conditions and optimistic locking.

Hence my best practice design principles at the moment, around service tasks in particular:
  • Service tasks and optimistic locking work best if the service task states are either "not started" or "complete", e.g. no "doing" state.
  • As a consequence, don't use interrupting boundary events on service tasks (there's nothing to interrupt).
  • If the service does require significant compute and time, then use a service task to initiate the work and a receive task to receive a callback that the service is complete. (Note an interrupting boundary event on a transaction which wraps these two tasks now makes sense...)

However, even with these principles in place I have only minimised the opportunities for race conditions, I have not removed them. Hence this led me to ponder: does BPMN (or Camunda) require a semaphore/critical-section/synchronisation construct in order to avoid non-deterministic behaviour?

regards

Rob

galen...@gmail.com

Feb 26, 2014, 5:25:35 PM
to camunda-...@googlegroups.com, galen...@gmail.com
Hi Roman and Rob,

Roman, yes I think I did get an OptimisticLockingException.

In terms of process instance suspension, it seems to me that, if possible, the implementation could:

1) Let all of the currently active/executing tasks "drain out" before committing the suspension of the process instance.

2) Prevent new tasks from being started (prevent the token(s) from moving), but let the currently running tasks complete successfully (without exception). Technically this would allow one last token movement, only for currently active tasks.

I think the whole purpose of suspension is to prevent further token movement, right? As Rob pointed out, service tasks are not always interruptible, so it seems the best thing to do would be to let them finish, as opposed to doing something you know will cause a failure/retry.

I don't know how feasible this would be, especially with non-asynchronous tasks, but it seems like the behavior I would expect as an end-user.

I guess another solution would be to set retries to zero for tasks you know for sure should never run twice.
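If the task is made asynchronous (so it has a job), that retry setting can be declared in the BPMN XML. The element and attribute names below are from the Camunda engine's extension namespace as I understand it, and the `R0/PT5M` cycle (zero retries) is my reading of the retry-cycle syntax -- worth double-checking. The task id and delegate class are made up for the example.

```xml
<serviceTask id="chargeCreditCard"
             name="Charge credit card"
             camunda:asyncBefore="true"
             camunda:class="org.example.ChargeDelegate">
  <extensionElements>
    <!-- R0 = do not retry a failed job; default would be 3 retries -->
    <camunda:failedJobRetryTimeCycle>R0/PT5M</camunda:failedJobRetryTimeCycle>
  </extensionElements>
</serviceTask>
```

Alternatively, retries can be dropped at runtime for an existing job via `ManagementService.setJobRetries(jobId, 0)`, if I remember the API correctly.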

Rob: Thanks for the tips, those all make sense, and I see how they would minimise, but not entirely eliminate, the race conditions. For your third point, a message bus (to hold a de-coupled "service complete" message) and a consume task (instead of a callback) would probably work, even in the face of a suspended process instance. Do you have an example of the "interrupting boundary event on a transaction which wraps these two tasks"? I'm still trying to understand that concept...

Thanks,
Galen

webcyberrob

Feb 27, 2014, 2:13:31 AM
to camunda-...@googlegroups.com, galen...@gmail.com
Hi Galen,

In terms of your points (1) and (2), I agree with the sentiment, e.g. allow in-flight work to "drain" and prevent anything new...

With regard to your further question:
Let's assume I have a service which could take a minute, and there's a valid reason for interrupting during this minute (where the interrupt means abandoning the service).

Option 1 - I just use a service task with an interrupting boundary event.
IMHO this is bad because it could consume an engine thread for a minute. And if the interrupting boundary event fires, I haven't really interrupted the service thread; I'll get an optimistic locking exception when the service thread eventually completes and tries to mark the task as complete, because the interrupt has already done this.

Option 2 - I use a service task to initiate an async compute process (running in a thread outside of the engine context) and a receive task to wait for notification that the process is complete (this is what I mean by callback, e.g. some way for the async service to indicate it's complete, such as a message). In addition, if I wrap these two tasks in a sub-process context, I can put an interrupting event on this outer context. Now if the interrupt is triggered, I can potentially signal the "external" processing thread to cease, and I don't get the optimistic locking exception in the engine (apart from the extremely short race condition during async initiation).
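Option 2 can be sketched in plain Java, outside any engine. The names here are illustrative, not the Camunda API: the "service task" only hands the long-running work to an external executor and returns immediately, the "receive task" is modelled as waiting on a callback, and an interrupt flips a cancel flag that the external worker checks, so no engine thread is held while the work runs.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class InitiateAndCallbackDemo {

    // The work running outside the engine: checks the cancel flag between
    // slices and reports its outcome through the callback ("the message").
    static void runExternalService(AtomicBoolean cancelled,
                                   CompletableFuture<String> callback) {
        for (int step = 0; step < 10; step++) {
            if (cancelled.get()) {        // interrupting event was triggered
                callback.complete("abandoned");
                return;
            }
            // ... one slice of the minute-long computation ...
        }
        callback.complete("done");
    }

    public static void main(String[] args) throws Exception {
        ExecutorService external = Executors.newSingleThreadExecutor();
        AtomicBoolean cancelled = new AtomicBoolean(false);
        CompletableFuture<String> callback = new CompletableFuture<>();

        // "Service task": initiate only, returns immediately.
        external.submit(() -> runExternalService(cancelled, callback));

        // "Receive task": the process instance just waits for the callback.
        System.out.println("external service reported: "
                + callback.get(5, TimeUnit.SECONDS));
        external.shutdown();
    }
}
```

Flipping `cancelled` to true before the worker finishes makes it complete the callback with "abandoned" instead, which is the interrupt path Rob describes.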

Note: Camunda service tasks do have an "async" execution mode, but I'm not a big fan, as I prefer the async nature to be visible in the process model. It could be argued, however, that a business user (and their process model) is not concerned with sync versus async services and thus this detail should not be in the model...

I hope this makes sense,

regards

Rob

Roman Smirnov

Feb 27, 2014, 8:17:20 AM
to camunda-...@googlegroups.com, galen...@gmail.com
Hi Galen & Rob,

To avoid an OptimisticLockingException you would have to implement a kind of merge mechanism. For example:

Transaction t1 executes a service task and transaction t2 suspends the corresponding process instance.

First, t2 commits its changes (i.e. it sets the suspension flag to "suspended" but does not increment the revision). Then t1, which has executed successfully in the meantime, commits its changes (no OptimisticLockingException is thrown).

The result is that t1 overwrites the changes of t2: the suspension state of the process instance would be "running" and no longer "suspended". To avoid this, the changes of t2 would have to be merged into the commit of t1, and it could be very tricky to implement that kind of merge mechanism.
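The lost update Roman describes is easy to show with a plain Java stand-in (illustrative names, not engine code): without a revision check, t1's stale snapshot silently undoes t2's suspension.

```java
// Last-writer-wins without optimistic locking: the suspension commit
// from t2 is overwritten when t1 writes back its stale snapshot.
public class LostUpdateDemo {
    static class ProcessInstanceRow {
        boolean suspended;          // false = "running"
    }

    public static void main(String[] args) {
        ProcessInstanceRow db = new ProcessInstanceRow();

        // t1 (service task) loads a snapshot while the instance is active.
        boolean t1Snapshot = db.suspended;   // false

        // t2 (Cockpit) commits the suspension first.
        db.suspended = true;

        // t1 finishes and writes back its whole stale snapshot:
        db.suspended = t1Snapshot;           // suspension silently lost

        System.out.println("suspended after both commits: " + db.suspended); // false
    }
}
```

Throwing an OptimisticLockingException on t1's commit is what prevents this silent overwrite.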

That's why we provoke an OptimisticLockingException.

Cheers,
Roman