Optimistic lock to protect access to process variable

Lutz Kasselmann

Aug 11, 2015, 10:07:41 AM
to camunda BPM users
Hi, folks!

Is there a way to explicitly protect a (short-term) read-modify-write cycle on a central process variable with an optimistic lock?

In our particular case we execute such a read-modify-write cycle via the Camunda API. This is triggered by an external event. That is, it happens outside of the process and without the synchronisation behaviour of the Camunda job executor.

Our naive approach is to perform an innocuous update on the affected process instance's DB record as the first step inside the same transaction in which the read-modify-write cycle takes place. That way we serialize all accesses to the variable in question. Is there a smarter solution?
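
To illustrate, here is a minimal sketch of that workaround. Assumptions: plain JDBC on the engine's data source with "read committed" isolation, ACT_RU_EXECUTION as the engine's execution table, and a helper method made up for this post:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Illustrative only: the no-op update takes an exclusive row lock on the
// process instance's execution row, so concurrent transactions serialize
// here until commit/rollback.
void lockProcessInstanceRow(Connection connection, String processInstanceId)
    throws SQLException {
  try (PreparedStatement ps = connection.prepareStatement(
      "UPDATE ACT_RU_EXECUTION SET REV_ = REV_ WHERE ID_ = ?")) {
    ps.setString(1, processInstanceId);
    ps.executeUpdate();
  }
  // ... then perform the read-modify-write cycle in the same transaction ...
}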

Cheers
Lutz

webcyberrob

Aug 12, 2015, 4:57:41 AM
to camunda BPM users
Hi Lutz,

Could you clarify:

Do you mean you want to modify a process variable, or do you mean you may be storing a reference to a business object as a process variable, and the business object is persisted outside control of the engine?

If it's a process variable, then I'm puzzled by your question. For example, you may be using an external event which is correlated against a process instance, whereupon a task in the process instance executes and updates the value of the process variable. Whilst conceptually you could have two events in quick succession correlate to the same process instance, ultimately one will succeed and the other will fail with an optimistic locking exception (or at least that's the behaviour I currently expect), and thus serial access to the process variable is implied...

Hence could you elaborate your use case a little more?

regards

Rob

Lutz Kasselmann

Aug 12, 2015, 7:05:25 AM
to camunda BPM users
Hi Rob,

thank you for your response!

I am talking about one particular process variable whose string value is the serialized (JSON) representation of an object. The object's content is dynamic, which means it is read, modified, and written back in the cycle mentioned above.

Actually, these modifications are triggered by a queue consumer in reaction to an incoming JMS message. Each message is logically correlated to a particular process instance by the process instance id included in the message.

We would like to avoid introducing a process artifact for this event handling (e.g. an event-triggered subprocess). Instead, the queue consumer should directly read-modify-write the variable by means of the API (RuntimeService).

Without any additional synchronisation arrangement, two quasi-concurrently arriving messages may lead to a deadlock detected by the DBMS. This is the case, for instance, for the following interleaved sequence of events:
1) DB transaction 1 (handling Message 1): read the variable
2) DB transaction 2 (handling Message 2): read the variable
3) DB transaction 1 (handling Message 1): write the (modified) variable => deadlock

The workflow engine seems to acquire a write lock on the process instance when a variable is read. At least the transactions in the example above mutually prevent each other from writing (but not from reading).


Meanwhile we are considering a pessimistic locking solution for this problem:
If we set another "dummy" process variable to an arbitrary value in the same transaction, right before the read-modify-write cycle, the workflow engine seems to acquire a read lock on the process instance.
This serializes all transactions, avoiding both a deadlock and an optimistic locking clash. But is that robust?

Hope I could clarify our point!

Cheers
Lutz

thorben....@camunda.com

Aug 12, 2015, 8:13:52 AM
to camunda BPM users
Hi Lutz,

The engine should not acquire a write lock when reading a variable; at least it does not perform a SELECT FOR UPDATE or anything in that direction. Of course, the actual locking behavior depends on your database and the isolation level you use. It should work fine with "read committed".

Cheers,
Thorben

webcyberrob

Aug 12, 2015, 9:40:33 AM
to camunda BPM users
Hi Thorben,

In this use case, I agree, I would not expect a deadlock. However, I'm curious: what is the engine's behaviour when it comes to concurrent modification of a process variable? I would anticipate an optimistic locking exception, but this is an area of the engine I'm not familiar with.

regards

Rob

Lutz Kasselmann

Aug 12, 2015, 9:40:40 AM
to camunda BPM users
Hi Thorben,

thanks for your reply!

Indeed, we already use "read committed" as the isolation level for the workflow engine's data source. I will try to extract the relevant code to demonstrate the observed locking behaviour concerning the assumed write lock and post it here. Stay tuned ;-)

Nevertheless, even if the interleaved sequence of events in the example above did not run into a deadlock, how should we "atomize" the read-modify-write cycle on a process variable?

Cheers
Lutz

webcyberrob

Aug 12, 2015, 9:55:00 AM
to camunda BPM users
Hi Lutz,

Interesting case. I'll leave it to Thorben to dig into the detail. Meanwhile, a few thoughts come to mind.

You could serialize access externally, i.e. lock a row in an external table etc., but I suspect you don't want to do that. Here's a technique which comes to mind using BPMN constructs:

Let me assume you have message_A and message_B. You want to serialize access, but you don't know the order in advance.

Hence, set up an inline subprocess which waits to receive a 'none' event (i.e. an event which will never occur). Put two interrupting boundary events on the subprocess, one for event A and one for event B. On each transition path, perform the business logic and then enter a receive state for the event which has not yet occurred.

Hence if the events arrive in any order, but far enough apart, all is fine. If the events arrive concurrently, you will have a race condition in the engine. However, only one execution will succeed; the other will fail with an optimistic locking exception, and the engine will roll back the path which failed. Once this occurs, your process will no longer be waiting on both events; it will only be waiting on the event which just failed with the optimistic locking exception. Hence, retry the failed event and you should end up with the equivalent of serial delivery.

At least that's what I would expect...

regards

Rob

thorben....@camunda.com

Aug 12, 2015, 10:43:19 AM
to camunda BPM users
Hi guys,

@Rob: yes, the expected behavior is an OptimisticLockingException when the second transaction attempts to write its value to the database. This assumes that reading and writing are done in one transaction.

@Lutz: I think I need some more clarification :)

First of all, which database do you use?
Second, what exactly do you mean when you write the following?


1) DB transaction 1 (handling Message 1): read the variable
2) DB transaction 2 (handling Message 2): read the variable
3) DB transaction 1 (handling Message 1): write the (modified) variable => deadlock

In particular, you mention reading and writing variable values via the API. If you use the plain engine, it begins a new transaction with every API call, which is why I don't see how locks from reading the variable interfere with writing the variable. Or do you have some kind of transaction integration in place? If yes, could you please elaborate on that, or have I overlooked it in the previous posts?

As Rob has already mentioned, the engine handles parallel updates on the same database entities by optimistic locking. That means reading and modifying works for either of two parallel transactions, yet writing fails for whichever is the last to attempt a write. The proper way to deal with these situations is to retry, i.e. the logic performed by the failed transaction can be retried based on the updated variable in the database. If you want to avoid such a situation (for example because reading/modifying a variable is expensive for you), you'll have to use a pessimistic lock as you suggest (such as fetching the variable with SELECT FOR UPDATE). Currently the engine has no facilities for pessimistically locking variables, so that is something you would have to implement yourself.

By the way, if you set the log level to FINE for the class org.camunda.bpm.engine.impl.db.entitymanager.DbEntityManager, you'll get a summary of every flush the engine makes to the database. Perhaps that helps identify the deadlocking statements.
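
For a quick test, and assuming java.util.logging is your active logging backend, you could raise the level programmatically like this (with another backend, set the equivalent logger level in its configuration instead):

// Assumption: java.util.logging; keep a strong reference so the
// configured logger is not garbage-collected.
java.util.logging.Logger dbEntityManagerLogger = java.util.logging.Logger
    .getLogger("org.camunda.bpm.engine.impl.db.entitymanager.DbEntityManager");
dbEntityManagerLogger.setLevel(java.util.logging.Level.FINE);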

Cheers,
Thorben

Lutz Kasselmann

Aug 13, 2015, 4:37:31 AM
to camunda BPM users
Hi Rob,

yes, we already considered a pessimistic locking scheme, i.e. updating a shared DB row (probably the process instance's row) as the first operation in the variable-modifying transaction. And yes, we would actually like to avoid such a raw operation :-) But according to our observations there might be a tricky solution based on Camunda's API. That is what I mentioned in my preceding post:

If we set another "dummy" process variable to an arbitrary value in the same transaction, right before the read-modify-write cycle, the workflow engine seems to acquire a read lock on the process instance. This serializes all transactions, avoiding both a deadlock and an optimistic locking clash. But is that robust?

Moreover, we don't want to introduce a process artifact (i.e. a BPMN solution) to handle the read-modify-write cycle in reaction to the message event. That is because the affected variable concerns a purely technical aspect of our system, and the access should therefore not be reflected in the BPMN model.

Besides, let me clarify one more point: we are not faced with two distinct types of messages which could be handled by two distinct boundary events. You might break our case down to an even simpler one:

Consider you have a counter in one particular process variable. You want to increment this counter on each occurrence of an external event. This event might be a JMS message, a user's mouse click in the UI, or anything else. The key point is that the counter should be incremented from outside a process artifact, just by means of the workflow engine's API. In particular, that means the transaction boundaries are set by yourself and not controlled by the workflow engine.
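
In code, the naive increment I have in mind looks roughly like this (the variable name "counter" is just an example, and I assume the variable already exists):

import org.camunda.bpm.engine.ProcessEngine;
import org.camunda.bpm.engine.RuntimeService;

// Naive read-modify-write via the public API: between getVariable and
// setVariable, a concurrent transaction may update the counter, so
// increments can get lost or the transactions may clash.
public void incrementCounter(ProcessEngine processEngine, String processInstanceId) {
  RuntimeService runtimeService = processEngine.getRuntimeService();
  Integer counter = (Integer) runtimeService.getVariable(processInstanceId, "counter");
  runtimeService.setVariable(processInstanceId, "counter", counter + 1);
}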

I wonder whether the need to safely maintain a piece of shared information in an environment of potentially concurrent accesses is really that fanciful?

Cheers
Lutz

Lutz Kasselmann

Aug 13, 2015, 4:46:25 AM
to camunda BPM users
Hi Thorben,

sorry if my explanation was confusing so far. You might think of the reduced case which I depicted in my last post to Rob:


Consider you have a counter in one particular process variable. You want to increment this counter on each occurrence of an external event. This event might be a JMS message, a user's mouse click in the UI, or anything else. The key point is that the counter should be incremented from outside a process artifact, just by means of the workflow engine's API. In particular, that means the transaction boundaries are set by yourself and not controlled by the workflow engine.

From my point of view there are two canonical ways to protect the counter's consistency in an environment of quasi-parallel incrementations:
  1. Use an atomic read-increment-write operation. Internally this could be implemented by a transaction with an exclusive lock (e.g. select-for-update).

  2. Use a looped atomic compare-and-swap instruction to achieve an optimistic locking scheme.
But I can't identify designated operations in the workflow engine's API to support these approaches - especially no way to mimic an atomic compare-and-swap for a setVariable call. Such an operation could be provided by an overloaded version that takes the originally read value as an additional argument (see the sketch below). Internally, the API method's implementation could of course use a select-for-update for a local pessimistic locking scheme.
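
To make the idea concrete, here is how I imagine such an overload, plus the usual retry loop on top of it (to be clear: setVariableIfUnchanged does not exist in the Camunda API, it is purely hypothetical):

import org.camunda.bpm.engine.RuntimeService;

// Hypothetical extension of the engine API (does NOT exist in Camunda):
// the write succeeds only if the variable still equals expectedValue.
interface CasRuntimeService extends RuntimeService {
  boolean setVariableIfUnchanged(String executionId, String variableName,
                                 Object expectedValue, Object newValue);
}

// The classic compare-and-swap retry loop built on top of it:
void incrementCounter(CasRuntimeService runtimeService, String processInstanceId) {
  while (true) {
    Integer current = (Integer) runtimeService.getVariable(processInstanceId, "counter");
    if (runtimeService.setVariableIfUnchanged(processInstanceId, "counter",
        current, current + 1)) {
      return; // no concurrent writer got in between read and write
    }
    // otherwise: another writer interfered, re-read and try again
  }
}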

I hope I could shed some light on my problem ;-)

Cheers
Lutz

thorben....@camunda.com

Aug 13, 2015, 5:15:31 AM
to camunda BPM users
Hi Lutz,

Ok, we are getting closer :)

I'm afraid there is no public API for what you desire. Yet, it can be done by implementing a custom command. That way, the engine performs the read-modify-write in one transaction, and flushing the write fails with an OptimisticLockingException if the variable was updated in the meantime. In that case, the read-modify-write cycle can be retried. See an example here: https://github.com/ThorbenLindhauer/camunda-engine-unittest/blob/serlaized-variable-modifications/src/test/java/org/camunda/bpm/unittest/SimpleTestCase.java
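
Condensed, the pattern from the linked example looks roughly like this (engine bootstrapping and the variable name are illustrative):

import org.camunda.bpm.engine.OptimisticLockingException;
import org.camunda.bpm.engine.RuntimeService;
import org.camunda.bpm.engine.impl.cfg.ProcessEngineConfigurationImpl;
import org.camunda.bpm.engine.impl.interceptor.Command;
import org.camunda.bpm.engine.impl.interceptor.CommandContext;

// Read-modify-write inside ONE custom command, i.e. one command context and
// one transaction; retried if a concurrent writer causes an
// OptimisticLockingException when the command's changes are flushed.
public void incrementCounter(final ProcessEngineConfigurationImpl engineConfiguration,
                             final String processInstanceId) {
  final RuntimeService runtimeService = engineConfiguration.getRuntimeService();
  boolean done = false;
  while (!done) {
    try {
      engineConfiguration.getCommandExecutorTxRequired().execute(new Command<Void>() {
        public Void execute(CommandContext commandContext) {
          Integer counter = (Integer) runtimeService.getVariable(processInstanceId, "counter");
          runtimeService.setVariable(processInstanceId, "counter", counter + 1);
          return null;
        }
      });
      done = true;
    } catch (OptimisticLockingException e) {
      // the variable was updated concurrently; re-read and try again
    }
  }
}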

Cheers,
Thorben

Lutz Kasselmann

Aug 13, 2015, 7:44:44 AM
to camunda BPM users
Hi Thorben,

thank you for your code sample!!!

For my understanding: how does the setVariable method recognize a write conflict? In an optimistic locking scheme one needs a notion of the variable's original state at the time of the write operation -
this may either be metadata like an object version or the variable's original content itself. But neither is included in the method's signature?

Does Camunda perform bookkeeping behind the scenes, such as storing the variable's original value under its name in a (thread-local) transaction context on getVariable, to reuse it later on setVariable?

However, we first have to understand how the transaction management connected to getCommandExecutorTxRequired plays together with the transaction management of our application server.
From the naming (getCommandExecutorTxRequired, getCommandExecutorTxRequiresNew, ...) I guess the executors hook into the standard Java EE transaction semantics, am I right?


Nevertheless, my earlier claim still stands that getVariable acquires a write lock, which causes mutual blocking of concurrently running transactions and results in a deadlock.
But I know, I still owe you a "proof" by code sample :-)

Cheers
Lutz

thorben....@camunda.com

Aug 13, 2015, 7:59:42 AM
to camunda BPM users
Hi Lutz,

Yes, the bookkeeping explanation is the correct one. For a command that is executed by CommandExecutor#execute, the engine keeps a cache of database entities it has already loaded/updated/deleted. The cache is part of the CommandContext object that is handed into the command's #execute method. So when we call #getVariable, the variable entity is cached with its current revision (that is what is used for optimistic locking). When calling #setVariable, the revision is incremented, and when the command finishes, the updated entity is flushed to the database. If the revision has been incremented in the meantime, we throw an OptimisticLockingException. The cache is basically there to a) avoid selecting one entity twice and b) be able to flush all inserts/updates/deletes at the very end of the transaction, keeping write locks as short as possible. b) is also why I am surprised that reading should keep a lock.
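
Conceptually (this is not the engine's actual code, just an illustration of the revision check; the table and column names follow the engine's schema):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Optimistic locking via a revision column: the UPDATE only matches if nobody
// incremented REV_ since we read the entity, so zero affected rows signals a
// concurrent modification.
void flushVariable(Connection connection, String variableId, int readRevision,
                   String newTextValue) throws SQLException {
  try (PreparedStatement ps = connection.prepareStatement(
      "UPDATE ACT_RU_VARIABLE SET TEXT_ = ?, REV_ = ? WHERE ID_ = ? AND REV_ = ?")) {
    ps.setString(1, newTextValue);
    ps.setInt(2, readRevision + 1);
    ps.setString(3, variableId);
    ps.setInt(4, readRevision);
    if (ps.executeUpdate() == 0) {
      // this is where the engine raises its OptimisticLockingException
      throw new IllegalStateException("variable " + variableId + " was updated concurrently");
    }
  }
}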

You are right about the transaction integration as well. There are subclasses of ProcessEngineConfigurationImpl that integrate with JTA and Spring transaction management. Some documentation can be found at [1] and [2]. Note, though, that the above-mentioned entity cache is bound to the lifecycle of a command, not the transaction it uses. That means simply wrapping the two API requests #getVariable and #setVariable in one outer transaction won't work without a custom command as in the previous code sample.

Cheers,
Thorben

[1] http://docs.camunda.org/7.3/guides/user-guide/#cdi-and-java-ee-integration-jta-transaction-integration
[2] http://docs.camunda.org/7.3/guides/user-guide/#spring-framework-integration-spring-transaction-integration

Lutz Kasselmann

Aug 13, 2015, 9:45:08 AM
to camunda BPM users
Hi Thorben,

if I understand correctly, the only way to benefit from Camunda's optimistic locking behaviour for a getVariable/setVariable pairing is to let both calls take place in the same command context. This is obviously not the case if we naively call these two RuntimeService methods directly from e.g. a message-driven bean, even within the same DB transaction (i.e. not from a process artifact). The command context is the crucial point.

We are now going to untangle the lifecycles and relations of CommandContext, CommandExecutor, and Thread so that we can apply these means in a hopefully correct manner, paying the price of automagic ;-)

Furthermore, we will investigate the suspected write lock on getVariable. Maybe things change when using the command executor...


One more question concerning your explanations: you stated that, among other reasons, the cache is used to be able to flush all inserts/updates/deletes at the very end of the transaction. If so, how can a write conflict be recognized already inside setVariable? I would expect the recognition not before the flush. If the optimistic locking is to work accurately in a distributed application server environment, the conflict check should consider the central state inside the DB. Maybe setVariable performs a "select-ahead" for early conflict detection? But even if so, the flush must redo the check, resulting in a potential OptimisticLockingException at flush time. I guess we have to switch on the iBatis logs to understand it in more detail.


However, thank you very much for your elaborated support so far!!!

Cheers
Lutz

thorben....@camunda.com

Aug 13, 2015, 10:22:21 AM
to camunda BPM users
Hi Lutz,


One more question concerning your explanations: you stated that, among other reasons, the cache is used to be able to flush all inserts/updates/deletes at the very end of the transaction. If so, how can a write conflict be recognized already inside setVariable? I would expect the recognition not before the flush. If the optimistic locking is to work accurately in a distributed application server environment, the conflict check should consider the central state inside the DB. Maybe setVariable performs a "select-ahead" for early conflict detection? But even if so, the flush must redo the check, resulting in a potential OptimisticLockingException at flush time. I guess we have to switch on the iBatis logs to understand it in more detail.

That is a misunderstanding, probably due to an imprecise explanation of mine. In fact, the following happens:

1) #setVariable is executed (https://github.com/ThorbenLindhauer/camunda-engine-unittest/blob/serlaized-variable-modifications/src/test/java/org/camunda/bpm/unittest/SimpleTestCase.java#L63); this increments the variable entity's revision in the entity cache; no flush is performed
2) when the command returns, a command interceptor (https://github.com/camunda/camunda-bpm-platform/blob/master/engine/src/main/java/org/camunda/bpm/engine/impl/interceptor/CommandContextInterceptor.java) kicks in and flushes the changes present in the entity cache; the engine has a list of command interceptors that can perform cross-cutting logic before/after every command.

This means calling #setVariable will not throw an OptimisticLockingException, which is also why the try-catch block in the example must surround the entire command.


You stated that, among other reasons, the cache is used to be able to flush all inserts/updates/deletes at the very end of the transaction

This was an imprecise formulation of mine, which is not correct when using transaction integration. It should be:

You stated that, among other reasons, the cache is used to be able to flush all inserts/updates/deletes at the very end of the command

Cheers,
Thorben

Lutz Kasselmann

Aug 17, 2015, 7:16:45 AM
to camunda BPM users
Hi Thorben,

it was not your explanation that was imprecise but my reading of your sample code, sorry!

Concerning the lock assumed on getVariable, you were also right: there was a misconfiguration in the underlying application server's data source, resulting in the isolation level "repeatable read" instead of the desired "read committed". So the effect is explained.

All in all, looping over the command execution in case of an OptimisticLockingException is a perfect solution for our case! Thank you very much again for your quick and well-reasoned help!!!

The only remaining concerns we have may be considered as suggestions:

1)
The command pattern from the sample code does not seem to be part of the public API; at least the need to downcast to the implementation class ProcessEngineConfigurationImpl indicates this. This should probably be changed to provide more safety for the API user.

2)
A quasi-concurrent creation of the same variable (identified by the same name) makes our DB driver throw a duplicate key exception. The workflow engine finally wraps this product-specific exception (via an iBatis exception) into a generic ProcessEngineException. From my point of view, this situation of competing creators is just another kind of write conflict and should also be indicated by an OptimisticLockingException. Conceptually, the same object is just transferred from the consistent state of non-existence to two conflicting first versions of existence.
(Note: In principle we could avoid creation conflicts by explicitly setting the variable to an initial value at process start time. Unfortunately, we have to deal with process instances that are already running and hence need a solution which can cope with the non-existence of the variable.)

Cheers
Lutz