Hi Gareth,
Some comments re your questions...
When using multi-tenancy, Cockpit is able to swap between tenants, but this swaps logical engines within the cluster, not physical nodes across the cluster.
Q1 - Keep in mind that there may not be a one-to-one affinity between a node and a process instance. Consider a process with two sequential asynchronous continuations. The first continuation may be executed by the job executor on node two, the next part of the process by the job executor on node one. Hence there may not be affinity. (You could force affinity with a heterogeneous deployment, but then you don't really have a cluster...)
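To make that concrete, here is a sketch of what such a process might look like in BPMN 2.0 XML (a fragment only - the camunda namespace declaration is omitted, and the element IDs and delegate class names are purely illustrative). Each camunda:asyncBefore creates a transaction boundary and queues a job, and whichever node's job executor acquires that job runs the continuation:

```xml
<process id="sampleProcess" isExecutable="true">
  <startEvent id="start"/>
  <sequenceFlow id="flow1" sourceRef="start" targetRef="task1"/>
  <!-- asyncBefore creates an asynchronous continuation: a job is queued
       and may be acquired by the job executor on ANY node in the cluster -->
  <serviceTask id="task1" camunda:asyncBefore="true"
               camunda:class="org.example.Step1Delegate"/>
  <sequenceFlow id="flow2" sourceRef="task1" targetRef="task2"/>
  <!-- the second continuation may well be acquired by a different node -->
  <serviceTask id="task2" camunda:asyncBefore="true"
               camunda:class="org.example.Step2Delegate"/>
  <sequenceFlow id="flow3" sourceRef="task2" targetRef="end"/>
  <endEvent id="end"/>
</process>
```

So even within a single process instance, execution can hop between nodes at every continuation boundary.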
Q2 - I would recommend thinking about a different view of the taxonomy. In your Tomcat cluster you may have many Tomcat nodes, however conceptually you have one stateless engine distributed across the cluster nodes. If you use multi-tenancy, you are creating multiple logical process engines, again distributed across multiple cluster nodes.
Q3 - the default deployment sets the history level to full (everything is recorded). This kills the DB under load. Change the history level to activity or none and you will see a considerable improvement. Then there is DB-specific tuning...
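For example, assuming the shared-engine distribution, the history level can be set in the bpm-platform.xml deployment descriptor (adjust to suit if you configure the engine via camunda.cfg.xml or Spring instead; the engine name here is just the default):

```xml
<process-engine name="default">
  <properties>
    <!-- valid levels: full, audit, activity, none -->
    <property name="history">activity</property>
  </properties>
</process-engine>
```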
In practice, the BPM engine would not typically be your bottleneck. The BPM engine should be orchestrating external services and tasks rather than performing any heavy lifting. Hence the BPM orchestrations are usually much lighter than the task processes, and thus the limiting factor will be wherever the tasks are actually executed...
Performance of a BPM engine is a little different to what you may be familiar with from, say, a web server. In terms of performance, you really want to be clear on what your requirements are: are you more concerned with the response time to instantiate a process instance, or with process throughput? For example, if process instantiation time is your main objective, then make the start of the process asynchronous. Hence as soon as the process instance is created, the client thread returns. I can easily load a dual-CPU Amazon instance with a sustained 90tps and a response time in the 100ms range. The consequence is that I sacrifice process throughput, as the processes are executed entirely in the background by the job executor, and this adds DB overhead. Alternately, if I want more throughput and I'm prepared to sacrifice response time, the engine could borrow the client thread for longer and thus perform more processing in the client thread context with lower DB overhead. An alternate metric is process liveness, but that's a much larger discussion...
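As a sketch of the first option, marking the start event asyncBefore means the client call returns as soon as the instance is created, and everything downstream runs in the job executor (again a fragment with the namespace declaration omitted and illustrative IDs):

```xml
<process id="asyncStartProcess" isExecutable="true">
  <!-- the client's start call returns once the instance and the job
       are committed; the rest of the process runs in the job executor -->
  <startEvent id="start" camunda:asyncBefore="true"/>
  <!-- ... rest of the process ... -->
</process>
```

Removing the asyncBefore flag gives you the opposite trade-off: the client thread carries execution up to the first wait state, so you get fewer DB round trips per instance at the cost of a longer response time.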
The great thing about option 1 (the asynchronous start) is that if you have a usage pattern with massive load during the day but a trough at night, the engine can absorb huge numbers of process instantiations during the day with little impact on user-perceived performance, buffer them, and catch up on the processing during the slack times...
And yes - AWS is Amazon Web Services. With RDS I can have a high-performance database setup, with synchronous data replication across availability zones and backups, all ready to go in about 15 minutes...