MongoDB?

Alp Timurhan Çevik

Oct 28, 2014, 10:07:27 AM

to camunda-...@googlegroups.com

Hello,

I am currently using Activiti within a SaaS app.

I am thinking about moving to camunda, and I would like to know whether there is any work in progress on a persistence alternative such as MongoDB or a similar non-relational store, as performance and other issues arise with a relational database.

Any work in progress on this?

Thanks in advance,

Alp

Andrii Stesin

Oct 29, 2014, 2:54:18 AM

to camunda-...@googlegroups.com

As for me, I consider the stable Neo4j 2.1.* branch to be the most viable alternative to a relational DB for BPMS persistence. There are plenty of reasons for that, minimizing the ORM impedance mismatch being the most important among them.

WBR,

Andrii

Daniel Meyer

Oct 29, 2014, 4:55:33 AM

to camunda-...@googlegroups.com

Hi Alp,

At the moment there are no plans to support MongoDB, but more and more people are asking for it. What would be your primary motivation for using MongoDB with the process engine?

Cheers,

Daniel

PS:

Also note that with 7.2 we roll out a major refactoring of the persistence layer. The only class in the process engine which directly relates to SQL is org.camunda.bpm.engine.impl.db.sql.DbSqlSession

We also did some experiments with using Hazelcast for process engine persistence for kicks: https://github.com/menski/camunda-bpm-platform/tree/hazelcast/engine/src/main/java/org/camunda/bpm/engine/impl/db/hazelcast

Alp Timurhan Çevik

Oct 29, 2014, 5:56:27 AM

to camunda-...@googlegroups.com

Hi Daniel,

Our application is a SaaS app, as I said. Thus, there are several companies using the same server, running lots of processes each, and each company has different structured departments/companies beneath them, having lots of users belonging to different hierarchical levels.

The application data is already kept in Mongo, so that statistics and calculated metrics can be served on request. The only data that is kept in a relational database is process data. Thus:

- The application data and process data are in different persistence solutions.

- The bottleneck right now is process-based queries that determine which process belongs to which company, or to which level of a company's hierarchy. These rely on process variables in Activiti and are slow: around 7 seconds, compared to 0.1 seconds for Mongo queries of the same size.

- Scalability is a concern, as the plan is ultimately to serve 100K+ companies, with maybe 2M+ employees.

Actually, thinking about the process data, the main drawback of a non-relational database might be aggregation-related data, i.e. the business process monitoring side of process management, such as reports on average completion time. Other than that, the processes and their data seem independent.

- Regarding the business process, each task lives in a process and cannot live outside the process scope.

- Since form-related data is a subset of a task, it really makes sense to keep it in a structure where task data owns form data. The form data would then be reached through the task, or while querying tasks, as it never needs to be reached in scenarios that exclude the task.

Combining these, a data structure that holds all process-related data together could return it all in a single query (at a cost of 0.001 secs?), instead of traversing several tables with several joins. Also, one of our problems is keeping data within the tasks, as the data hierarchy we need is much more detailed than simple lists and primitive types. This might also be a lot easier to handle with a non-relational structure, which would let users construct object models without the restrictions of relational data: a much better OO approach. I really do not think a relational approach is needed for process-related data, as the main structure of processes seems hierarchy-based, not relation-based.
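
A minimal sketch of the kind of structure described above, where the process owns its tasks and each task owns its form data. All names here (ProcessDoc, TaskDoc, ProcessCollection) are hypothetical and come from neither Activiti nor camunda, and an in-memory map stands in for a Mongo collection:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical document model for illustration only.
final class TaskDoc {
    final String name;
    // Form data is owned by the task and is only reachable through it.
    final Map<String, Object> formData = new LinkedHashMap<>();

    TaskDoc(String name) {
        this.name = name;
    }
}

final class ProcessDoc {
    final String id;
    // Stored as a field on the document rather than as a process variable.
    final String companyId;
    final List<TaskDoc> tasks = new ArrayList<>();

    ProcessDoc(String id, String companyId) {
        this.id = id;
        this.companyId = companyId;
    }
}

// In-memory stand-in for a document collection keyed by process id.
final class ProcessCollection {
    private final Map<String, ProcessDoc> byId = new LinkedHashMap<>();

    void save(ProcessDoc doc) {
        byId.put(doc.id, doc);
    }

    // One lookup returns the process together with its tasks and their form data.
    ProcessDoc findById(String id) {
        return byId.get(id);
    }

    // Filtering by company reads a field on each document; no join is involved.
    List<ProcessDoc> findByCompany(String companyId) {
        List<ProcessDoc> result = new ArrayList<>();
        for (ProcessDoc doc : byId.values()) {
            if (doc.companyId.equals(companyId)) {
                result.add(doc);
            }
        }
        return result;
    }
}
```

The sketch only shows the ownership shape (process, then tasks, then form data). It says nothing about how aggregation reports such as average completion time would be computed, which is the drawback noted above.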

The 7.2 refactoring you have mentioned sounds really nice, and if you are modeling it as an interface, then other implementations would be possible to implement, hopefully.

Having said that, I would like to participate in any work on a non-relational data storage feature, as we will be doing this at some point anyway.

Cheers,

Alp

Alp Timurhan Çevik

Oct 29, 2014, 6:01:58 AM

to camunda-...@googlegroups.com

Hi Andrii,

I agree, the relations between process definitions and process instances might be better represented in Neo4j. In any case, any non-relational solution seems like a better fit for process-related data than a relational one, in my opinion. So the ability to choose the persistence engine would be the major advance in that scenario.

I have not used Neo4j yet. Each time I look at Mongo-based structures, and at what one needs to do to the OO structure to map it to the persistence layer for Mongo, I dream of a persistence layer that really does not interfere with the OO design, which might or might not be Neo4j.

Have you had any experience keeping a structure similar to process data in Neo4j? Any insights?

Regards,

Alp

Daniel Meyer

Oct 29, 2014, 6:10:46 AM

to camunda-...@googlegroups.com

Hi Alp,

> The 7.2 refactoring you have mentioned sounds really nice, and if you are modeling it as an interface, then other implementations would be possible to implement, hopefully.

The name of the interface is PersistenceSession

https://github.com/camunda/camunda-bpm-platform/blob/master/engine/src/main/java/org/camunda/bpm/engine/impl/db/PersistenceSession.java
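
To illustrate the general idea of hiding storage behind an interface, here is a rough sketch. It does NOT reproduce the real PersistenceSession API (see the link above for that). EntityStore, InMemoryEntityStore and TaskRegistry are hypothetical names invented for this example:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface for illustration; NOT the real PersistenceSession API.
interface EntityStore {
    void insert(String id, Object entity);

    Object selectById(String id);

    void delete(String id);
}

// One possible backend. A SQL-, Mongo- or Hazelcast-backed class
// would implement the same interface.
final class InMemoryEntityStore implements EntityStore {
    private final Map<String, Object> entities = new HashMap<>();

    @Override
    public void insert(String id, Object entity) {
        entities.put(id, entity);
    }

    @Override
    public Object selectById(String id) {
        return entities.get(id);
    }

    @Override
    public void delete(String id) {
        entities.remove(id);
    }
}

// Calling code depends only on the interface, so the backend can be swapped
// without touching this class.
final class TaskRegistry {
    private final EntityStore store;

    TaskRegistry(EntityStore store) {
        this.store = store;
    }

    void createTask(String id, String name) {
        store.insert(id, name);
    }

    String taskName(String id) {
        return (String) store.selectById(id);
    }

    void removeTask(String id) {
        store.delete(id);
    }
}
```

Usage would be along the lines of `new TaskRegistry(new InMemoryEntityStore())`, with a different EntityStore implementation passed in to change the backend.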

Daniel

Alp Timurhan Çevik

Oct 29, 2014, 7:55:43 AM

to camunda-...@googlegroups.com

https://github.com/mongodb/morphia

or

http://jongo.org/

could be used for the implementation, creating the persistence session implementation based on one of these, I guess. Jongo uses Jackson annotations, the same mechanism used to present the REST services, so even the current structure might be enough for serialization. The objects already know how to serialize themselves.

So the remaining issue would be the implementation of the operations, plus some tweaks about when to present the whole tree to clients, with regard to REST reply size. The query objects are already usable for Mongo queries too, though they might need some parameters to control the depth of the object tree returned.
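
A rough sketch of what such a depth parameter could do, assuming a hypothetical TreeNode type and a hypothetical TreeTrimmer helper (neither is a camunda, Jongo or Morphia class). The tree is copied into nested maps, which a JSON serializer could then write out, and the copy stops descending once the requested depth is used up:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical tree node for illustration only.
final class TreeNode {
    final String name;
    final List<TreeNode> children = new ArrayList<>();

    TreeNode(String name) {
        this.name = name;
    }

    // Adds a child and returns this node so calls can be chained.
    TreeNode add(TreeNode child) {
        children.add(child);
        return this;
    }
}

final class TreeTrimmer {
    // Copies the tree into nested maps, descending at most maxDepth levels
    // below the given node. With maxDepth = 0 only the node itself is
    // returned, without a "children" entry.
    static Map<String, Object> toMap(TreeNode node, int maxDepth) {
        Map<String, Object> out = new LinkedHashMap<>();
        out.put("name", node.name);
        if (maxDepth > 0) {
            List<Map<String, Object>> kids = new ArrayList<>();
            for (TreeNode child : node.children) {
                kids.add(toMap(child, maxDepth - 1));
            }
            out.put("children", kids);
        }
        return out;
    }
}
```

A task list view could then ask for depth 1 (process plus tasks), while a task detail view asks for depth 2 to include the form data as well.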

It might not take a lot of time using this approach, I guess. Any points/pitfalls I might be missing?


Andrii Stesin

Oct 30, 2014, 4:39:13 AM

to camunda-...@googlegroups.com

Hi Alp,

Our application (now in development) has a setup pretty similar to the one you described earlier, saying "several companies using the same server, running lots of processes each, and each company has different structured departments/companies beneath them, having lots of users belonging to different hierarchical levels". Our client is a group of companies, or rather a holding. And yes, we developed a modern, powerful and agile(!) data model of business data, completely on top of Neo4j - it turned out that its relational equivalent would be monstrous and difficult to maintain (if possible at all).

And yes, the only data that is kept in relational is process data. Document data is kept in a document-oriented store, and Neo4j, with its graph model and extreme speed and power, turned out to be optimal for representing business data because of one obvious fact: as soon as you have 10x more relationships than entities (which is especially true for business), relational technology fails.

So when saying that Neo4j is perfectly suited for storing process persistent data, I have a strong and proven reason behind it.