Hi Daniel,
As I mentioned, our application is a SaaS app. Several companies share the same server, each running many processes, and each company has its own structure of departments and sub-companies beneath it, with many users at different hierarchical levels.
The application data is already kept in MongoDB, so that we can answer requests for statistics and calculated metrics. The only data kept in a relational database is process data. Thus:
- The application data and process data are in different persistence solutions.
- The bottleneck right now is process-based queries, i.e. determining which process belongs to which company or to which level of a company's hierarchy. These rely on process variables in Activiti and are slow: around 7 seconds, compared to 0.1 seconds for MongoDB queries over data of the same size.
- Scalability is a concern, as the plan is ultimately to serve 100K+ companies with maybe 2M+ employees.
Actually, thinking about the process data, the main drawback of using a non-relational database might be aggregation-related data, i.e. the business process monitoring side of process management, such as reports on average completion time. Other than that, the processes and their data seem independent.
- Regarding business processes, each task lives inside a process and cannot live outside the process scope.
- Since form data is a subset of a task, it makes sense to keep it in a structure where the task owns its form data. The form data would then be reached through the task, while querying the task, as it never needs to be reached in scenarios that exclude the task.
Combining these, a data structure that presents all process-related data in a single query (with a cost of 0.001 secs?) would return everything at once, instead of traversing several tables with several joins. Another of our problems is keeping data within the tasks: the data hierarchy we need is much more detailed than simple lists and primitive types. This would also be a lot easier to express in a non-relational structure, which would let users construct their own object models without the restrictions of relational data, a much better OO approach. I really do not think a relational approach is needed for process-related data, as the main structure of the processes seems to be hierarchy based, not relation based.
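To make the idea a bit more concrete, here is a rough sketch of the kind of document I have in mind, written as plain Python so it runs without a database. Every field name here (`company_path`, `tasks`, `form`, and so on) and both helper functions are my own assumptions for illustration; none of this is an existing Activiti or MongoDB schema.

```python
# Hypothetical document shape: one document per process instance.
# All field names and values are illustrative assumptions.
process_doc = {
    "_id": "proc-1001",
    "definition_key": "leaveRequest",
    # The owning company plus all of its ancestors in the hierarchy,
    # so one lookup answers "which processes belong to this company level".
    "company_path": ["acme", "acme-emea", "acme-emea-sales"],
    # Tasks are embedded because they cannot live outside the process.
    "tasks": [
        {
            "name": "Approve request",
            "assignee": "manager-17",
            # Form data is owned by its task and can be arbitrarily nested,
            # not just simple lists and primitive values.
            "form": {
                "approved": True,
                "requested_days": [
                    {"from": "2014-06-01", "to": "2014-06-05"},
                ],
            },
        },
    ],
}


def processes_for_company(docs, company_id):
    """Return the processes whose hierarchy path contains company_id."""
    return [doc for doc in docs if company_id in doc["company_path"]]


def form_of(doc, task_name):
    """Reach form data through its owning task; None if no such task."""
    for task in doc["tasks"]:
        if task["name"] == task_name:
            return task["form"]
    return None


hits = processes_for_company([process_doc], "acme-emea")
print(len(hits))
print(form_of(hits[0], "Approve request")["approved"])
```

In MongoDB itself, the membership test in `processes_for_company` would correspond to an ordinary query on the array field, e.g. `{"company_path": "acme-emea"}`, which matches documents whose array contains that value, and the field can be indexed. Whether that actually gets us from 7 seconds down to the numbers we see for our other Mongo queries is something we would still have to measure.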
The 7.2 refactoring you mentioned sounds really nice, and if you are modeling it as an interface, then other implementations should hopefully be possible.
Having said that, I would like to participate in any work on the transition to non-relational data storage, as we will be doing this at some point anyway.
Cheers,
Alp