Best way to host many orient databases

Michael MacFadden

Oct 14, 2015, 5:43:01 PM

to OrientDB

Hello,

For our architecture we are contemplating something like multi-tenancy. In our approach each tenant would get their own database. When I say database, I don't mean server. I mean a database within an OrientDB server.

The question is: is there a best-practice way to do this? The three options we see are:

1) Stand up an entire OrientDB server to host a single database.

This seems inefficient, especially since we are looking towards a clustered/replicated architecture.

2) Put multiple databases into a single OrientDB Server

Here I am curious about scalability. Is there a practical limit to how many databases a single OrientDB cluster can hold? Each tenant may make many connections to the database. If, say, each tenant makes 20 or so database connections and we have 1,000 tenants, we now have 20,000 connections going to the database. Obviously we would have many servers supporting this load, so it would be distributed.

3) Some middle ground where we have a certain number of tenants hosted in each clustered instance of OrientDB

Not sure how to draw the line here.

Just wondering if there are best practices around this? Thanks and keep up the good work.
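
The connection arithmetic in option 2 can be sketched as follows. This is only a back-of-the-envelope calculation: the tenant and connection counts come from the post, while the server count of 10 and the even spread across servers are assumptions made for illustration.

```python
import math


def connection_load(tenants: int, connections_per_tenant: int, servers: int) -> dict:
    """Return the total connection count and the per-server share.

    Assumes connections are spread evenly across servers, which a real
    deployment would not guarantee.
    """
    total = tenants * connections_per_tenant
    per_server = math.ceil(total / servers)
    return {"total": total, "per_server": per_server}


# 1,000 tenants x 20 connections, spread over a hypothetical 10 servers.
print(connection_load(tenants=1000, connections_per_tenant=20, servers=10))
# {'total': 20000, 'per_server': 2000}
```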

Luigi Dell'Aquila

Oct 16, 2015, 3:43:06 AM

to orient-...@googlegroups.com

Hi Michael,

There is no general rule, because it depends a lot on how many requests per second each database will receive, how much data it will contain, how complex your queries are, and so on.

As a rule of thumb, I would not go over a few tens of (small) databases per instance, just because CPU, RAM, IO and disk are limited resources and they have to be shared.

Thanks

Luigi
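
To see what this rule of thumb means for the 1,000-tenant scenario in the original question, here is a rough sizing sketch. Reading "a few tens" as 20 to 50 databases per instance is an assumption, and the function names are hypothetical; real sizing would depend on the load factors listed above.

```python
def instances_needed(tenants: int, dbs_per_instance: int) -> int:
    """Number of server instances needed at one database per tenant (rounded up)."""
    return (tenants + dbs_per_instance - 1) // dbs_per_instance


def assign_instance(tenant_index: int, dbs_per_instance: int) -> int:
    """Fill instances in order: the first N tenants go to instance 0, and so on."""
    return tenant_index // dbs_per_instance


for cap in (20, 30, 50):
    print(cap, "databases per instance ->", instances_needed(1000, cap), "instances")
# 20 databases per instance -> 50 instances
# 30 databases per instance -> 34 instances
# 50 databases per instance -> 20 instances
```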

scott molinari

Oct 17, 2015, 4:59:41 AM

to OrientDB

Would OrientDB also support graph partitioning in some way, like TitanDB offers through TinkerPop Blueprints' PartitionGraph?

https://github.com/tinkerpop/blueprints/wiki/Partition-Implementation

Scott

Jan Plaček

Oct 18, 2015, 5:55:00 AM

to OrientDB

Doesn't it already, by way of classes and clusters?

On Saturday, October 17, 2015 at 10:59:41 UTC+2, scott molinari wrote:

scott molinari

Oct 18, 2015, 7:13:53 AM

to OrientDB

With clusters, maybe. When I think about it, you might be right. I am just not sure how to create a class and a predefined cluster (for the customer) at the same time. That would need to be possible in order to do multi-tenancy.
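
One possible route, sketched below, is to create the class once and then attach a named cluster per customer. The sketch only builds the SQL strings. The statement syntax (`ALTER CLASS ... ADDCLUSTER`, `INSERT INTO ... CLUSTER`, `SELECT FROM cluster:<name>`) is assumed from the OrientDB 2.x SQL reference and should be checked against the version in use; the `Invoice` class, the `acme` tenant and the `<class>_<tenant>` naming convention are hypothetical.

```python
import re
from typing import List


def cluster_name(class_name: str, tenant_id: str) -> str:
    """Build a lower-case cluster name such as 'invoice_acme' (hypothetical convention)."""
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_]*", class_name):
        raise ValueError(f"unsafe class name: {class_name!r}")
    if not re.fullmatch(r"[A-Za-z0-9_]+", tenant_id):
        raise ValueError(f"unsafe tenant id: {tenant_id!r}")
    return f"{class_name}_{tenant_id}".lower()


def tenant_cluster_statements(class_name: str, tenant_id: str) -> List[str]:
    """SQL to attach a per-tenant cluster to an existing class and use it.

    Assumes the class itself was already created once (CREATE CLASS ...).
    """
    cluster = cluster_name(class_name, tenant_id)
    return [
        f"ALTER CLASS {class_name} ADDCLUSTER {cluster}",
        f"INSERT INTO {class_name} CLUSTER {cluster} SET total = 10",
        f"SELECT FROM cluster:{cluster}",
    ]


for statement in tenant_cluster_statements("Invoice", "acme"):
    print(statement)
```

Validating the names before interpolating them keeps tenant identifiers from injecting arbitrary SQL, since cluster names cannot be passed as query parameters.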

There is also a limit on the number of clusters, which means a limit on the number of customers possible in one database. I'd also think clustering at the customer level should be possible, so customers can take advantage of data distribution too. So, say each customer had 100 clusters available to them. That would allow 320 customers in a single database. And if one ODB server hosts only 50 databases (in keeping with Luigi's suggestion), that would be 16,000 customers on one ODB instance. That could work, but only for very small customers. Then we'd need a scale-up plan. That is why I'd like to keep it to one database per customer and avoid the complication.
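
Working the figures through: 320 customers per database times 50 databases gives 16,000 customers per instance. The sketch below makes the calculation explicit. The per-database cluster budget of 32,000 is an assumption inferred from the "100 clusters, 320 customers" figures, not a number quoted from the limits page.

```python
# Assumption: cluster budget per database implied by 100 clusters x 320 customers.
CLUSTERS_PER_DATABASE = 32_000


def customers_per_instance(clusters_per_customer: int, dbs_per_instance: int) -> dict:
    """Capacity when every customer gets a fixed block of clusters."""
    per_db = CLUSTERS_PER_DATABASE // clusters_per_customer
    return {"per_database": per_db, "per_instance": per_db * dbs_per_instance}


print(customers_per_instance(clusters_per_customer=100, dbs_per_instance=50))
# {'per_database': 320, 'per_instance': 16000}
```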

I don't think classes would help partition data for multi-tenant purposes. I'd also like to leave that feature to tenants, so each customer can work with their own classes.

One thing still open in my mind is whether there is a limit on the number of classes. Is it tied to the number of clusters available? The limits page does not mention how many classes a database can have.

http://orientdb.com/docs/last/Limits.html

Scott

scott molinari

Oct 18, 2015, 9:21:03 AM

to OrientDB

I also just found this in the documentation about graph partitioning.

http://orientdb.com/docs/last/Partitioned-Graphs.html

Scott

Jan Plaček

Oct 18, 2015, 9:22:59 AM

to OrientDB

I didn't say it should be used as a solution for multitenancy.

I am just saying it can be used the same way as TinkerPop's PartitionGraph.

The thing is, one shouldn't partition any GRAPH database, because the nature of a graph makes it basically impossible (well, it's possible, but one wouldn't really benefit from it).

We use a graph to represent very coherent data with complicated relations.

We assume that all data in the graph are (or can be at some point) related and can be queried effectively based on those relations.

This assumption makes any partitioning problematic, because we can't predict how many partitions will need to be involved in a query (potentially all of them).

Sure, we can identify some subgraphs with higher cohesion than other parts of the graph, but any time we placed such a subgraph in a separate partition, we would make queries spanning several of those subgraphs a lot less efficient.

Adding a partition could enhance performance in one place, but it will horribly affect performance in other places.

That might be bearable to some extent, and for those situations there are clusters/PartitionGraph.

It might happen that adding a partition wouldn't actually hurt performance anywhere, but that would imply that our data are incoherent and that it's pointless to place those data in the same graph, or to use a graph database at all.
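
The point about not being able to predict how many partitions a query touches can be illustrated with a toy example. This is a plain in-memory sketch, not OrientDB code; the graph, the vertex names and the partition assignment are all made up. It runs a breadth-first traversal and reports which partitions hold the visited vertices.

```python
from collections import deque


def partitions_touched(edges, partition_of, start, max_depth):
    """Breadth-first traversal up to max_depth hops; return partitions of visited vertices."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        vertex, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the requested depth
        for neighbour in adjacency.get(vertex, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))
    return {partition_of[v] for v in seen}


# Three cohesive subgraphs (A, B, C) joined by a couple of cross-partition edges.
edges = [("a1", "a2"), ("a2", "b1"), ("b1", "b2"), ("b2", "c1")]
partition_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C"}

print(sorted(partitions_touched(edges, partition_of, "a1", 1)))  # ['A']
print(sorted(partitions_touched(edges, partition_of, "a1", 2)))  # ['A', 'B']
print(sorted(partitions_touched(edges, partition_of, "a1", 4)))  # ['A', 'B', 'C']
```

A shallow query stays inside one partition, but each extra hop can pull in another one, which is the cost described above.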

On Sunday, October 18, 2015 at 13:13:53 UTC+2, scott molinari wrote:

scott molinari

Oct 18, 2015, 9:39:33 AM

to orient-...@googlegroups.com

Well, the whole idea of partitioning is to achieve data separation. So clearly, if you want a graph to be homogeneous, it can't be partitioned.

Reading the ODB partitioning section, partitioning through the user permissions system sounds like a pretty good answer to the problem. In fact, for our "user service", I've been wondering how we can partition the user data, but still have all users available for global (above the tenant level) querying purposes, as we want to also have a user "network" across all tenants (which is a tricky situation, I know). The user role system sounds like a great fit for this.

Scott
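
The idea of separating tenants through permissions while keeping a global view can be modelled in a few lines. This is a toy in-memory model, not OrientDB's security API: the `Record` type, the `allow_read` field and the role names are hypothetical and only illustrate the filtering rule.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    name: str
    tenant: str
    # Roles allowed to read this record (hypothetical field for illustration).
    allow_read: set = field(default_factory=set)


def visible(records, roles):
    """Return the names of the records readable by a user holding the given roles."""
    return [r.name for r in records if r.allow_read & set(roles)]


records = [
    Record("alice", "t1", {"tenant_t1", "global_reader"}),
    Record("bob", "t2", {"tenant_t2", "global_reader"}),
]

print(visible(records, {"tenant_t1"}))      # ['alice']
print(visible(records, {"global_reader"}))  # ['alice', 'bob']
```

All users live in one graph, so cross-tenant queries remain possible for a role that is granted on every record, while tenant roles see only their own records.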

Jan Plaček

Oct 18, 2015, 11:25:45 AM

to OrientDB

"If you want a graph to be homogeneous, ..."

It's not about wanting the graph to be homogeneous; you need a graph because the data are homogeneous.

"Idea of partitioning is to achieve data separation"

If you mean partitioning as a logical differentiation, then yes. Classes and inheritance can be used for that in OrientDB.

I was talking about partitioning as a way to deal with technical limitations (size/performance) by separating data into smaller pieces.

On Sunday, October 18, 2015 at 15:39:33 UTC+2, scott molinari wrote:

scott molinari

Oct 20, 2015, 12:10:56 AM

to OrientDB

In a multi-tenant system, which this thread is about, the data between tenants must be partitioned. That is the kind of partitioning we are talking about. Not a logical differentiation, but rather, physical separation.

Scott

Jan Plaček

Oct 20, 2015, 10:21:37 AM

to orient-...@googlegroups.com

Being a multi-tenant system says nothing about how much data, or which data, needs to be separated; that depends on specific needs.

Does the data need to be in different files, or on different disks, servers, or in different buildings?

Why? For reasons of query logic, security, maintainability, performance, or scalability?

We can discuss how to fulfill these needs with OrientDB, but we would have to be specific about them:

"I need to serve specific data to a specific tenant"

"I need tenants to be able to query their own data, without being able to query the data of others"

"I need to be able to query the data of all tenants, while tenants should be able to query only their own data"

"I need fine-grained security control applicable to parts of the graph or subgraph"

"I need to place data on different machines, because of security reasons, can I still query them as a whole?"

"Will I benefit, performance wise, from placing parts of graph/subgraph on different machine?"

"Can I maintain (backup, log, recover, ...) parts of the single graph?"

....

These are different concerns all applicable to multi-tenant systems, achievable by different means and often placing limits on one another.

On Tuesday, October 20, 2015 at 6:10:56 UTC+2, scott molinari wrote:

scott molinari

Oct 20, 2015, 11:14:31 AM

to orient-...@googlegroups.com

Tenant data separation is a standard and major security concern when dealing with cloud computing infrastructure, and that same concern has to be covered by the database as well. It isn't a specific need; it is a necessity.

Scott

CNM

Jul 17, 2016, 12:57:16 AM

to OrientDB

I have an extension to this question. I need to separate data, not so much because of multi-tenancy and security, but because the data is logically separable and performance would improve hugely if I could control the volume of data the queries have to touch. Currently I have three solutions, all with problems.

The first is to use separate databases, as that's the best way to visualise the logical "partitions" I'm talking about. However, I have thousands of "projects" that need to be partitioned, which would create scalability issues within a single server.

The second is to have a separate set of classes for each partition, with a naming convention that tags the classes against their partition. Both of these solutions add overhead in terms of managing the schema, and I'm keen to find out how many classes OrientDB can support in a single database.

The third option, using a cluster within each class for each project, seems the most natural, but fails because we cannot have cluster-level indexes in OrientDB. Will the performance benefits of using clusters be lost by not having cluster-level indexes? Any advice will be highly appreciated. This is a very common scenario in the design of cloud-based and/or highly scalable systems. Ideally I need a lightweight partitioning/clustering mechanism that does not require compromises.
