Cores and clusters and classes - oh my!

scott molinari

Feb 25, 2016, 1:11:57 AM
to OrientDB
Hey ODB Team,

With 2.2, the system now creates, for each new class, as many clusters as there are cores available in the system. Great! The "old" way, though, was to create a single cluster per class, which meant that, theoretically, with one cluster per class, we could have 32,767 classes. Now let's imagine we have ODB on a good-sized system with 16 cores. Does that mean each class would have 16 clusters? If so, does that lower the number of classes available to just over 2,000?

There was also mention of possibly increasing the number of clusters available per database. Has this change also been made in 2.2?
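
The arithmetic behind this question can be sketched in a few lines of Python. This is only an illustration: CLUSTER_LIMIT assumes a per-database cap of 32,767 (2^15 - 1) clusters, max_classes is a hypothetical helper rather than anything in OrientDB, and the clusters a fresh database already uses for its built-in classes are ignored, so the real number would be somewhat lower.

```python
# Assumed per-database cluster cap before v3.0 (2**15 - 1).
CLUSTER_LIMIT = 32_767


def max_classes(clusters_per_class, cluster_limit=CLUSTER_LIMIT):
    """Upper bound on classes if every class gets the same number of clusters.

    Hypothetical helper for illustration only; it ignores the clusters
    that a new database reserves for its built-in classes.
    """
    if clusters_per_class < 1:
        raise ValueError("clusters_per_class must be at least 1")
    return cluster_limit // clusters_per_class


# One cluster per class (the "old" default) versus one per core on 16 cores.
print(max_classes(1))   # 32767
print(max_classes(16))  # 2047
```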

Scott

scott molinari

Feb 26, 2016, 7:00:08 AM
to OrientDB
A polite bump. :-)

Scott

Luca Garulli

Feb 26, 2016, 12:06:12 PM
to OrientDB
Hi Scott,
We will raise this limit in v3.0, allowing up to 2^16 clusters. If you have this problem now (even though having more than 2,000 classes applies to about 1% of use cases), you can force OrientDB to create fewer clusters per class by executing:

ALTER DATABASE MINIMUMCLUSTERS 1

Or any number instead of 1.



Best Regards,

Luca Garulli
Founder & CEO


scott molinari

Feb 26, 2016, 2:25:25 PM
to OrientDB
Thanks Luca. 

I like that you left a backdoor open for the configuration of cluster creation.

I agree that most users would never need 2000 classes, but we are thinking about using multi-tenancy within a single database for our demo and development environments. So, such a restriction could be an inhibitor to these plans. We wouldn't be needing serious performance for these environments, so the MINIMUMCLUSTERS attribute will come in handy. :-)

Scott     

Sfinx

Apr 24, 2016, 3:10:03 PM
to OrientDB
Hi Luca,

It is still not clear whether different classes can be located in the same cluster. As of v2.1.16 this seems to be impossible:

....
create database plocal:/opt/orientdb/databases/db1 root root_pass;
create cluster clust1;
create class class1 extends V cluster clust1;

create class class2 extends V cluster clust1; <=== error

Error: com.orientechnologies.orient.core.exception.OSchemaException: Class
class2 already exists in current database

Grouping several classes in the same cluster is impossible, yet we can remove all the clusters from a class!

orientdb {db=db1}> info class class2

CLASS 'class2'

Super classes........: [V]
Default cluster......: class2 (id=12)
Supported clusters...: class2(12) 
Cluster selection....: round-robin
Oversize.............: 0.0
orientdb {db=db1}> info class class2

orientdb {db=db1}> alter class class2 removecluster class2;

Class updated successfully

orientdb {db=db1}> info class class2

CLASS 'class2'
Super classes........: [V]
Default cluster......: null (id=-1)
Supported clusters...: 
Cluster selection....: round-robin
Oversize.............: 0.0
orientdb {db=db1}> 

Is this behavior expected? If so, can somebody explain the logic behind it?

I'd like to have several classes located in the same cluster. Is that possible?

TIA

Hung Tran

Apr 25, 2016, 12:51:57 AM
to OrientDB
Hi TIA,

You may be confusing a node cluster (distributed) with a class cluster (local).

My Best,
Hung Tran

Sfinx

Apr 25, 2016, 2:45:00 AM
to OrientDB
Hi,

The docs say that "A cluster is a generic way to group records". The CREATE CLASS command definitely takes a cluster definition. What am I missing? I think the "node" semantics were replaced by "cluster" at some point during development. It would be good to have a proper definition of the OrientDB node that holds the records, its clusters, and its classes. Can you please give one? Or point to some OrientDB docs that clearly explain classes, clusters, and nodes and how they relate?

Thanks !

P.S. It seems that "ALTER CLASS class1 CLUSTERSELECTION local" sets the selection to round-robin (v2.1.16). Anyway, I think balanced mode was created for sharding purposes, so why is it not possible to have the same cluster balanced over several nodes?

scott molinari

Apr 25, 2016, 6:52:20 AM
to orient-...@googlegroups.com
I believe this part of the docs explains clusters and nodes properly.

http://orientdb.com/docs/last/Distributed-Architecture.html

You can have nodes with different clusters of a class. But, you can't have different classes in the same cluster. In other words, clusters are groups of records that belong to a certain class only. So, you can't mix and match clusters to different classes. You can, however, mix and match clusters to different nodes. 

Scott

Sfinx

Apr 25, 2016, 7:11:23 AM
to orient-...@googlegroups.com
It would be good to reflect such limitations in the docs. It seems that one cluster per class per CPU has to be multiplied by the node count too ;) 10 nodes x 8 CPUs allow about 400 classes/clusters max, right? Maybe 2^16 for 3.x is still too small?

Node - 

scott molinari

Apr 25, 2016, 3:47:40 PM
to OrientDB
Good point. I wonder if the cpu cores = # of clusters setting also counts in a distributed system. It wouldn't seem to make too much sense, because "over-chunking" the data can also cause performance issues when querying.

Scott

Luca Garulli

Apr 25, 2016, 7:24:23 PM
to OrientDB
Hi guys,

When running distributed, the existing clusters are assigned to the servers, so no additional clusters are created per server unless the number of servers is greater than the number of existing clusters. In releases before v2.2, one cluster per server was always created, so in the worst-case scenario with 10 nodes and 8 cores, you could end up with 17 clusters per class.

In v2.2 there would be just 10 per class.
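
The v2.2 rule described above can be written as a one-line toy model. This is a sketch of the statement in this message, not OrientDB code: clusters_per_class_v22 is a hypothetical name, and it assumes the clusters that already exist are the one-per-core defaults.

```python
def clusters_per_class_v22(existing_clusters, servers):
    """Toy model of the v2.2 behaviour described in the thread.

    Existing clusters are handed out to the servers; extra clusters are
    only needed when there are more servers than existing clusters.
    Hypothetical helper, not part of OrientDB.
    """
    if existing_clusters < 1 or servers < 1:
        raise ValueError("both arguments must be at least 1")
    return max(existing_clusters, servers)


# 8 cores (so 8 clusters per class) and 10 nodes: two more clusters are needed.
print(clusters_per_class_v22(existing_clusters=8, servers=10))  # 10
# 8 clusters and only 3 servers: nothing extra is created.
print(clusters_per_class_v22(existing_clusters=8, servers=3))   # 8
```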

Furthermore, you could decide to have only 3 master servers and 100 replica-only servers. Those replica servers don't create any additional clusters because they are read-only.

I hope it's clearer now.


Best Regards,

Luca Garulli
Founder & CEO


scott molinari

Apr 26, 2016, 12:40:03 PM
to OrientDB
So, in distributed and with 2.2, the default is one cluster per node, per class on master nodes.

And on a single master and with 2.2, the default is one cluster per cpu core per class.

Is that correct? What is the purpose of spreading a class across one cluster per CPU core on a single master instance?

Scott

Sfinx

Apr 26, 2016, 1:40:15 PM
to orient-...@googlegroups.com
The more important thing here is that the "cluster" semantics can no longer be used for generic grouping, as the docs state, but now mean the node/CPU "thing". The main disadvantage is that it is impossible to group several classes inside one cluster, even in local (non-distributed) mode. I'd prefer clearer definitions in which a cluster is not related to nodes, storage, and so on; it is just a grouping term. If you want to tie a cluster to nodes, then something like the following command could be implemented:

CREATE CLUST clust1 at NODES all NODESELECTION round-robin

Then all the needed classes can be tied to the appropriately created cluster. That is much clearer than the current complex mess of CPUs, nodes, and default-distributed-db-config.json.

Luca Garulli

Apr 26, 2016, 8:08:08 PM
to OrientDB
Hi Scott and Sfinx,

In v2.2 we create 1 cluster per core, so if you have 8 cores, there will be 8. Now, if you run distributed in full-replica mode (no sharding), those 8 clusters will be distributed among those 8 nodes. You can specify that the cluster "client_usa" is pinned to the USA server; otherwise OrientDB will decide automatically which of these 8 nodes is the owner.

The owner matters only for the creation of records (see the docs).
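
As a rough illustration of cluster ownership, the sketch below pins one cluster to a server and hands out the rest in rotation. The rotation policy is an assumption made for the example; the thread only says that OrientDB decides automatically. assign_owners, the server names, and the cluster names other than "client_usa" are all hypothetical.

```python
from itertools import cycle


def assign_owners(clusters, servers, pinned=None):
    """Map each cluster to an owning server.

    Pinned clusters go to the server the developer chose; every other
    cluster is handed out in rotation. The rotation is an assumed policy
    for illustration, not OrientDB's actual algorithm.
    """
    pinned = dict(pinned or {})
    for cluster, server in pinned.items():
        if server not in servers:
            raise ValueError(f"{cluster!r} is pinned to unknown server {server!r}")
    rotation = cycle(servers)
    owners = {}
    for cluster in clusters:
        owners[cluster] = pinned[cluster] if cluster in pinned else next(rotation)
    return owners


owners = assign_owners(
    clusters=["client_usa", "client_1", "client_2", "client_3"],
    servers=["usa", "europe"],
    pinned={"client_usa": "usa"},
)
for cluster, server in owners.items():
    print(cluster, "->", server)
```

In this model a server would create new records only in the clusters it owns, which is why the owner matters at insert time and not for reads.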

Starting from v2.1 we restricted the use of clusters: instead of letting you put anything inside a cluster, each cluster is now tied to a class (actually, there can still be binary clusters with no class).

Sfinx, with the syntax above, what was your goal?

Best Regards,

Luca Garulli
Founder & CEO


Sfinx

Apr 26, 2016, 9:54:10 PM
to OrientDB
Hi Luca,

My proposal aimed at clearer database clustering levels that the database developer can fully control. It would be better if classes could be arbitrarily grouped into clusters, and clusters in turn could be arbitrarily grouped among nodes and their CPUs. For now I see that the current approach has some limitations and mess:

- the clusters and their distribution now depend on nodes and CPUs rather than on the database developer's aims
- different classes can't belong to the same cluster, even though class commands have a cluster CRUD part (?)
- cluster selection is a mystery with some predefined numbers
- the cluster number space leaks into the class number space, making database applications less flexible

BTW, what will happen if somebody replaces the 8-core CPU with, say, a 16-core (or 4-core) one on the same node and restarts OrientDB?

odbuser

Apr 27, 2016, 1:57:00 AM
to OrientDB


On Tuesday, April 26, 2016 at 9:54:10 PM UTC-4, Sfinx wrote:
BTW, what will happen if somebody replaces the 8-core CPU with, say, a 16-core (or 4-core) one on the same node and restarts OrientDB?

I have the same question. I didn't think about it until this thread, but I have environments that fluctuate regularly, so this scenario is highly likely. I assume one answer is that the cluster layout won't be optimized, and that the database should be exported and re-imported with the optimal cluster configuration; I'd agree if that were possible. In my case it wouldn't be.

scott molinari

Apr 27, 2016, 4:52:58 AM
to OrientDB
BTW, what will happen if somebody replaces the 8-core CPU with, say, a 16-core (or 4-core) one on the same node and restarts OrientDB?

I would bet the cluster creation is only done at class creation. Once the clusters are made, it wouldn't matter what the hardware changes to.

As I see it, the whole idea of the additional cluster creation is to help parallelism, i.e. concurrent execution, and is less about data locality control. It also looks like data locality can always be arranged later in a distributed setup. So I see no big deal.

Sfinx, you keep mentioning clustering of classes, but I think the term "cluster" in ODB terms is slightly different. To me, from a sharding system standpoint, the ODB "cluster" should actually be called a "chunk". Where, all the chunks of a class make up the whole cluster for that class. I guess "cluster" is sexier than "chunk". LOL!

Scott   


Sfinx

Apr 27, 2016, 5:07:42 AM
to OrientDB
I don't care what the data grouping is called, but I definitely see a problem when trying to select data that is spread over 10 chunks/clusters with names auto-created by ODB. Just try to use SELECT FROM CLUSTER:<cluster-name>.

scott molinari

Apr 27, 2016, 9:49:09 AM
to OrientDB
You can rename the clusters at any time. 

http://orientdb.com/docs/last/SQL-Alter-Cluster.html

Scott

Sfinx

Apr 27, 2016, 10:06:43 PM
to OrientDB
IMHO it is much easier to use a properly architected database.

scott molinari

Apr 28, 2016, 8:38:20 AM
to OrientDB
I don't understand your worries. You can introspect on the clusters and store data where you want it. You can rename clusters too. Or, you can let ODB store the class records in the clusters (the chunks) where it thinks best or according to your preference, i.e. round-robin, balanced, default, etc.
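
To make the selection strategies named above concrete, here is a small simulation of where new records would land. It is a toy model under stated assumptions, not OrientDB's implementation: the selector functions and simulate are hypothetical, the first listed cluster is treated as the class's default cluster, and the cluster ids and starting sizes are made up.

```python
from itertools import cycle


def round_robin_selector(cluster_ids):
    """Rotate through the clusters, one insert each in turn."""
    ring = cycle(cluster_ids)
    return lambda sizes: next(ring)


def balanced_selector(cluster_ids):
    """Always pick the cluster that currently holds the fewest records."""
    return lambda sizes: min(cluster_ids, key=lambda cid: sizes[cid])


def default_selector(cluster_ids):
    """Always pick the default cluster (assumed here to be the first one)."""
    return lambda sizes: cluster_ids[0]


def simulate(make_selector, sizes, inserts):
    """Return the cluster sizes after `inserts` new records."""
    sizes = dict(sizes)  # work on a copy so the caller's dict is untouched
    select = make_selector(list(sizes))
    for _ in range(inserts):
        sizes[select(sizes)] += 1
    return sizes


# Hypothetical class with three clusters; cluster 12 already holds 5 records.
start = {12: 5, 13: 0, 14: 0}
print("round-robin:", simulate(round_robin_selector, start, 6))
print("balanced:   ", simulate(balanced_selector, start, 6))
print("default:    ", simulate(default_selector, start, 6))
```

Round-robin spreads the six inserts evenly regardless of existing sizes, balanced fills the emptier clusters first, and default keeps everything in one cluster.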

Scott