Question regarding synchronisation with version 2.2.25 when adding a new machine to a cluster

Zeeshan Ahmad

Aug 14, 2017, 7:15:03 AM
to OrientDB
Hey,

I had two running master nodes and had to add a third master node. My distributed config:

{
  "replication": true,
  "hotAlignment" : false,
  "autoDeploy": true,
  "readQuorum": 1,
  "writeQuorum": "majority",
  "executionMode": "synchronous",
  "readYourWrites": true,
  "newNodeStrategy": "dynamic",
  "servers": {
    "orientdbMaster1": "master",
    "orientdbMaster2": "master",
    "orientdbMaster3": "master"
  },
  "clusters": {
    "internal": {
    },
    "*": {
      "servers": ["<NEW_NODE>"]
    }
  }
}


2017-08-14 10:44:44:254 WARNI [orientdbMaster1] Timeout (20001ms) on waiting for synchronous responses from nodes=[orientdbMaster2, orientdbMaster3] responsesSoFar=[orientdbMaster3] request=(id=1.263 task=gossip timestamp: 1502707464247 lockManagerServer: orientdbMaster1) [ODistributedDatabaseImpl]
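
To make the warning easier to read: the `nodes=[...]` list names the servers that were asked, and `responsesSoFar=[...]` names the ones that answered before the timeout. The following is only a rough sketch for pulling the silent nodes out of such a line; the helper names (`missing_responders`, `_split_names`) are made up for illustration, and the regular expression assumes the log format shown above rather than any documented format.

```python
import re

# The warning line from above, copied verbatim.
LOG_LINE = (
    "2017-08-14 10:44:44:254 WARNI [orientdbMaster1] Timeout (20001ms) on waiting "
    "for synchronous responses from nodes=[orientdbMaster2, orientdbMaster3] "
    "responsesSoFar=[orientdbMaster3] request=(id=1.263 task=gossip timestamp: "
    "1502707464247 lockManagerServer: orientdbMaster1) [ODistributedDatabaseImpl]"
)


def _split_names(raw):
    """Split a comma-separated list of node names, dropping empty entries."""
    return [name.strip() for name in raw.split(",") if name.strip()]


def missing_responders(line):
    """Return the nodes that were asked but had not answered at the timeout."""
    # Assumption: both lists appear as 'nodes=[...] responsesSoFar=[...]'.
    match = re.search(r"nodes=\[([^\]]*)\]\s+responsesSoFar=\[([^\]]*)\]", line)
    if match is None:
        return []
    asked = _split_names(match.group(1))
    answered = set(_split_names(match.group(2)))
    return [node for node in asked if node not in answered]


print(missing_responders(LOG_LINE))
```

For the line above this reports orientdbMaster2 as the only node that did not respond.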

As soon as the new machine joined the cluster, the following chain of events happened:

1. Added orientdbMaster3
2. orientdbMaster3 started synchronising the database with orientdbMaster2
3. During this time orientdbMaster2 became unreachable from orientdbMaster1. I got this in the log continuously:

WARNI [orientdbMaster1] Timeout (20001ms) on waiting for synchronous responses from nodes=[orientdbMaster2, orientdbMaster3] responsesSoFar=[orientdbMaster3] request=(id=1.263 task=gossip timestamp: 1502707464247 lockManagerServer: orientdbMaster1) [ODistributedDatabaseImpl]

4. Writes were not possible because the write quorum of 2 was not reached. All writes failed.
5. After orientdbMaster3 was up, orientdbMaster2 started to rebuild its indexes, which took a lot of time.


This caused a huge downtime.
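
The quorum arithmetic behind step 4 can be sketched as follows. This assumes that `"writeQuorum": "majority"` means more than half of the configured masters (which matches the quorum of 2 for 3 masters mentioned above), and that a master only counts while it answers in time; how the server actually counts a node that is still synchronising is an assumption here. The function names are hypothetical.

```python
def majority_quorum(master_count):
    """Smallest number of masters that is more than half of master_count."""
    return master_count // 2 + 1


def can_write(master_count, reachable_masters):
    """True if enough masters respond to satisfy a majority write quorum."""
    return reachable_masters >= majority_quorum(master_count)


# Three configured masters: the write quorum is 2.
print(majority_quorum(3))

# Assumed situation during the sync: orientdbMaster2 is timing out and
# orientdbMaster3 is not yet online, so only orientdbMaster1 counts.
print(can_write(3, 1))
```

Under these assumptions a single reachable master out of three is below the quorum, which is consistent with all writes failing until a second master responds again.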

The same issue happens whenever the node that was the lock manager is restarted: the machine starts to fetch the entire database.

Questions:

1. Why does the entire database need to be fetched again on every restart of the lock manager node?
2. How is the lock manager elected in the beginning, and what is the process of re-election?
3. Can I tell the new node to get the database from a specific node?
4. Why are writes not possible on the node that is helping to re-sync the database?
5. Why are the indexes rebuilt whenever there is a re-sync?

I wasted a lot of time when I added the new machine, and it caused a huge downtime as well.


Thanks,
Zeeshan

Luca Garulli

Aug 18, 2017, 6:46:37 PM
to OrientDB
Hi Zeeshan,

Please try v2.2.26, where we fixed many of these problems, and let me know how it goes.

Best Regards,

Luca Garulli
Founder & CEO
