I had two running master nodes and had to add a third master node. My distributed config:
"hotAlignment" : false,
2017-08-14 10:44:44:254 WARNI [orientdbMaster1] Timeout (20001ms) on waiting for synchronous responses from nodes=[orientdbMaster2, orientdbMaster3] responsesSoFar=[orientdbMaster3] request=(id=1.263 task=gossip timestamp: 1502707464247 lockManagerServer: orientdbMaster1) [ODistributedDatabaseImpl]
As soon as the new machine joined the cluster, the following chain of events happened:
1. Added orientdbMaster3
2. orientdbMaster3 started synchronising the database with orientdbMaster2
3. During this time orientdbMaster2 became unreachable from orientdbMaster1. This message appeared in the log continuously:
WARNI [orientdbMaster1] Timeout (20001ms) on waiting for synchronous responses from nodes=[orientdbMaster2, orientdbMaster3] responsesSoFar=[orientdbMaster3] request=(id=1.263 task=gossip timestamp: 1502707464247 lockManagerServer: orientdbMaster1) [ODistributedDatabaseImpl]
4. Writes were not possible because the quorum of 2 was not reached, so all writes failed.
5. After orientdbMaster3 was up, orientdbMaster2 started to rebuild its indexes, which took a long time.
This caused a huge downtime.
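To make step 4 concrete, here is a minimal sketch of the quorum arithmetic. It assumes the write quorum is a simple majority of the configured masters and that neither the node serving the sync (orientdbMaster2) nor the node still installing the database (orientdbMaster3) could acknowledge writes during that window. The function names and the `able_to_ack` set are hypothetical and only illustrate the situation; they are not OrientDB APIs.

```python
def majority_quorum(total_nodes: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return total_nodes // 2 + 1


def write_allowed(acking_nodes: set, write_quorum: int) -> bool:
    """A write succeeds only if at least write_quorum nodes can acknowledge it."""
    return len(acking_nodes) >= write_quorum


# Hypothetical view from orientdbMaster1 while the sync was running
# (assumption: only orientdbMaster1 itself could acknowledge writes).
configured = {"orientdbMaster1", "orientdbMaster2", "orientdbMaster3"}
able_to_ack = {"orientdbMaster1"}

quorum = majority_quorum(len(configured))  # 3 // 2 + 1 == 2
print(quorum, write_allowed(able_to_ack, quorum))  # prints: 2 False
```

Under these assumptions only one node can acknowledge a write while two are required, which would match the failed writes I saw.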
The same issue happens whenever the node that was the lock manager is restarted: after the restart, that machine fetches the entire database again.
1. Why does the entire database need to be fetched again on every restart of the lock manager node?
2. How is the lock manager elected initially, and what is the process of re-election?
3. Can I specify that a new node should get the database from a specific node?
4. Why are writes not possible on the node that is helping to re-sync the database?
5. Why are the indexes rebuilt whenever there is a re-sync?
I wasted a lot of time when I added the new machine, and it caused a huge downtime as well.