About HA cluster sync data

Liping Huang

Jan 30, 2018, 11:01:48 PM
to Neo4j
Hi there,

I have an HA cluster with 3 servers: server1 (master), server2 (slave 1), and server3 (slave 2). I want to keep all the data up to date, and the LOAD CSV approach is too slow for my huge dataset, so I want to reload all the data offline with neo4j-admin import.

Let's say I shut down server1, so server2 becomes the new master.

Then I load the new copy of the data on server1 with neo4j-admin import.

Then I start server1. As server2 is already the master in the HA cluster, server1 will join as a slave.

What is the sync behavior at that point?
1. The new data syncs from server1 [slave] to server2 [master] and server3 [slave].
2. The data syncs from server2 [master] to server1 [slave].
3. Or must I delete the databases on server2 and server3 and restart all the servers, so that server1 becomes the master and server2 and server3 act as slaves and sync all the data from the master?

Please advise, thanks.

Michael Hunger

Jan 31, 2018, 7:08:36 AM
to ne...@googlegroups.com
It doesn't work like 1 or 2.

You have to do (3).

You have to completely clean out the other servers and have them copy from master.

Or you seed the whole cluster with the data from neo4j-admin import, i.e. copy the database from the master onto the other two.
Also make sure to use neo4j-shell on that data to create the necessary indexes / constraints before using it to seed your cluster.
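
As a rough sketch of that schema step, assuming the Company and Employ labels keyed on id that appear in the LOAD CSV script further down the thread (the thread does not list the actual constraints, so these are illustrative only):

```cypher
// Assumed schema: unique id per Company and per Employ node.
// Label and property names are taken from the LOAD CSV script in this
// thread, not from a confirmed schema.
CREATE CONSTRAINT ON (c:Company) ASSERT c.id IS UNIQUE;
CREATE CONSTRAINT ON (p:Employ) ASSERT p.id IS UNIQUE;

// List what exists before copying the store to the other instances.
CALL db.constraints();
CALL db.indexes();
```

The statements use the Neo4j 3.x syntax that was current when this thread was written. Running them on the imported store before seeding means every instance starts from the same schema.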

How big is your data? And what does your LOAD CSV look like?

Michael

Liping Huang

Jan 31, 2018, 8:47:29 PM
to Neo4j
Thanks MH,

These are the basic numbers for my dataset. Loading it with neo4j-admin import takes less than 20 minutes, but it takes ~10 hours using LOAD CSV.

ID Allocation
Node ID: 143812726
Property ID: 590958500
Relationship ID: 244292244
Relationship Type ID: 3

My LOAD CSV looks like this:
1. Create the constraints and indexes at the beginning
2. Create the nodes
USING PERIODIC COMMIT 50000
LOAD CSV FROM 'file:///XXXX.csv' AS row
MERGE (c:Company { id: row[0] })
ON CREATE SET
    c.name = row[1],
    ......
ON MATCH SET
    c.name = row[1],
    ......
   
USING PERIODIC COMMIT 50000
LOAD CSV FROM 'file:///YYYY.csv' AS row
MERGE (p:Employ { id: row[0] })
ON CREATE SET
    p.name = row[1],
    ......
ON MATCH SET
    p.name = row[1],
    ......

3. Create the relationships
USING PERIODIC COMMIT 50000
LOAD CSV FROM 'file:///RRRR.csv' AS row
MATCH (c:Company { id: row[0] })
MATCH (p:Employ { id: row[1] })
MERGE (c)-[r:EMPLOY]->(p)
ON CREATE SET
    r.since = row[2],
    ......
ON MATCH SET
    r.since = row[2],
    ......
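
One observation on the script above: in each statement the ON CREATE SET and ON MATCH SET branches assign the same values, so, as far as the posted lines show, a single SET after the MERGE would have the same effect. A minimal sketch for the Company step, assuming the elided properties follow the same pattern as name:

```cypher
USING PERIODIC COMMIT 50000
LOAD CSV FROM 'file:///XXXX.csv' AS row
MERGE (c:Company { id: row[0] })
// A plain SET runs whether MERGE created the node or matched an existing
// one, so it covers both the ON CREATE and ON MATCH branches.
SET c.name = row[1];
```

This only shortens the statement. It is not claimed here to change the load time.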

