Hi there,
Can someone share some good suggestions on how to import and update a huge amount of data in Neo4j? I currently need to import 200,000,000+ nodes and 200,000,000+ relationships, and then keep that data up to date. Due to business requirements I cannot do this in offline mode, so LOAD CSV or Cypher CREATE/MERGE queries seem to be the only choice. However, in my PoC project I have run into performance issues.
On a test server with a 40-core CPU, 30 GB of memory and an SSD, write performance is poor: I only get ~1,000 nodes/s with CREATE/MERGE Cypher queries and ~5,000 nodes/s with LOAD CSV. Another big issue is that as soon as I try to update the nodes/relationships, the Neo4j server becomes unavailable.
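To put those rates in perspective, here is a rough back-of-the-envelope estimate in Python for the node load alone. It assumes the measured throughput stays constant for the whole import, which is probably optimistic, and the function name is just something I made up for illustration:

```python
# Rough estimate of how long the initial node load would take at the
# throughput measured on the test server. Assumes a constant rate.
NODES = 200_000_000


def hours_to_load(total: int, rate_per_second: int) -> float:
    """Return the load time in hours for `total` items at a fixed rate."""
    return total / rate_per_second / 3600


create_merge_hours = hours_to_load(NODES, 1_000)  # CREATE/MERGE queries
load_csv_hours = hours_to_load(NODES, 5_000)      # LOAD CSV

print(f"CREATE/MERGE: ~{create_merge_hours:.1f} h")
print(f"LOAD CSV:     ~{load_csv_hours:.1f} h")
```

So even in the best case the nodes alone would take roughly half a day with LOAD CSV and more than two days with CREATE/MERGE, before any relationships are loaded.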
Here is the sysinfo from Neo4j:
Here are the heap size and page cache settings:
dbms.memory.heap.initial_size=12g
dbms.memory.heap.max_size=12g
dbms.memory.pagecache.size=14g
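For reference, this is how those settings add up against the 30 GB on the test server. It is only a small sketch: the `to_gb` helper is hypothetical, and it assumes the heap and page cache are the only large consumers, ignoring any other memory the JVM or the OS needs:

```python
# Settings copied from the configuration above; TOTAL_RAM_GB is the
# memory of the test server.
settings = {
    "dbms.memory.heap.max_size": "12g",
    "dbms.memory.pagecache.size": "14g",
}
TOTAL_RAM_GB = 30


def to_gb(value: str) -> float:
    """Convert a size such as '12g' or '512m' to gigabytes."""
    units = {"k": 1 / 1024**2, "m": 1 / 1024, "g": 1.0}
    return float(value[:-1]) * units[value[-1].lower()]


neo4j_gb = sum(to_gb(v) for v in settings.values())
left_for_os_gb = TOTAL_RAM_GB - neo4j_gb

print(f"Heap + page cache: {neo4j_gb:.0f} GB")
print(f"Left for the OS and everything else: {left_for_os_gb:.0f} GB")
```

That is 26 GB for Neo4j, with about 4 GB left for the operating system and everything else on the machine.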
Here is the sample Cypher:
CREATE CONSTRAINT ON (c:Company) ASSERT c.id IS UNIQUE;
CREATE CONSTRAINT ON (p:Person) ASSERT p.id IS UNIQUE;
USING PERIODIC COMMIT 10000
LOAD CSV FROM 'file:///persop-{number}.csv' AS row
MERGE (person:Person { id: row[0] })
ON CREATE SET
......
ON MATCH SET
......
USING PERIODIC COMMIT 10000
LOAD CSV FROM 'file:///company-{number}.csv' AS row
MERGE (company:Company { id: row[0] })
ON CREATE SET
......
ON MATCH SET
......
USING PERIODIC COMMIT 10000
LOAD CSV FROM 'file:///person-legal-company-{number}.csv' AS row
MATCH (p:Person { id: row[0] })
MATCH (c:Company { id: row[1] })
MERGE (p)-[r:REL]->(c)
ON CREATE SET
......
ON MATCH SET
......
Looking forward to your help. Thanks in advance.
Regards.