Best practice for huge data import in Neo4j


Liping Huang

Nov 21, 2017, 5:39:32 AM
to Neo4j
Hi there,

Can someone share some good suggestions on how to import/update huge amounts of data in Neo4j? Currently I need to import 200,000,000+ nodes and 200,000,000+ relationships into Neo4j and keep that data up to date. Due to business requirements I cannot do this in offline mode, so it seems LOAD CSV or Cypher CREATE/MERGE queries are the only choices, but in my PoC project I am running into performance issues.

On a test server with a 40-core CPU, 30 GB of memory, and an SSD, write performance is poor: I only get ~1,000 nodes/s when using CREATE/MERGE Cypher queries and ~5,000 nodes/s when using LOAD CSV. Another big issue is that whenever I try to update the nodes/relationships, the Neo4j server becomes unavailable.

Here is the sysinfo from Neo4j (see the attached screenshot):

Here is the heap size and pagecache:

dbms.memory.heap.initial_size=12g
dbms.memory.heap.max_size=12g
dbms.memory.pagecache.size=14g

Here is the sample cypher:
CREATE CONSTRAINT ON (c:Company) ASSERT c.id IS UNIQUE;
CREATE CONSTRAINT ON (p:Person) ASSERT p.id IS UNIQUE;


USING PERIODIC COMMIT 10000
LOAD CSV FROM 'file:///person-{number}.csv' AS row
MERGE (person:Person { id: row[0] })
ON CREATE SET
......
ON MATCH SET
......

USING PERIODIC COMMIT 10000
LOAD CSV FROM 'file:///company-{number}.csv' AS row
MERGE (company:Company { id: row[0] })
ON CREATE SET
......
ON MATCH SET
......

USING PERIODIC COMMIT 10000
LOAD CSV FROM 'file:///person-legal-company-{number}.csv' AS row
MATCH (p:Person { id: row[0] })
MATCH (c:Company { id: row[1] })
MERGE (p)-[r:REL]->(c)
ON CREATE SET
......
ON MATCH SET
......


Looking forward to your help, thanks in advance.

Regards.



Michael Hunger

Nov 22, 2017, 6:50:14 AM
to ne...@googlegroups.com
How much memory does your server have? It should probably have 64 GB minimum: 16 GB for heap and 45 GB for page-cache.
How big is your database on disk (especially neostore.*.db)?

What kind of disk do you have? (You need an SSD.)

Use Neo4j 3.3.0 Enterprise

You also need indexes/constraints for the nodes you MATCH/MERGE on.
If your ids are numeric, use toInteger() so you can benefit from the faster numeric index.

Otherwise see:

Use Parameters!

Try to batch 10k-100k updates per transaction.





Liping Huang

Nov 22, 2017, 9:35:17 PM
to Neo4j
Thanks Michael,

In total I have 32 GB of memory (8 GB heap and 20 GB pagecache),

and I have a 174 GB store, the size of which I got from the Linux command in the Neo4j documentation,

and I have the relevant indexes/unique constraints for the nodes.

As for the ids, I made a separate business key (UUID) but left the internal ID auto-generated by Neo4j.

On Wednesday, November 22, 2017 at 7:50:14 PM UTC+8, Michael Hunger wrote:

Michael Hunger

Nov 23, 2017, 6:09:16 AM
to ne...@googlegroups.com
Your actual store is around 30 GB, so it would be useful to have a machine that fits that plus some extra (so get a 64 GB one).

As I said, with Neo4j 3.3, if your ids are numeric you should try to use toInteger() on them.
If you import nodes separately you can also use a larger commit size, e.g. 100k.

Do you have an SSD or spinning disk?

Please also share logs/debug.log.

Michael

