Hi dear Neo4j community,

My thesis work requires filling a Neo4j server instance with at least 1M nodes (plus ~5 relationships per node) as quickly as possible. (I am using Neo4j server instead of the embedded version because I need to communicate over HTTP between servers running on different machines.)
First, I tried the REST API batch operations (via Neography), but I realized that this is not the way to go. Then I found Max's blog post, and now I am trying to use batch-importer. It works, but it takes too much time.
As a comparison: the post says “Importing 7500000 Nodes took 17 seconds”, while for me the same step took 138 seconds, about 8 times longer. My testbed is an AWS Large instance with 7.5 GB RAM, 2 virtual cores, Ubuntu 12.04, and Oracle JDK 1.7 (the instance is used only for testing batch-importer, so no other application is running).
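To put the two timings side by side, here is a small Python sketch that converts them to nodes per second. It is only arithmetic on the two numbers quoted above; the function name `throughput` is made up for illustration and is not part of batch-import.

```python
def throughput(items, seconds):
    """Return items processed per second."""
    return items / seconds

NODES = 7_500_000          # node count from the importer's log line
BLOG_SECONDS = 17          # time reported in the blog post
MY_SECONDS = 138           # time measured on the AWS Large instance

blog_rate = throughput(NODES, BLOG_SECONDS)
my_rate = throughput(NODES, MY_SECONDS)
slowdown = MY_SECONDS / BLOG_SECONDS

print(f"blog post: {blog_rate:,.0f} nodes/s")
print(f"my run:    {my_rate:,.0f} nodes/s")
print(f"slowdown:  {slowdown:.1f}x")
```

This only covers the node-import step; the relationship step, which never finished, is not included.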
The batch importer ran for about 5 hours before I shut it down. It was still printing dots, and the last things it printed were “Importing 7500000 Nodes took 138 seconds” and something like "9.834.000ms for 10.000.000". Also, before I shut it down, the "db" directory was still growing: "neostore.propertystore.db" was around 900 MB and "neostore.relationshipstore.db" was around 400 MB.
I used the default settings for batch-importer, exactly as cloned from jexp/batch-import. I haven't changed anything and followed the steps given in the blog post, as I just wanted to be sure that I am able to get it working :)
Does anyone have an idea what causes the low performance? Or any suggestions about what I should tune or double-check?
As a final note, when I changed the batch-importer code to create 1M nodes with 2 relationships per node, everything finished in the blink of an eye.
volkan