Re: [Neo4j] Very very low performance for batch importer with default settings

Mattias Persson

Aug 16, 2012, 3:13:36 AM
to ne...@googlegroups.com
Which version of Neo4j are you using? There has been an issue in recent milestones (at least) where the batch inserter doesn't memory-map the store files properly, and proper memory mapping gives a big boost in performance.
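
As a rough illustration of how mapped-memory sizes could be estimated once memory mapping works, here is a minimal Python sketch. It assumes the approximate fixed record sizes documented for Neo4j 1.x stores (9 bytes per node, 33 per relationship, 41 per property record) and the `neostore.*.mapped_memory` setting names. The function name, the headroom factor, and the example counts are hypothetical and should be checked against the manual for the version in use.

```python
import math

# Assumed approximate record sizes (bytes) for Neo4j 1.x store files.
RECORD_BYTES = {
    "neostore.nodestore.db": 9,
    "neostore.relationshipstore.db": 33,
    "neostore.propertystore.db": 41,
}


def mapped_memory_settings(nodes, relationships, properties, headroom=1.1):
    """Return a dict of mapped_memory settings sized to hold each store.

    `headroom` is a hypothetical safety factor, not a Neo4j parameter.
    """
    counts = {
        "neostore.nodestore.db": nodes,
        "neostore.relationshipstore.db": relationships,
        "neostore.propertystore.db": properties,
    }
    settings = {}
    for store, record_size in RECORD_BYTES.items():
        needed_bytes = counts[store] * record_size * headroom
        # Round up to whole megabytes, and never go below 1M.
        megabytes = max(1, math.ceil(needed_bytes / 2**20))
        settings[store + ".mapped_memory"] = f"{megabytes}M"
    return settings


if __name__ == "__main__":
    # Example counts only; the property count is a made-up placeholder.
    for key, value in mapped_memory_settings(7_500_000, 23_000_000, 10_000_000).items():
        print(f"{key}={value}")
```

The printed `key=value` lines are in the shape of a properties file; whether the importer reads them from such a file or from a config map depends on the version being used.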

2012/8/15 volkan Tufekci <tufekc...@gmail.com>
Hi dear Neo4j community,

My thesis work requires loading a Neo4j server instance with at least 1M nodes (plus ~5 relationships per node) as quickly as possible. (I am using Neo4j server rather than the embedded version because I need to communicate over HTTP between servers running on different machines.)

First, I tried the REST API batch operations (via Neography), but I realized that is not the way to go. Then I found Max's blog post, and now I am trying to use batch-importer. It works, but it takes too much time.

As a comparison, the post says “Importing 7500000 Nodes took 17 seconds”; the same step took 138 seconds for me, about 8 times longer. My testbed is an AWS Large instance with 7.5 GB RAM, 2 virtual cores, Ubuntu 12.04, and Oracle JDK 1.7 (the instance is used only for testing batch-importer, so no other application is running).

The batch importer ran for about 5 hours before I shut it down. It was still printing dots, and the last things it printed were “Importing 7500000 Nodes took 138 seconds” and something like "9.834.000ms for 10.000.000". Also, before I shut it down, the "db" directory was still growing; "neostore.propertystore.db" was around 900MB and "neostore.relationshipstore.db" was around 400MB.

I used the default settings for batch-importer, exactly as cloned from jexp/batch-import. I haven't changed anything and followed the steps given in the blog post, as I just wanted to be sure that I could get it working :)

Does anyone have an idea what causes the low performance? Or any suggestions about what I should tune or double-check?

As a final note, when I changed the batch-importer code to create 1M nodes with 2 relationships per node, everything finished in the blink of an eye.

volkan


--
Mattias Persson, [mat...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com

volkan Tufekci

Aug 16, 2012, 8:43:40 AM
to ne...@googlegroups.com
Hi,

I'm using a fresh clone of the jexp/batch-importer repo, and its pom.xml specifies version 1.8.M07 for both neo4j-kernel and neo4j-lucene-index.

On Thursday, August 16, 2012 at 10:13:36 UTC+3, Mattias Persson wrote:

Michael Hunger

Aug 16, 2012, 10:44:26 AM
to ne...@googlegroups.com
I updated the GitHub repo to use the snapshot, which now uses mmio (memory-mapped I/O) again.

The jar has not been updated yet, though.


volkan Tufekci

Aug 21, 2012, 9:40:04 AM
to ne...@googlegroups.com
It works far better now.
On an EC2 Large instance it took around 5 minutes to create 7.5M nodes and 23M relationships.
Thanks for the update.
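
For a rough sense of scale, the figures above can be turned into a throughput number. This is only back-of-the-envelope arithmetic in Python: it takes "around 5 minutes" as exactly 300 seconds, which is an assumption, and the function name is made up for illustration.

```python
def records_per_second(nodes, relationships, seconds):
    """Combined node + relationship import rate."""
    return (nodes + relationships) / seconds


# 7.5M nodes and 23M relationships in roughly 5 minutes (assumed 300 s).
rate = records_per_second(7_500_000, 23_000_000, 5 * 60)
print(f"{rate:,.0f} records/s")  # prints "101,667 records/s"
```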

On Thursday, August 16, 2012 at 17:44:26 UTC+3, Michael Hunger wrote:

Michael Hunger

Aug 21, 2012, 9:48:33 AM
to ne...@googlegroups.com
That's most probably due to the poor disk I/O of AWS instances; try your local machine, an SSD, or a high-I/O EBS instance.

Michael
