Slow ingestion rate


virprudensnocon...@gmail.com

Apr 22, 2016, 5:33:29 AM
to Neo4j
We're loading 20 million abstracts into Neo4J.  The rate has been about 2.5 million abstracts per week.  For comparison, we can load all 20 million abstracts into Solr Cloud in less than 24 hours.  The abstracts average about 400-500 words each.  For each abstract, we have 5 additional entity nodes with a relationship between the abstract and these entities.   We're looking for any advice on speeding up the load times for Neo4J.

In our attempts to get better performance when ingesting the abstracts, we have tried combinations of py2neo versions 2 and 3 and Neo4J versions 2 and 3 Enterprise.  Our platform is a 2-processor, 12-core Linux server with 32GB of memory.  We use the default Neo4J configuration.  We prefer merge() to ensure only one node per unique article ID, but we have also tried create().  We load in batches of 1000 articles.  We minimize round trips to the server by using transactions, sending the entities first and then the relationships.  No find() or find_one() calls are necessary.  The script itself runs quickly and then lingers during the commit, suggesting the slowdown is coming from the Neo4J server.  During our trials, we discovered and reported that py2neo 3 hangs indefinitely 99% of the time for merge() with the Bolt transaction.  It also hangs with HTTP transactions, but only rarely.
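
For readers following along, the batching described above might look roughly like the sketch below. It only prepares the payloads for each batch and does not talk to the server. The field names (`article_id`, `text`, `entities`, `name`) are hypothetical, and the step that sends each batch through py2neo is deliberately left out.

```python
from itertools import islice

BATCH_SIZE = 1000  # batch size mentioned in the post


def batches(items, size=BATCH_SIZE):
    """Yield successive lists of at most `size` items from any iterable."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def split_batch(articles):
    """Split one batch into abstract rows, entity rows, and relationship rows.

    Each article is assumed (hypothetically) to be a dict with the keys
    'article_id', 'text', and 'entities' (a list of entity names).
    Node rows come first so they can be sent before the relationships.
    """
    abstracts = [{"article_id": a["article_id"], "text": a["text"]} for a in articles]
    # De-duplicate entity names within the batch so each is written once.
    names = sorted({name for a in articles for name in a["entities"]})
    entities = [{"name": name} for name in names]
    rels = [
        {"article_id": a["article_id"], "entity": name}
        for a in articles
        for name in a["entities"]
    ]
    return abstracts, entities, rels
```

Each call to `split_batch` yields three lists that could be sent in the order the post describes: nodes first, then relationships.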

Once we get a reasonable single-threaded ingestion rate, we can consider running the load in parallel but since Neo4J is single threaded when updating (correct?), we're not sure that will help much.

Eventually we will be loading from a variety of sources in parallel so we must avoid solutions that wipe the Neo4J database first.

Has anyone else experienced such slow load times?  Are there any best practices we've overlooked (other than writing directly in Java) that might help increase load performance?

Michael Hunger

Apr 22, 2016, 3:33:37 PM
to ne...@googlegroups.com
Can you share your queries and a query plan?

Also, do you have constraints for the merge properties?

Are any of the nodes you insert or connect to heavily contended?
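
To make the first two questions concrete, here is a minimal sketch of the kind of statements involved, written as Python strings so they can be passed to whatever client is in use. The `Abstract` label and the `article_id` and `text` properties are assumptions, not names taken from the original poster's schema. The constraint uses the Neo4j 2.x/3.x syntax, and `{rows}` is the parameter form accepted by those versions.

```python
# Uniqueness constraint on the merge key (Neo4j 2.x/3.x syntax).
# It also creates an index, so MERGE can look nodes up by article_id
# instead of scanning every node with the label.
CONSTRAINT = "CREATE CONSTRAINT ON (a:Abstract) ASSERT a.article_id IS UNIQUE"

# One parameterized statement per batch: `rows` is a list of maps,
# e.g. [{"article_id": 1, "text": "..."}, ...].
MERGE_ABSTRACTS = (
    "UNWIND {rows} AS row "
    "MERGE (a:Abstract {article_id: row.article_id}) "
    "SET a.text = row.text"
)


def explain(statement):
    """Prefix a statement with EXPLAIN to get its plan without executing it."""
    return "EXPLAIN " + statement
```

Running `explain(MERGE_ABSTRACTS)` through the Neo4j browser or shell would return a plan that could be shared on the list; a label scan in that plan would suggest the constraint or index is missing.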

Michael 

Sent from my iPhone