I am testing a Java query on datasets of different sizes, from 100 million to 1 billion edges.
The query does not return much data (10 to 20 vertices with their corresponding edges), but it needs to scan the whole dataset.
I see a big performance degradation once the database size exceeds 32 GB.
I am running the test on a 32-core virtual server with 244 GB of RAM, and the query is threaded to use all CPUs, roughly along the lines of the sketch below.
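To give an idea of the structure (this is only a simplified sketch, not the actual code: processNodeRange is a stand-in for the real Neo4j traversal over one node-id range):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelScanSketch {

    // Stand-in for the real work: traverse the nodes/relationships in one id range.
    static void processNodeRange(long start, long end) { /* Neo4j traversal goes here */ }

    public static void main(String[] args) throws Exception {
        long maxNodeId = 1_000_000_000L;                            // placeholder for the highest node id
        int threads = Runtime.getRuntime().availableProcessors();   // 32 on this server
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long chunk = maxNodeId / threads;

        List<Future<?>> futures = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            long start = i * chunk;
            long end = (i == threads - 1) ? maxNodeId : start + chunk;
            futures.add(pool.submit(() -> processNodeRange(start, end)));
        }
        for (Future<?> f : futures) {
            f.get();                                                 // wait for every partition to finish
        }
        pool.shutdown();
    }
}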
I changed the Java heap size to 96 GB and played with the garbage collector options (keeping -XX:+UseG1GC as the one that helped most; see the launch options after the numbers)
to get a better outcome, but I still see a big dip in performance. The threshold appears to be around 32 GB:
100M edges, 7.5 GB database: 12 min
250M edges, 19 GB database: 35 min
500M edges, 38 GB database: 12 hours (with -XX:+UseG1GC)
1B edges, 76 GB database: 51 hours (without -XX:+UseG1GC)
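For reference, the heap and GC options behind those runs look roughly like this (rest of the command line omitted):

java -Xmx96g -XX:+UseG1GC ...    (500M-edge run)
java -Xmx96g ...                 (1B-edge run, default collector)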
Furthermore, for the 0.5 billion and 1 billion edge tests I can see that the bulk of the CPU time is system time, about 60% system versus
40% user (from the Linux top command). When I run the smaller tests, 100% of the CPU time is user time.
Are the Java GC improvements in the Enterprise edition of Neo4j significant enough to bring the performance of the large-dataset query into the same range as the smaller ones?
Is there anything else I can do to improve the performance of queries on the larger datasets?
Thanks,
Patrice