Loading nodes matching an indexed UUID


Vincent Mooser

Jan 29, 2018, 11:11:13 AM
to Neo4j
Hi,
I am currently facing some performance problems when loading nodes using an indexed UUID. My use case is the following:

- I run a search query in Apache Solr, which returns a list of up to 200 UUIDs
- I load the nodes corresponding to these UUIDs with the following Cypher:

UNWIND {uuidList} AS uuid
MATCH (node:FOLDER { oid: uuid }) RETURN node

uuidList is a query parameter containing the list of UUIDs (strings).
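
For reference, index usage can be verified by prefixing the query with PROFILE; the plan should show a NodeIndexSeek on :FOLDER(oid):

PROFILE
UNWIND {uuidList} AS uuid
MATCH (node:FOLDER { oid: uuid }) RETURN node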

When the query has no page faults, it takes about 10-20 ms to load the 200 nodes. But when page faults appear in the query log, the query can take up to 4 seconds. I understand that some nodes have to be loaded from disk, but for 200 nodes that looks very slow to me.

The FOLDER nodes are organized like folders in a filesystem and are connected to each other by a 'PARENT' relationship. The only folder without a parent is the root folder.

Environment specs are:
- 300M nodes 
- 600M relationships
- 110M nodes with the label 'FOLDER'
- all FOLDER nodes have an 'oid' property whose index is online
- the graph.db directory is about 125 GB (without transaction logs)
- Neo4j Enterprise 3.2.6 and Java driver 1.4.4
- 8 GB of heap
- 32 GB of page cache
- no SSD

Any hints for improving performance?

Thank you
Vincent

Michael Hunger

Jan 29, 2018, 9:04:50 PM
to ne...@googlegroups.com
Hi,
this query should perform better:

MATCH (node:FOLDER) WHERE node.oid IN {uuidList} RETURN node
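
(With IN, the whole list can be answered by a single index seek over all the values, instead of one index lookup per UNWIND row.)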

Your hardware is definitely undersized for a graph of this size:
How much memory does the machine have?

0. Switch to Neo4j Enterprise 3.3.2, which is more memory-efficient
1. Use an SSD
2. Use more memory
3. Use a unique constraint instead of a plain index (see the sketch below)
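
A minimal sketch for point 3 (Cypher syntax as of Neo4j 3.x; this assumes oid really is unique per FOLDER node). The existing index has to be dropped first, and rebuilding on 110M nodes will take a while:

DROP INDEX ON :FOLDER(oid)
CREATE CONSTRAINT ON (f:FOLDER) ASSERT f.oid IS UNIQUE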

Otherwise you are effectively measuring disk speed.

The problem is that the 200 nodes may be scattered across the disk, so the query may have to load up to 200 different pages, with the HDD seeking to each block.
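
As a rough sanity check: a typical HDD seek costs on the order of 5-10 ms, so 200 scattered page reads alone can account for 1-2 seconds, which is in the ballpark of the 4-second times you are seeing.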

Which properties of the nodes do you need returned? The full nodes?



Vincent Mooser

Jan 30, 2018, 10:27:36 AM
to Neo4j
Hi,

> How much memory does the machine have?

The machine has 64 GB of memory, so I think I can increase my page cache. But I would need at least twice that amount to fit the whole graph in the page cache.
In my use case, Solr only indexes a subset of the FOLDER nodes (about 100,000), so I was thinking of running a query at startup that selects those 100,000 nodes, to warm up the cache and make sure the page cache contains (at least) those nodes. Will they be evicted from the page cache after a certain amount of time?
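
Something like this is what I have in mind (sketch only; the 'inSolr' flag is hypothetical, standing in for however the Solr-indexed folders are actually identified):

MATCH (f:FOLDER) WHERE f.inSolr = true
RETURN count(f.oid)

Returning count(f.oid) rather than count(f) should pull the property records into the page cache as well, not just the node records.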

> Which properties of the nodes do you need returned? The full nodes?

Yes, the full nodes have to be returned. They contain one 'oid' (String), one 'name' property (String), four boolean properties used as flags for business tasks, and two long properties (creation and modification dates).

Thank you,
Vincent

Michael Hunger

Jan 30, 2018, 1:34:30 PM
to ne...@googlegroups.com
Hi Vincent,



I would definitely increase the page cache.

If it's only 100k nodes that you're loading, it should be fine.
The page cache evicts by usage (LRU-K), so as long as those 100k nodes keep getting used, their pages stay in.
If a lot of other data is loaded, though, they might get evicted.
There is no idle-time eviction.
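
(If APOC is installed, CALL apoc.warmup.run() is another option: it touches every node and relationship page to pre-fill the page cache. With a 125 GB store and a smaller cache it would largely evict itself again, though, so a targeted query over just the 100k folders is probably the better fit here.)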

The node properties live on separate pages.
From your description it would be 2, or at most 3, property records per node.

The disk is the biggest issue; if you can compensate with the larger page cache to avoid disk hits, that will help (at least for reads).

Switch to 3.3.2
Use a 12 GB heap
Use a 48 GB page cache
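
In neo4j.conf terms that would be (standard 3.x setting names):

dbms.memory.heap.initial_size=12g
dbms.memory.heap.max_size=12g
dbms.memory.pagecache.size=48g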

Then this should be better.
Also try my query suggestion.

Cheers, Michael



Vincent Mooser

Feb 1, 2018, 9:09:18 AM
to Neo4j
Hi Michael, 
I applied all your recommendations and performance is better now. The next step will be an SSD.

Thank you for your help
Vincent

Michael Hunger

Feb 1, 2018, 7:47:19 PM
to ne...@googlegroups.com
Glad to hear that. 

Perhaps we'll see each other during the graph tour in Europe :)

Cheers, Michael
