Latency issue with large relationship dataset

gram...@wootcloud.com

Feb 7, 2018, 6:49:52 AM

to Neo4j

I need help debugging a latency issue with a large relationship dataset.

Please find the details below:

System configuration

8 core, 32 GB VM on cloud

Neo4j configuration

page cache - 20 GB

heap - 8 GB

Object model

Nodes share a relationship "COMMUNICATING_TO" with a relationship property "timestamp".
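
To make the model concrete, here is a minimal sketch of two nodes with two communications between them. The `name` property and the epoch-millisecond timestamp values are assumptions for illustration only; the real data may store timestamps differently.

```cypher
// Hypothetical sample data: two unlabeled nodes (the `name` property is illustrative only).
CREATE (a {name: 'a'}), (b {name: 'b'})
// Each communication is its own relationship carrying a `timestamp` property
// (assumed here to be epoch milliseconds).
CREATE (a)-[:COMMUNICATING_TO {timestamp: 1517961600000}]->(b)
CREATE (a)-[:COMMUNICATING_TO {timestamp: 1517965200000}]->(b)
```

With this shape, the same pair of nodes can be connected by many relationships, which is why the query below has to remove duplicates.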

Query

Find all communications between nodes in a given time period, removing duplicate communications between the same pair of nodes.

`MATCH (n1)-[r:COMMUNICATING_TO]->(n2) WHERE r.timestamp >= <fromTimestamp> AND r.timestamp <= <toTimestamp> RETURN {id:id(n1)} as fromNode, COLLECT(DISTINCT {id:id(n2)}) as toNode`
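
For debugging, the same query can be written with parameters and prefixed with `PROFILE` to see which operators account for the time. This is only a sketch: the parameter names `fromTimestamp` and `toTimestamp` are assumed to be supplied by the client, and the `$param` syntax assumes a Neo4j version that supports it.

```cypher
// PROFILE runs the query and reports rows and db hits per operator.
PROFILE
MATCH (n1)-[r:COMMUNICATING_TO]->(n2)
WHERE r.timestamp >= $fromTimestamp   // assumed parameter, same units as r.timestamp
  AND r.timestamp <= $toTimestamp     // assumed parameter, same units as r.timestamp
RETURN {id: id(n1)} AS fromNode,
       COLLECT(DISTINCT {id: id(n2)}) AS toNode
```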

Data

100K nodes, with 500 million relationships between them.

Challenge

For a given day, there are 2 million relationships, and the query takes ~50 seconds.

Any suggestions for optimizing the query, the system parameters, or the object model are appreciated.