I need help debugging a latency issue with a large relationship dataset.
Please find the details below:
System configuration
8 core, 32 GB VM on cloud
Neo4j configuration
page cache - 20 GB
heap - 8 GB
ObjectModel
Nodes share a relationship "COMMUNICATING_TO" with a relationship property "timestamp".
Query
Find all communications between nodes in a given time period, removing duplicate communications between the same pair of nodes.
`MATCH (n1)-[r:COMMUNICATING_TO]->(n2) WHERE r.timestamp >= <fromTimestamp> AND r.timestamp <= <toTimestamp> RETURN {id:id(n1)} as fromNode, COLLECT(DISTINCT {id:id(n2)}) as toNode`
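To see where the time goes, the same query can be run with PROFILE, which executes it and reports the plan operators with their database hits. This is only a sketch: it assumes the `$param` parameter syntax, and `$fromTimestamp` and `$toTimestamp` are placeholder parameter names standing in for the literal values.

```cypher
// PROFILE executes the query and returns the plan with per-operator db hits.
// $fromTimestamp and $toTimestamp are placeholder parameters supplied by the client.
PROFILE
MATCH (n1)-[r:COMMUNICATING_TO]->(n2)
WHERE r.timestamp >= $fromTimestamp AND r.timestamp <= $toTimestamp
RETURN {id: id(n1)} AS fromNode, COLLECT(DISTINCT {id: id(n2)}) AS toNode
```

If no index covers the relationship property, the plan will most likely read every COMMUNICATING_TO relationship and filter afterwards, rather than touching only those in the time window.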
Data
100K nodes, with 500 million relationships between them.
Challenge
For a given day, 2 million relationships fall within the time window, and the query takes ~50 seconds.
Any suggestions for optimizing the query, the system parameters, or the object model are appreciated.
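One option that may be worth testing is a relationship property index on timestamp, so that the range filter can be served from the index instead of by reading all relationships. The sketch below assumes Neo4j 4.3 or later, where relationship property indexes are available (the version is not stated here). The index name `communicating_to_timestamp` is made up for illustration.

```cypher
// Assumes Neo4j 4.3+; the index name is hypothetical.
// Indexes the timestamp property on COMMUNICATING_TO relationships.
CREATE INDEX communicating_to_timestamp IF NOT EXISTS
FOR ()-[r:COMMUNICATING_TO]-()
ON (r.timestamp)
```

Once the index is online, profiling the query again should show whether the planner actually uses it for the timestamp range.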
Thanks.