tree structure, millions of nodes


Sarah C

Oct 1, 2017, 7:29:01 PM
to Neo4j
Hi,

I have data with a natural tree structure with up to 100 million nodes (a large file system).

Queries need to access data from all descendants of a node (depth around 20 max, but lots of siblings).

Using relational or key/value databases, I have to pre-calculate data for each node to get reasonable performance (e.g. finding the cumulative size of all child nodes). This takes time.

Would this type of data be any faster with Neo4j?

I'm experienced with relational databases but have no experience with graph databases, and I want to know whether one is likely to help. I'd hope that it could be easier to access all descendants of a node without a tree traversal involving lots of 'get children' calls. But I have no idea, really.

Can anyone help? Anyone done something similar?

Many thanks,

Sarah

Michael Hunger

Oct 1, 2017, 7:42:17 PM
to ne...@googlegroups.com
Yes, definitely. I recently generated a tree of 10bn nodes/relationships for a prospect, and we then ran deep aggregation queries in a few milliseconds to seconds.

The basic query is:

MATCH path = (start:File {id: $startId})<-[:PARENT*..20]-(child)
RETURN nodes(path)
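Since the original question is about cumulative sizes, the same variable-length pattern can feed an aggregation directly. This is a sketch, assuming each :File node stores a numeric `size` property (that property name is an assumption, not from the thread):

```cypher
// Sum the size of every descendant of the start node, up to 20 levels deep.
// Assumes each :File node has a numeric `size` property.
MATCH (start:File {id: $startId})<-[:PARENT*..20]-(descendant)
RETURN sum(descendant.size) AS totalSize
```

Because the whole traversal runs inside the database, there are no repeated 'get children' round trips from the client.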

Data import is also straightforward: if you have two lists, one for the files and one for the parent relationships, you can use the LOAD CSV command.

CREATE CONSTRAINT ON (f:File) ASSERT f.id IS UNIQUE;

USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:///files.csv" AS row
CREATE (file:File {id:row.id}) // or MERGE
SET file.name = row.name, file.size = toInteger(row.size), file.created = toInteger(row.created);

USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:///parents.csv" AS row
MATCH (parent:File {id:row.parent})
MATCH (child:File {id:row.child})
CREATE (parent)<-[:PARENT]-(child);

Feel free to join neo4j.com/slack and ask there for more concrete help.

Michael



Sarah C

Oct 3, 2017, 9:49:34 AM
to Neo4j
Thank you very much, Michael. I've done what you suggested and it looks great... but so far only with a test dataset; I still think the 100 million might stretch it, but it's definitely worth carrying on. The remaining problem is the aggregate queries, as I want the top node of a subtree to access data from all nodes beneath it. I'll carry on experimenting and ask for more concrete help on Slack.

Sarah

Michael Hunger

Oct 3, 2017, 12:15:06 PM
to ne...@googlegroups.com
For 100m nodes you can also use the neo4j-admin import tool, which should load that volume in a few minutes.
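A sketch of invoking the offline bulk importer with the two CSV files from the earlier LOAD CSV example. The exact flags vary between Neo4j 3.x versions, and the header layout shown in the comments is an assumption:

```shell
# Bulk import into an empty database (run from the Neo4j home directory).
# Assumed CSV headers:
#   files.csv   -> id:ID,name,size:int,created:int
#   parents.csv -> :START_ID,:END_ID   (child id, parent id)
bin/neo4j-admin import \
  --nodes:File files.csv \
  --relationships:PARENT parents.csv
```

Unlike LOAD CSV, this tool writes the store files directly, which is why it can handle 100m+ nodes in minutes, but it only works on an empty database.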

Sent from my iPhone