Is Neo4j Suitable for Large Scale-free Network?

46 views
Skip to first unread message

todd....@gmail.com

unread,
Jan 30, 2017, 4:33:31 AM1/30/17
to Neo4j
Hi,

I know Neo4j works well on large graphs, under assumption of nodes are generally equally distributed. However, in most cases, graphs in the real world follow a scale-free degree distribution. My question is, if relationship types and node labels are the same respectively, are there any ways to remain speedy when querying through nodes with really high degree, like 100k neighbor nodes?

P.S. I also posted this on StackOverflow, except no one has answered yet: http://stackoverflow.com/q/41832092/758413

---
BR,
Todd Leo

Chris Vest

unread,
Jan 30, 2017, 6:00:49 AM1/30/17
to ne...@googlegroups.com
Nodes with such a high degree are called super-nodes, and traversals that pass through them will likely experience degraded performance. Improving the performance in these cases is an active area of graph database research. Neo4j mitigates it a little bit by breaking the relationships of high-degree nodes into groups by relationship type. Traversal through high-cardinality groups is still going to be relatively slow, though, if you are only interested in a few specific relationships, and not all of a given type. Cypher is aware of this, and will use statistics to try and plan around it. I don't know how well it works in practice; it probably depends on the query and the structure of the data, as it usually does.

--
Chris Vest
System Engineer, Neo Technology
> --
> You received this message because you are subscribed to the Google Groups "Neo4j" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to neo4j+un...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Todd Leo

unread,
Jan 30, 2017, 6:15:57 AM1/30/17
to ne...@googlegroups.com
Hi Chris,

Unfortunately in my case I cannot divide relationship/node into different types/labels. I'm particularly interested in how Cypher use statistics to try to work this out and affects the plan, could you explain in detail?

BR,
Todd Leo
You received this message because you are subscribed to a topic in the Google Groups "Neo4j" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/neo4j/O1KFfVHlg88/unsubscribe.
To unsubscribe from this group and all its topics, send an email to neo4j+un...@googlegroups.com.

Michael Hunger

unread,
Jan 30, 2017, 7:27:45 PM1/30/17
to ne...@googlegroups.com, Andres Taylor
Todd,

what do you expect as an answer?

We're currently working in some areas that could benefit from this, but usually, nodes with such a high degree add zero information to your results, so you can just skip them and assume an "always hit".

In general for pattern matching Cypher uses statistics to decide from which side of a (label)-[type]->(label) pair to expand and then runtime information for actually chosing the best side if both nodes are bound.

Michael



On Mon, Jan 30, 2017 at 12:15 PM, Todd Leo <todd....@gmail.com> wrote:
Hi Chris,

Unfortunately in my case I cannot divide relationship/node into different types/labels. I'm particularly interested in how Cypher use statistics to try to work this out and affects the plan, could you explain in detail?

BR,
Todd Leo

2017年1月30日 +0800 PM7:00 'Chris Vest' via Neo4j <ne...@googlegroups.com>,写道:
Nodes with such a high degree are called super-nodes, and traversals that pass through them will likely experience degraded performance. Improving the performance in these cases is an active area of graph database research. Neo4j mitigates it a little bit by breaking the relationships of high-degree nodes into groups by relationship type. Traversal through high-cardinality groups is still going to be relatively slow, though, if you are only interested in a few specific relationships, and not all of a given type. Cypher is aware of this, and will use statistics to try and plan around it. I don't know how well it works in practice; it probably depends on the query and the structure of the data, as it usually does.

--
Chris Vest
System Engineer, Neo Technology


On 30 Jan 2017, at 10.33, todd....@gmail.com wrote:

Hi,

I know Neo4j works well on large graphs, under assumption of nodes are generally equally distributed. However, in most cases, graphs in the real world follow a scale-free degree distribution. My question is, if relationship types and node labels are the same respectively, are there any ways to remain speedy when querying through nodes with really high degree, like 100k neighbor nodes?

P.S. I also posted this on StackOverflow, except no one has answered yet: http://stackoverflow.com/q/41832092/758413

---
BR,
Todd Leo

--
You received this message because you are subscribed to the Google Groups "Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email to neo4j+unsubscribe@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to a topic in the Google Groups "Neo4j" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/neo4j/O1KFfVHlg88/unsubscribe.
To unsubscribe from this group and all its topics, send an email to neo4j+unsubscribe@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email to neo4j+unsubscribe@googlegroups.com.

Todd Leo

unread,
Jan 31, 2017, 3:01:39 AM1/31/17
to ne...@googlegroups.com, Andres Taylor
Hi Michael,

Thanks for your reply.

I would like to know if I have super-node in my graph, is there any thing I could do, like modifying Cypher to ease the pain.

I'm not familiar with the common modeling scheme in graph database. What I did is to use a node to represent "event", in the sense that an event node can hold three or more counter parties, rather than two, if you use an edge to represent events. In this case, an Email node can easily links to several hundred million event nodes, if the email is under operation of bots. In other words, I cannot "merge" the relationships into ones.




To unsubscribe from this group and all its topics, send an email to neo4j+un...@googlegroups.com.

Reply all
Reply to author
Forward
0 new messages