Need help with Cypher query.. Performance..

75 views
Skip to first unread message

Zeeshan Arif

unread,
Mar 6, 2014, 11:09:09 AM3/6/14
to ne...@googlegroups.com
Hi,

I am trying to form a query where I want to know friends of a person A and their inter-relationships but no one should be present who is not directly a friend of A. For e.g.,

MATCH (A:person {id:xyz})-[r1:KNOWS]->(B:person) RETURN A as node_start, B as node_end 
UNION MATCH (C:person {id:xyz})-[r2:KNOWS]->(D:person)-[r3:KNOWS]->(E:person) 
WHERE (C)-->(E) RETURN  D as node_start, E as node_end

Again, trying to find out friends of a person and if any of the friends also KNOW each other but dont' want to introduce any friends of friends whom the person doesn't directly KNOW yet.

The query performs really bad above even with indexing on id (which is an integer) for the person. Very small data set works but in one case A has 3814 relationships and one of the B's has 44000 relatioinship, the query never returns..

Any hints suggestions - I have tried various variations of it using collect for the first part or anything but as soon as we enter the UNION or the 2nd phase of query it just never returns with the above number of relationships..

It would be good if I can perhpas pass in the collected id's as a list from the 1st part to the match clause in 2nd or to the lucene index ..

Thanks,

- Zeeshan

Zeeshan Arif

unread,
Mar 6, 2014, 11:10:48 AM3/6/14
to ne...@googlegroups.com
Forgot to mention the environment.. Neo4j 2.0.1 and it also has spatial index with 2.0.1 libraries besides lucene indices on other properties in the dataset..

Ziv Unger

unread,
Mar 7, 2014, 12:10:44 AM3/7/14
to ne...@googlegroups.com
Off the top of my head, you likely don't need all those label lookups or the union, and that last (c)-->(e) will look for any length path from c to e, which is probably not what you want and will be very slower with thousands of records. I created a very simplistic representation of this locally, and came up with this:

match (a:Person {id: 1})-[r1:knows]->(b)
with a, r1, b
optional match (a)-[r3:knows]->(c)<-[r2:knows]-(b)
return a, b, c, r1, r2, r3

As Michael explained to me, if you only have one type of relationship between labels (ie. Person is only connected to other Persons via [:knows]) then omitting the label actually speeds up the query.

What the above does is gets all the people that (a) knows directly as (b), then optionally matches any people (b) know directly which (a) also knows directly. This comes back with results like this:

+-----------------------------------------------------------------------------------------------------------------------+
| a                  | b                  | c                  | r1               | r2               | r3               |
+-----------------------------------------------------------------------------------------------------------------------+
| Node[260646]{id:1} | Node[260647]{id:2} | Node[260648]{id:3} | :knows[293029]{} | :knows[293030]{} | :knows[293031]{} |
| Node[260646]{id:1} | Node[260648]{id:3} | <null>             | :knows[293031]{} | <null>           | <null>           |
| Node[260646]{id:1} | Node[260649]{id:4} | <null>             | :knows[293032]{} | <null>           | <null>           |
+-----------------------------------------------------------------------------------------------------------------------+

So where there's a friend of a friend that a knows too (c) it gets returned, otherwise you just get the list of friends.

That's what I gathered you were trying to do from your description, and I think it should perform a lot better.

Ziv Unger

unread,
Mar 7, 2014, 12:15:42 AM3/7/14
to ne...@googlegroups.com
As an added note, if you DID want any length of path between (a) and (c), you can just use [r3:knows*1..x] where x is the upper limit of the path length. If you have a lot of nodes, I would personally limit the upper range for performance.

Zeeshan Arif

unread,
Mar 7, 2014, 11:44:44 AM3/7/14
to ne...@googlegroups.com
Thanks a lot for providing some inputs. However, for me, with the initial posting of records i.e. 3814 relations for primary and some of the secondary have 40K+ relationships, it is still not performing well - infact doesn't return.

Few questions:

1. Are you sure that with one kind of relationship between nodes, skipping the label helps? Intuitively, qualifying everything should always help.. isn't it?

2. C-->E will look for any length path? I thought it will look only for direct neighbors by default and only if I mention *1..x, it will go x deep.

3. Any idea if UNION is expected to work better than OPTIONAL MATCH clause and combining results together?

Regards,

- Zeeshan

Ziv Unger

unread,
Mar 7, 2014, 1:33:04 PM3/7/14
to ne...@googlegroups.com
1. According to Michael Hunger at neo4j, yes. I've also seen evidence of this when optimising my own queries on his advice.

2. You are correct here, sorry. Will teach me to reply before coffee in the morning.

3. Not sure, someone with better knowledge of the inner workings of the db would have to weigh in. You could use a profiling tool like https://github.com/mneedham/cypher-query-tuning, but if your queries don't return at all, it won't help. I did find that when requesting huge numbers of results, queries slow down dramatically. If you add a limit 100, does it still not return?

I had a situation where even with a completely tuned query, returning thousands of results still took ages. If the limit helps, perhaps paging is your answer.

Otherwise, hopefully one of the neo4j guys will weigh in soon.
--
You received this message because you are subscribed to a topic in the Google Groups "Neo4j" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/neo4j/5hapONQs7Qo/unsubscribe.
To unsubscribe from this group and all its topics, send an email to neo4j+un...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Michael Hunger

unread,
Mar 7, 2014, 1:50:29 PM3/7/14
to ne...@googlegroups.com
Any chance to share your db with us privately if you want to?

Will try to find some time to look into it.

Cheers,

Michael

----
(michael}-[:SUPPORTS]->(YOU)-[:USE]->(Neo4j)
Learn OnlineOffline or Read a Book (in Deutsch)
We're trading T-shirts for cool GraphGist Models





You received this message because you are subscribed to the Google Groups "Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email to neo4j+un...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages