neo4j cypher query too slow


Guitao Ding

Jul 9, 2014, 10:23:19 AM
to ne...@googlegroups.com
Hi all,

I've recently been learning to use Neo4j for relationship analysis. Today I found that my Cypher query takes too long to finish.

I used the batch importer to import all the data into Neo4j. I want to find the smallest user_id connected (directly or indirectly, with no limit on path length) to each user_id. The details are below:

Neo4j version: 2.1.2
Number of nodes: 16M (three labels)
Number of relationships: 10M
Cypher query:
match (n:user_id)-[:mapping*]-(d:user_id)
with n.value as user_id,
case when min(d.value) > n.value then n.value else min(d.value) end as people_id
return user_id, people_id


What should I do to improve my query performance? Any suggestions would be appreciated!

Thanks in advance.

 Guitao

Pavel

Jul 9, 2014, 5:37:04 PM
to ne...@googlegroups.com
Hello Guitao,

I think I'm having the same issue here:

Michael Hunger

Jul 9, 2014, 6:01:14 PM
to ne...@googlegroups.com
This is a graph-global query with unbounded path length, so it may generate many billions or even trillions of paths to examine,
especially since you don't specify a direction.

If your nodes are all users, you are doing the equivalent of finding all paths between every pair in the 16M^2 cross product.
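
To make the path explosion concrete, here is a minimal Python sketch (not Neo4j code). It assumes Cypher's rule that a relationship is not traversed twice within one matched path, and it counts every such path starting from a single node in a small, fully connected user/cookie graph. The helper names `count_trails` and `complete_bipartite` are made up for this illustration, and the sketch counts paths ending at any node, not only at user nodes.

```python
from collections import defaultdict


def count_trails(edges, start):
    """Count paths from `start` that never reuse a relationship."""
    adjacency = defaultdict(list)
    for edge_id, (a, b) in enumerate(edges):
        # Undirected: each relationship can be walked from either end.
        adjacency[a].append((edge_id, b))
        adjacency[b].append((edge_id, a))

    def walk(node, used):
        total = 0
        for edge_id, neighbour in adjacency[node]:
            if edge_id not in used:
                # One new path ends here, plus every longer path through it.
                total += 1 + walk(neighbour, used | {edge_id})
        return total

    return walk(start, frozenset())


def complete_bipartite(users, cookies):
    """Hypothetical test graph: every user is linked to every cookie."""
    return [(f"u{i}", f"c{j}") for i in range(users) for j in range(cookies)]


if __name__ == "__main__":
    for size in (1, 2, 3):
        graph = complete_bipartite(size, size)
        print(size, "users/cookies:", count_trails(graph, "u0"), "paths from u0")
```

Even at these tiny sizes the path count grows far faster than the node count, which is why enumerating paths is the wrong tool when only reachability is needed.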

Could you describe the actual use case you are trying to solve?

Michael



--
You received this message because you are subscribed to the Google Groups "Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email to neo4j+un...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Guitao Ding

Jul 9, 2014, 9:05:44 PM
to ne...@googlegroups.com

Hi Michael,

The data comes from a website. The user_id is the ID of a registered user, and every user_id is linked to one or more cookie_ids (the value of a cookie).
Likewise, each cookie_id is linked to one or more user_ids, so user_id and cookie_id have a many-to-many mapping. My goal is to find all user_ids that are linked to each other (via cookie_ids, with no limit on path length) and assign them a single shared ID (here I used the smallest user_id in the path).

For example:
All user_ids (user_id1, user_id2, user_id3) in the following path should be assigned the same ID (e.g. user_id1)
user_id1----cookie_id1----user_id2----cookie_id2----user_id3

I imported the data into Neo4j and used different labels for user_id and cookie_id. There is only one relationship type (cookie_id -- user_id), and the direction doesn't matter.
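
The task described above is a connected-components problem, so it can be solved without enumerating any paths. Below is a minimal sketch in plain Python using a union-find (disjoint-set) structure. It assumes the mapping is available as (user_id, cookie_id) pairs and fits in memory. The function name `assign_people_ids` and the sample values are hypothetical, and users with no cookie at all would not appear in the result.

```python
def assign_people_ids(mappings):
    """Map each user_id to the smallest user_id in its connected component.

    `mappings` is an iterable of (user_id, cookie_id) pairs.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        root = node
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every node on the way directly at the root.
        while parent[node] != root:
            parent[node], node = root, parent[node]
        return root

    users = set()
    for user_id, cookie_id in mappings:
        users.add(user_id)
        # Tag the keys so a user_id can never collide with a cookie_id.
        user_root = find(("user", user_id))
        cookie_root = find(("cookie", cookie_id))
        if user_root != cookie_root:
            parent[user_root] = cookie_root

    # Smallest user_id seen in each component, keyed by the component root.
    smallest = {}
    for user_id in users:
        root = find(("user", user_id))
        smallest[root] = min(user_id, smallest.get(root, user_id))

    return {user_id: smallest[find(("user", user_id))] for user_id in users}


if __name__ == "__main__":
    # The chain from the example: user 1 -- c1 -- user 2 -- c2 -- user 3
    sample = [(1, "c1"), (2, "c1"), (2, "c2"), (3, "c2"), (7, "c9")]
    print(assign_people_ids(sample))
```

Each pair is processed once, so the work grows roughly linearly with the number of relationships and not with the number of paths.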

Guitao Ding

Jul 10, 2014, 8:58:22 AM
to ne...@googlegroups.com
I switched to Spark GraphX and it solved my problem perfectly. :)

Michael Hunger

Jul 10, 2014, 9:05:58 AM
to ne...@googlegroups.com


match (n:user_id)
match (n)-[:mapping*]-(d:user_id)
return n.value as user_id,
case when min(d.value) > n.value then n.value else min(d.value) end as people_id




Could you also try running:

match (n:user_id)
match (n)-[:mapping*]-(d:user_id)
RETURN count(*)

Would you be able to share your Neo4j database with me?

Thanks,

Michael

丁桂涛

Jul 10, 2014, 9:24:16 AM
to ne...@googlegroups.com
Hi Michael,



Thanks for your kind reply.

I'm testing your code. As I replied earlier, I have already switched to Spark GraphX, and its connected components algorithm solved my problem perfectly. Besides, Spark can read files on HDFS and run on clusters. It took only a few seconds to complete my task.
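
For readers unfamiliar with it: GraphX's connected components labels every vertex with the lowest vertex ID in its component. The sketch below is not GraphX code. It is a small single-machine Python illustration of the underlying idea (repeatedly propagating the minimum label along edges until nothing changes), assuming integer vertex IDs. The function name `propagate_min_labels` and the ID values are made up for the example.

```python
def propagate_min_labels(edges):
    """Label every vertex with the smallest vertex id in its component."""
    # Every vertex starts out labelled with its own id.
    labels = {}
    for a, b in edges:
        labels.setdefault(a, a)
        labels.setdefault(b, b)

    changed = True
    while changed:
        changed = False
        for a, b in edges:
            # Both endpoints adopt the smaller of their two current labels.
            low = min(labels[a], labels[b])
            if labels[a] != low or labels[b] != low:
                labels[a] = labels[b] = low
                changed = True
    return labels


if __name__ == "__main__":
    # Users 1, 2, 3, 7 and cookies 101, 102, 109 share one integer id space.
    edges = [(1, 101), (2, 101), (2, 102), (3, 102), (7, 109)]
    print(propagate_min_labels(edges))
```

A distributed engine runs the same kind of update in parallel rounds across the cluster, which fits a graph-global job like this one better than per-node path matching.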

So I'm putting Neo4j aside for now. Thank you again for your answer!

Best Regards,
 
Guitao 

Michael Hunger

Jul 10, 2014, 1:43:38 PM
to ne...@googlegroups.com
If possible I'd still love to get a copy of your db to test out some ideas.

Michael

丁桂涛(桂花)

Jul 10, 2014, 9:58:25 PM
to ne...@googlegroups.com
I'd really like to share the data with you, but it is confidential. I'm sorry about that.

Guitao





--
丁桂涛