[Cypher] The query that never comes

Leward

Apr 11, 2012, 5:39:18 AM

to ne...@googlegroups.com

Hi,

If you haven't noticed it: the title of this thread is dedicated to Metallica (*singing in my head*). But I'm not here to talk about music :)

I am trying to write a Cypher query to compare the ways people are linked together.

Here is the query:

start n = node:names(email='test@test.com')
match p = n-[r:connect*1..5]->m,
      n-[r2?:connect]->m
where m <> n
return distinct m, min(length(p))
order by min(length(p))
skip 0 limit 5

So I open the console in the webadmin, copy and paste the query, then... I wait while my computer is getting hotter and louder. After 10 minutes I end up killing the process.

I think the issue comes from the pattern line with the question mark "?". Indeed, if I remove it and execute the following query, it works (except that the results do not fit my need):

start n = node:names(email='te...@test.com')
match p = n-[r:connect*1..5]->m,
      n-[r2:connect]->m
where m <> n
return distinct m, min(length(p))
order by min(length(p))
skip 0 limit 5

So I wonder whether this is something that can be fixed on the Neo4j side, or whether my query is simply badly designed?

Thanks in advance,

Leward.

Michael Hunger

Apr 11, 2012, 5:56:03 AM

to ne...@googlegroups.com

#1 What version of Neo4j are you running?

#2 What does your dataset look like?

#3 Did you try returning just count(*) to see how many nodes are iterated through (for aggregation and ordering)?

#4 Could we get the dataset (off-line) to check the performance issue?

Thanks a lot

Michael

Andres Taylor

Apr 11, 2012, 6:01:19 AM

to ne...@googlegroups.com

On Wed, Apr 11, 2012 at 11:39 AM, Leward <pj8...@gmail.com> wrote:

> Hi,
>
> If you haven't noticed it: the title of this thread is dedicated to Metallica (*singing in my head*). But I'm not here to talk about music :)
>
> I am trying to write a Cypher query to compare the ways people are linked together.
>
> Here is the query:
>
> start n = node:names(email='test@test.com')
> match p = n-[r:connect*1..5]->m,
>       n-[r2?:connect]->m
> where m <> n
> return distinct m, min(length(p))
> order by min(length(p))
> skip 0 limit 5
>
> So I open the console in the webadmin, copy and paste the query, then... I wait while my computer is getting hotter and louder. After 10 minutes I end up killing the process.

Let's break this query down a bit. You want Neo4j to find all the paths, up to 5 steps long, between one node and every other node in your graph. And the same node might appear multiple times: you are asking for all the paths to all the nodes within 5 steps. How connected are your nodes? Take that number and raise it to the power of 5. If your nodes each have ~10 connections, we're quickly building up a lot of stuff to look at.
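To put rough numbers on that growth, here is a minimal Python sketch. It assumes a hypothetical uniform graph in which every node has the same number of outgoing `connect` relationships, so it only gives an upper-bound estimate; the function name `count_paths` is made up for illustration and has nothing to do with Neo4j internals.

```python
def count_paths(branching: int, max_depth: int) -> int:
    """Upper bound on the number of paths of length 1..max_depth that leave
    one start node, assuming every node has exactly `branching` outgoing
    relationships (a simplifying assumption, not a property of real data)."""
    return sum(branching ** depth for depth in range(1, max_depth + 1))


# Compare a sparse, a moderate and a dense graph at depth 5.
for fan_out in (3, 10, 50):
    print(fan_out, count_paths(fan_out, 5))
```

With ~10 connections per node the bound is already above a hundred thousand paths, and every one of them becomes a row that the aggregation has to look at.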

Next, you use the aggregate function min(), which forces your query to be eager, and not lazy. So all the stuff has to be kept in the heap until the query is finished.
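The difference between lazy and eager evaluation can be sketched outside Cypher. The Python below is only an analogy, with made-up row data and a hypothetical `rows` generator: a plain limit can stop after five rows, whereas a minimum per node cannot be known until every row has been consumed.

```python
from itertools import islice


def rows(n, produced):
    """Simulated result rows (node_id, path_length). `produced` is a
    one-element list used to count how many rows were actually generated."""
    for i in range(n):
        produced[0] += 1
        yield (i % 100, 1 + i % 5)


# Lazy: taking the first five rows pulls only five rows from the source.
lazy_produced = [0]
first_five = list(islice(rows(100_000, lazy_produced), 5))

# Eager: the minimum length per node needs every row before it can answer,
# and the per-node state has to be held in memory until the end.
eager_produced = [0]
shortest = {}
for node, length in rows(100_000, eager_produced):
    shortest[node] = min(length, shortest.get(node, length))
top_five = sorted(shortest.items(), key=lambda kv: kv[1])[:5]

print(lazy_produced[0], eager_produced[0])
```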

Cypher is doing what you asked it to do, and you asked for something that takes a whole lot of work. No surprise there, right?

> I think the issue comes from the pattern line with the question mark "?". Indeed, if I remove it and execute the following query, it works (except that the results do not fit my need):
>
> start n = node:names(email='te...@test.com')
> match p = n-[r:connect*1..5]->m,
>       n-[r2:connect]->m
> where m <> n
> return distinct m, min(length(p))
> order by min(length(p))
> skip 0 limit 5

What's happening here is that, since you removed the question mark, Cypher finds all nodes that are between 1 and 5 relationships away, and throws out any that aren't directly connected to n. Much less stuff to keep in heap, and so much less to aggregate on.
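As a rough illustration of that filtering effect, the Python sketch below uses a hypothetical handful of paths. It treats each matched path as one row and shows which rows survive when the direct relationship is optional versus required; it is a toy model of the semantics described above, not of Neo4j's actual execution.

```python
# Hypothetical rows for the variable-length pattern: one row per path from n.
paths = [
    ("n", "a"),
    ("n", "b"),
    ("n", "a", "c"),
    ("n", "a", "c", "d"),
]
direct_neighbours = {"a", "b"}  # nodes m that have a direct n -> m relationship

# Optional pattern (r2?): every row is kept; r2 is None when no direct edge exists.
optional_rows = [
    (p[-1], len(p) - 1, "r2" if p[-1] in direct_neighbours else None)
    for p in paths
]

# Required pattern (r2): rows without a direct edge are thrown away.
required_rows = [row for row in optional_rows if row[2] is not None]

print(len(optional_rows), len(required_rows))
```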

> So I wonder whether this is something that can be fixed on the Neo4j side, or whether my query is simply badly designed?

We're working on Cypher performance, but this is a heavy query, and will probably never be very fast. Do you really need to look five steps out? There's a reason Facebook and LinkedIn and the like don't look so many steps out...

Andrés

Leward

Apr 14, 2012, 10:06:33 AM

to ne...@googlegroups.com

Hello,

I worked on this issue today and I finally found a pretty nice workaround. At the moment my graph DB is not very big: a few hundred nodes and relationships. So doing a simple query with p = n-[r:connect*1..5]->m is fast.

The fact is that my query was not very well designed. Here is the rewritten query, which works pretty well:

start n = node:names(email='te...@test.com')
match p = n-[r:connect]->()-[r2?:connect*0..4]->m
where m <> n
return distinct m, min(length(p))
order by min(length(p))
skip 0 limit 5
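
For comparison, the result this query asks for (the five nearest nodes, ordered by minimum path length) can also be computed with a plain breadth-first search. The sketch below is Python rather than Cypher and is not how Neo4j evaluates the query; the adjacency dict `graph` and the function `nearest` are hypothetical names used only for illustration.

```python
from collections import deque


def nearest(graph, start, max_hops=5, limit=5):
    """Return up to `limit` (node, hops) pairs closest to `start`,
    following outgoing edges for at most `max_hops` hops."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in graph.get(node, ()):
            if nxt not in dist:  # in BFS the first visit is the shortest
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    del dist[start]  # mirrors `where m <> n`
    return sorted(dist.items(), key=lambda kv: kv[1])[:limit]


# Hypothetical toy data: outgoing `connect` relationships per node.
graph = {"n": ["a", "b"], "a": ["c"], "c": ["d"], "d": ["e"], "e": ["f"], "f": ["g"]}
print(nearest(graph, "n"))
```

Because each node is visited once, the work grows with the size of the neighbourhood rather than with the number of distinct paths through it.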

Thanks for your answers; they were very instructive, by the way.

Regards, Leward.

On Wednesday, April 11, 2012 at 12:01:19 UTC+2, Andres Taylor wrote:

Peter Neubauer

Apr 14, 2012, 10:37:09 AM

to ne...@googlegroups.com

That's cool. Could you write a short blog post and set this up with a console.neo4j.org link? Would be nice to see it visually :-)
