[cypher] How to speed up this query

Nicolas Clairon

Jul 23, 2013, 5:30:02 AM

to ne...@googlegroups.com

Hi,

I'm doing faceting on my data with Cypher. It works, but it can be very slow. Is it possible to speed up this query, which takes more than 10 s? (GuardTimeoutException: timeout occured (overtime=1))

Here's the query:

START node=node:node_auto_index(title="fre")
MATCH node-[rel0]-target,
      target-[rel1]-node1,
      target-[rel2]-node2,
      target-[relfacet]-facet
WHERE target.type = "article" AND type(rel0) = "has_lang"
  AND type(rel1) = "about" AND node1.`title` = "Phylogeny"
  AND type(rel2) = "about" AND node2.`title` = "Models, Genetic"
RETURN type(relfacet) as rel, facet.title as value, count(facet.title) as occ
ORDER BY rel, occ DESC

I have auto indexes on title and type.

Here is some information about the data:

78247 nodes
343866 edges
3 relationship types
1 node "fre" with 2834 relationships
1 node "Phylogeny" with 723 relationships
1 node "Models, Genetic" with 298 relationships

This is not big data at all. Does WHERE use the index in Neo4j? Why is this so slow?

Thanks,

N.

Luanne Coutinho

Jul 23, 2013, 6:06:46 AM

to ne...@googlegroups.com

Could you check if this does any better?

START node=node:node_auto_index(title="fre")
MATCH node-[rel0]-target
WHERE target.type = "article" AND type(rel0) = "has_lang"
WITH target
MATCH target-[rel1]-node1
WHERE type(rel1) = "about" AND node1.`title` = "Phylogeny"
WITH target
MATCH target-[rel2]-node2
WHERE type(rel2) = "about" AND node2.`title` = "Models, Genetic"
WITH target
MATCH target-[relfacet]-facet
RETURN type(relfacet) as rel, facet.title as value, count(facet.title) as occ
ORDER BY rel, occ DESC


Nicolas Clairon

Jul 23, 2013, 7:08:48 AM

to ne...@googlegroups.com

Yes! This query is much faster (~250 ms).

Actually, I ended up with an even faster version:

START node0=node:node_auto_index(title="fre"),
      concept1=node:node_auto_index(title="Phylogeny"),
      concept2=node:node_auto_index(title="Models, Genetic")
MATCH node0-[rel0]-target,
      target-[rel1]-concept1,
      target-[rel2]-concept2,
      target-[relfacet]-facet
WHERE target.type = "article" AND type(rel0) = "has_lang"
  AND type(rel1) = "about"
  AND type(rel2) = "about"
RETURN type(relfacet) as rel, facet.title as value, count(facet.title) as occ
ORDER BY rel, occ DESC

It takes ~36 ms. The limitation of this query is that you can't use a regexp with it.

Why is there such a difference between those queries? The semantics are the same, so why doesn't Cypher optimize the query by itself?

I suspect that Cypher doesn't use the index in the WHERE clause. Why does "WITH" bring such a performance gain?

Luanne Coutinho

Jul 23, 2013, 7:19:51 AM

to ne...@googlegroups.com

What you did was limit the number of nodes matched. In your earlier query it probably matched all node1's and node2's, only to filter them out on title. Now you start from a limited set (concept1/2), so the amount of matching is considerably reduced.
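
The effect of starting from bound nodes can be sketched with a toy Python model. This is only an illustration under stated assumptions: it is not how Neo4j's matcher is implemented, and the graph data as well as the names `TITLES`, `NEIGHBORS`, `TITLE_INDEX`, `scan_and_filter` and `start_from_bound_nodes` are invented for the example.

```python
# Hypothetical toy graph (not the real dataset): node id -> title,
# and node id -> set of neighbor ids.
TITLES = {1: "fre", 2: "Phylogeny", 3: "Models, Genetic", 4: "Ecology",
          10: "article A", 11: "article B", 12: "article C"}
NEIGHBORS = {
    1: {10, 11, 12},
    2: {10, 11},
    3: {10},
    4: {11, 12},
    10: {1, 2, 3},
    11: {1, 2, 4},
    12: {1, 4},
}
# Stand-in for the auto index: title -> node id.
TITLE_INDEX = {title: node_id for node_id, title in TITLES.items()}


def scan_and_filter(lang, title1, title2):
    """Expand every neighbor pair of every target, then filter on title."""
    checked = 0
    found = set()
    for target in NEIGHBORS[TITLE_INDEX[lang]]:
        for n1 in NEIGHBORS[target]:
            for n2 in NEIGHBORS[target]:
                checked += 1  # one candidate (node1, node2) pair examined
                if TITLES[n1] == title1 and TITLES[n2] == title2:
                    found.add(target)
    return found, checked


def start_from_bound_nodes(lang, title1, title2):
    """Look up all three nodes in the index and intersect their neighbors."""
    start_ids = [TITLE_INDEX[t] for t in (lang, title1, title2)]
    found = set.intersection(*(NEIGHBORS[i] for i in start_ids))
    return found, len(start_ids)  # second value: number of index lookups
```

On this toy graph both functions find the same single target, but `scan_and_filter` examines 22 candidate pairs while `start_from_bound_nodes` needs only 3 index lookups and a set intersection.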

Do you need to have the type(rel) checks in your WHERE? Or can you just do target-[:about]-concept1?

Nicolas Clairon

Jul 23, 2013, 8:25:08 AM

to ne...@googlegroups.com

I thought that Cypher was smart enough to detect indexed fields in the WHERE clause and process them like a START node, so that:

START node=node(*)
MATCH node--target
WHERE node.title = "Foo"
RETURN target.title

would be the same as:

START node=node:node_auto_index(title="Foo")
MATCH node--target
RETURN target.title

The syntax is different but the meaning is the same, so I thought that Cypher's internal parser would do the math...

Last question: why does the "WITH" statement bring so much speed? How does Cypher work with this statement?

Luanne Coutinho

Jul 23, 2013, 9:10:24 AM

to ne...@googlegroups.com

I don't know enough about Cypher's parser to answer your first question; I guess some of the Neo4j folks here can answer that.

About the WITH, all I did was try to reduce the number of nodes at every step. So, instead of a potentially large match with

MATCH node-[rel0]-target,
      target-[rel1]-node1,
      target-[rel2]-node2,
      target-[relfacet]-facet

resulting in a lot of paths matched only to be filtered out based on node1.title, node2.title and target.type, I attempted to first come up with a smaller set of "target" nodes by pushing the filter up earlier:

START node=node:node_auto_index(title="fre")
MATCH node-[rel0]-target
WHERE target.type = "article" AND type(rel0) = "has_lang"

The WITH clause simply pipes those results to the next part of your query, but now it has to deal with a smaller set of starting nodes for MATCH target-[rel1]-node1 and so on.
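
To illustrate the idea of pushing the filter up, here is a toy Python model. It is only a sketch: it does not reproduce Neo4j's matcher, and the `ARTICLES` data and the `late_filter` / `early_filter` helpers are made up for illustration. It counts how many rows each strategy touches before producing the same facet rows.

```python
from itertools import product

# Hypothetical toy data: each target article lists the titles it is "about"
# and its facet values. None of this comes from the real dataset.
ARTICLES = {
    "t1": {"about": ["Phylogeny", "Models, Genetic", "Evolution"],
           "facets": ["f1", "f2"]},
    "t2": {"about": ["Phylogeny", "Ecology", "Genomics"],
           "facets": ["f1", "f3"]},
    "t3": {"about": ["Ecology", "Genomics", "Evolution"],
           "facets": ["f2", "f3"]},
}


def late_filter(articles, title1, title2):
    """Expand every (node1, node2, facet) combination, filter afterwards."""
    touched = 0
    kept = []
    for target, data in articles.items():
        for n1, n2, facet in product(data["about"], data["about"], data["facets"]):
            touched += 1
            if n1 == title1 and n2 == title2:
                kept.append((target, facet))
    return touched, kept


def early_filter(articles, title1, title2):
    """Narrow the targets stage by stage (like WITH target), then expand facets."""
    touched = 0
    targets = list(articles)
    for title in (title1, title2):
        survivors = []
        for t in targets:
            touched += len(articles[t]["about"])  # neighbors examined this stage
            if title in articles[t]["about"]:
                survivors.append(t)
        targets = survivors  # only survivors are piped to the next stage
    kept = []
    for t in targets:
        for facet in articles[t]["facets"]:
            touched += 1
            kept.append((t, facet))
    return touched, kept
```

With these three made-up articles, `late_filter` touches 54 rows and `early_filter` touches 17, and both return the same two facet rows for "t1". The gap grows quickly as the number of relationships per target increases.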

Michael Hunger

Jul 23, 2013, 9:41:19 AM

to ne...@googlegroups.com

Thanks Luanne for chiming in and giving this great advice.

Nicolas, we want Cypher to become smarter about optimization with every version, and it will.

Currently the focus is still mostly on the language and not so much on optimizations, but that will change a lot over time.

In Neo4j 2.0 it already works like that. If you run:

MATCH node--target
WHERE node.title = "Foo"
RETURN target.title

and an index exists, it will be transformed internally into an index lookup.

Michael

Nicolas Clairon

Jul 24, 2013, 4:06:02 AM

to ne...@googlegroups.com

Thanks Luanne, thanks Michael. I sometimes forget how young Cypher is.

I'm glad to hear that Cypher will be smarter in the future. I wonder if there is a document that explains how to write well-optimized Cypher queries (a kind of best-practices guide). Or maybe it is possible to make this query faster via REST?

Michael Hunger

Jul 24, 2013, 4:42:20 PM

to ne...@googlegroups.com

Luanne has published a great blog post about optimizing Cypher queries:

http://thought-bytes.blogspot.de/2013/01/optimizing-neo4j-cypher-queries.html

In general, "profile" shows you the number of db hits, which you want to minimize.

Generally speaking:

MATCH expands the subgraph you look at in a "cross-product"-like fashion.

E.g. if you have a-[:FRIEND]->b-[:FRIEND]->c-[:FRIEND]->d, it will, for all FRIEND relationships of a, find the b nodes, which might be 100. Then, for each of these b nodes, it goes over all of its FRIEND relationships to find c nodes, which might be 100 per b, meaning it finds 100*100 = 10k c nodes in total (including duplicates). If the match goes further on towards d, it does so for all of the 10k c nodes, even if they are just 200 distinct nodes.

So breaking up that statement into two matches and running a DISTINCT on the c nodes is the difference between 10k matches from c to d (e.g. 10k*100 = 1M) and 200 matches from the distinct c nodes to d, which is then just 20k:

MATCH a-[:FRIEND]->b-[:FRIEND]->c
WITH DISTINCT c
MATCH c-[:FRIEND]->d
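
The arithmetic above can be checked with a small Python simulation. It is a sketch under assumptions, not a model of Neo4j internals: the synthetic graph, the fan-out of 100, the pool of 200 shared c nodes, and the helper names `build_friend_graph`, `expand` and `count_d_rows` are all hypothetical.

```python
def build_friend_graph(fanout=100, distinct_c=200):
    """Build a -> b -> c -> d FRIEND edges; the b nodes share a pool of c nodes."""
    friends = {"a": [f"b{i}" for i in range(fanout)]}
    for i in range(fanout):
        # Each b points at `fanout` consecutive c nodes taken from the shared pool.
        friends[f"b{i}"] = [f"c{(2 * i + j) % distinct_c}" for j in range(fanout)]
    for k in range(distinct_c):
        friends[f"c{k}"] = [f"d{k}_{j}" for j in range(fanout)]
    return friends


def expand(rows, friends):
    """One MATCH hop: every incoming row yields one row per FRIEND edge."""
    return [nxt for node in rows for nxt in friends[node]]


def count_d_rows(fanout=100, distinct_c=200):
    """Return (c rows, distinct c nodes, d rows without DISTINCT, d rows with it)."""
    friends = build_friend_graph(fanout, distinct_c)
    c_rows = expand(expand(["a"], friends), friends)
    unique_c = set(c_rows)
    naive = sum(len(friends[c]) for c in c_rows)       # expand every duplicate
    deduped = sum(len(friends[c]) for c in unique_c)   # expand each c once
    return len(c_rows), len(unique_c), naive, deduped
```

With the default parameters, `count_d_rows()` returns `(10000, 200, 1000000, 20000)`, matching the 10k, 200, 1M and 20k figures in the explanation.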

Another thing that is important to consider:

Cypher pulls expressions from the WHERE clause that relate to nodes or relationships of a match into the pattern matcher, e.g.

WHERE c.age > 18

Running this expression 10k times vs. 200 times can make a big difference too, especially if the expression is more expensive (like a regexp).

Michael

Nicolas Clairon

Jul 27, 2013, 7:22:36 AM

to ne...@googlegroups.com

Thank you very much, Michael. This link is quite a nugget and, combined with your explanation, it helps me understand better how Cypher works.

I quite like Neo4j and want to replace my current database with it, but my Cypher queries were not fast enough. I'm now taking the time to go deeper into Cypher and learn best practices for graph database schemas.

Thanks for the hard work!

Michael Hunger

Jul 27, 2013, 8:00:00 AM

to ne...@googlegroups.com

Cypher is not fully optimized yet.

But good enough for 90% of the use-cases.

There might be some queries that are much faster as an unmanaged server extension.
