usefulness of using WITH instead of one big match/where

Javad Karabi

Jan 21, 2014, 11:25:41 AM
to ne...@googlegroups.com

(c:Customer)-[:ordered]->(p:Product)-[:category]->(:Category)

Now, say that there are 2:
c-[:ordered]->(:Product { name: "pants", quantity: 10})
c-[:ordered]->(:Product { name: "shirt",   quantity: 5})

Now, say that I only want to cross the category relationship if p.quantity > 6.

In the most basic way, I would do:

MATCH (c:Customer)-[:ordered]->(p:Product)-[:category]->(cat:Category)
WHERE p.quantity > 6
RETURN cat

However, I figured that maybe neo4j would (non-optimally) traverse the entire path and only _then_ apply the WHERE filter on top of the complete path.

So what I did was:

MATCH (c:Customer)-[:ordered]->(p:Product)
WHERE p.quantity > 6
WITH p
MATCH (p)-[:category]->(cat:Category)
RETURN cat

This, I figured, would let neo4j cross out to all the product nodes (which it needs to visit anyway in order to filter out the ones with a quantity of 6 or less) and only then follow the category relationship from the products that remain.


Now... finally to my question.
The following URL:
http://docs.neo4j.org/chunked/stable/query-match.html
states that:
WHERE defines the MATCH patterns in more detail. The predicates are part of the pattern description, not a filter applied after the matching is done. 

So, my question is: if the predicates (specifically p.quantity > 6) are part of the pattern description, and _not_ applied _after_ matching (therefore applied before or during), then splitting the query with WITH would be a moot point.

So, I would think that 

MATCH (c:Customer)-[:ordered]->(p:Product)-[:category]->(cat:Category)
WHERE p.quantity > 6
RETURN cat

would be sufficient, as neo4j _would not_ actually traverse to cat for products that fail the predicate, since it would apply the filter during the match process.
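
To make that expectation concrete, here is a toy Python model of the two strategies on the pants/shirt example from the top of this post. It is not Neo4j code and says nothing about how TraversalMatcher or Filter are really implemented; the dictionaries and both function names are made up for illustration. The only point is to count how many relationship steps each strategy takes.

```python
# Toy model only: NOT how Neo4j's TraversalMatcher or Filter work internally.
ordered = {"c": ["pants", "shirt"]}                          # customer -> products
category = {"pants": ["clothing"], "shirt": ["clothing"]}    # product -> categories
quantity = {"pants": 10, "shirt": 5}                         # product property


def filter_after(start, min_quantity):
    """Expand every full path first, then apply the predicate to the results."""
    steps = 0
    paths = []
    for p in ordered[start]:
        steps += 1                       # step customer -> product
        for cat in category[p]:
            steps += 1                   # step product -> category, for every product
            paths.append((start, p, cat))
    rows = [path for path in paths if quantity[path[1]] > min_quantity]
    return rows, steps


def filter_during(start, min_quantity):
    """Apply the predicate as soon as the product is reached, pruning early."""
    steps = 0
    rows = []
    for p in ordered[start]:
        steps += 1                       # step customer -> product
        if not quantity[p] > min_quantity:
            continue                     # pruned: the category step is never taken
        for cat in category[p]:
            steps += 1                   # step product -> category
            rows.append((start, p, cat))
    return rows, steps


print(filter_after("c", 6))
print(filter_during("c", 6))
```

Both strategies return the single pants path, but pruning during the traversal takes 3 relationship steps instead of 4, because the shirt's category relationship is never crossed.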

However, in practice, I notice that using WITH is actually faster. Is there any possible reason for this?
It may be necessary for me to show my exact query; I also have the profile data for it, which I am currently analyzing.

Javad Karabi

Jan 21, 2014, 11:29:53 AM
From what I can tell, if the WHERE clause uses ">" or "<" (as it does in the actual query which I am using, not in this example query...) then the WHERE predicate _is in fact_ a filter, applied _after_ the match. It looks to me like "TraversalMatcher()" does not apply predicates which involve > or <, but instead delegates them to "Filter()" after the fact, which does not match what the documentation states.

Mark Needham

Jan 21, 2014, 11:32:42 AM
Javad,

Can you paste the PROFILE output for the two queries here? In theory the two queries should do the same thing... in practice I imagine that's not the case!

Mark


Michael Hunger

Jan 21, 2014, 11:49:37 AM
Javad, what version are you using?

2.0 final?

Michael

Javad Karabi

Jan 21, 2014, 11:55:03 AM
Mark, I would be happy to. Give me a moment and I will post them.

Michael, 
  • Kernel version

    neo4j-browser, version: 2.0.0


Javad Karabi

Jan 21, 2014, 12:06:06 PM
Mark, I have emailed you the query and profile for both cases.

Javad Karabi

Jan 21, 2014, 12:08:32 PM
You will notice:
"WHERE (Property(NodeIdentifier(),cached_available(71)) == Literal(1)" in the TraversalMatcher() portion, the very first function of the profile.

I believe this is what the documentation means when it says that the WHERE clause is not applied after the matching process (and is therefore applied during it).

However, you will also notice that immediately following that function, is Filter(), which is then filtering based on the ">" and "<" predicates of the query.

obviously, the best case scenario would be if the ">" and "<" tests occurred inside TraversalMatcher(), i think

Michael Hunger

Jan 21, 2014, 12:11:57 PM
The problem is cross-path expressions, which are not yet handled in that manner.

Simple expressions that only reference a single piece of the path (node, rel), plus things that have been evaluated before (parameters, literals, previous computations), WILL be used to shortcut the path evaluation.

But if you do: n1--n2--n3

and then WHERE n2.foo > n1.bar, it will only be applied AFTER the path has been matched.

If you do: WHERE n1.foo > 10, it will be applied DURING the path traversal.
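
Restated as a tiny Python sketch: a predicate is classified by how many distinct pieces of the path it touches, ignoring anything that was already evaluated before the match. This is purely a restatement of the rule of thumb above, not planner code, and the function name `placement` and its arguments are hypothetical.

```python
def placement(identifiers, already_evaluated=()):
    """Classify a WHERE predicate as applied 'during' or 'after' path matching.

    identifiers: names the predicate refers to, e.g. {"n1", "n2"}.
    already_evaluated: names bound before the MATCH (parameters, literals,
    previous computations); these do not count as pieces of the path.
    """
    path_pieces = set(identifiers) - set(already_evaluated)
    # At most one piece of the path: checkable as soon as that piece is reached.
    # More than one piece: a cross-path expression, checked once the path is complete.
    return "during" if len(path_pieces) <= 1 else "after"


print(placement({"n1"}))                       # WHERE n1.foo > 10
print(placement({"n1", "n2"}))                 # WHERE n2.foo > n1.bar
print(placement({"n2", "limit"}, {"limit"}))   # hypothetical: limit computed earlier
```

The three calls print during, after, during. The last case is the interesting one: a value computed before the MATCH no longer counts as a second piece of the path.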

HTH

Michael

Javad Karabi

Jan 21, 2014, 12:19:13 PM
Michael, I apologize, I will send you a copy of the query + profile too.
In my actual query, I am using a parameter of the cypher query:
WHERE other.birth_year > (me.birth_year - {age_difference_range})
      AND other.birth_year < (me.birth_year + {age_difference_range})

here is the relevant profile portion:
Filter
  pred="(((Property(other,birth_year(66)) > Subtract(Property(me,birth_year(66)),Literal(10)) AND Property(other,birth_year(66)) < Add(Property(me,birth_year(66)),Literal(10))) AND Property(sv,cached_available(71)) == Literal(1)) AND hasLabel(sv:StyleVariant(13)))", 
  _rows=47,
  _db_hits=4860

Michael Hunger

Jan 21, 2014, 12:21:32 PM
Right, cross path comparisons are not yet used to shortcut path-finding

so if you rewrite your query to this, it will actually filter down the paths eagerly

MATCH (me:Member {id: 11700})
WITH me, me.birth_year as birth_year
MATCH (me)-[ra:preferred_store]->(s)<-[rb:preferred_store]-(other)-[rc:ordered]->()<-[rd:product]-(sv:StyleVariant)
WHERE abs(other.birth_year - birth_year) < {age_difference_range} AND sv.cached_available = 1
....
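
The abs(...) predicate is meant to select the same rows as the original pair of ">" and "<" comparisons on birth_year. A quick brute-force check in Python, used here only as a calculator (the function names and the sample ranges are arbitrary choices, not from the actual data):

```python
def two_sided(other_year, my_year, age_range):
    """Original form: other > (me - range) AND other < (me + range)."""
    return other_year > (my_year - age_range) and other_year < (my_year + age_range)


def abs_form(other_year, my_year, age_range):
    """Rewritten form: abs(other - me) < range."""
    return abs(other_year - my_year) < age_range


def forms_agree(years, ranges):
    """True if both forms give the same answer for every combination tried."""
    return all(
        two_sided(o, m, r) == abs_form(o, m, r)
        for o in years
        for m in years
        for r in ranges
    )


print(forms_agree(range(1940, 2001), range(0, 16)))
```

This prints True: both forms keep exactly the members whose birth year is strictly within the range on either side.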

Javad Karabi

Jan 21, 2014, 12:31:24 PM
Michael, awesome, thank you.

just to make sure I understand correctly, in this case, when you say 'cross path comparison',
what are the 2 paths you are referring to?

Javad Karabi

Jan 21, 2014, 12:37:36 PM
ah... i think i know what you mean.
that is, that i am comparing me.birth_year, and other.birth_year, both of which were part of the same path, so splitting it up like you did (via the WITH me.birth_year) did the trick?

Michael Hunger

Jan 21, 2014, 1:48:40 PM
Yes, that's true.
 
cross path meant from different segments of the path.

Michael

Javad Karabi

Jan 21, 2014, 2:48:22 PM
this is weird... in your example, you got the member node and piped it into the next portion, via:
MATCH (me:Member {id: {member_id}})
WITH me, me.birth_year as birth_year

im assuming this is so that the comparison on me.birth_year and other.birth_year can occur, without having the cross-path comparison issue.

however, when i do that, it looks like the execution plan uses PatternMatch instead of TraversalMatcher. TraversalMatcher is preferable, as it seems to me that it is able to apply WHERE predicates as it traverses.

when i include (me:Member {id: {member_id}}) as part of the same 'match' clause, however, it looks like TraversalMatcher is selected by the execution plan builder, which greatly increases the performance of my query...


Michael Hunger

Jan 21, 2014, 2:56:30 PM
Interesting, I assumed with 2.0 final it would still use the traversal-matcher -> there were some fixes around that in 2.0

Let's check that out and report a GH issue in case it persists.

Michael

Javad Karabi

Jan 21, 2014, 3:06:21 PM
michael, i will continue to investigate too.
would you like any information from my side?

Javad Karabi

Jan 21, 2014, 3:08:06 PM
for what it is worth, it _is_ actually using traversalmatcher when i write the query in the most elegant and cleanest way (that is, i dont even need to use WITH, i can just use one MATCH clause to find my anchor then put all my conditions in one WHERE clause).

the problem is, when (for whatever reason) i want to pipe into the match, via WITH, that is when PatternMatch is used, and this problem seems to exist.

so im assuming it has to do with the WITH breaking the ability to use TraversalMatcher