Cypher Feature Suggestion: bind new identifiers in predicates

Kai Chen

Oct 28, 2014, 9:12:47 PM
to ne...@googlegroups.com
Hi,

I'm not sure if this is the right place to submit this.  I was going to open an FR ticket on github but changed my mind because I thought maybe it's better to have a discussion here first.

I've run into a couple of places where being able to say something like
    match (n ....) /* criteria doesn't involve binding an identifier n2 */
    where n.prop > threshold or ( (n)-->(n2:label{qualifier:"value"}) and n2.prop < threshold )
would make the query a lot easier to read.  I'm aware that, in the case of 'OR', I could use a union after using 2 separate match clauses.  And that's what I've been going along with, until now when I need to dynamically translate a user query into Cypher.  Here using union can become very complex, as the relationships can nest arbitrary levels deep.  But if we had a syntax that can bind new identifiers in predicates, it would be very easy and, more importantly, very readable.

I've prepared a few simple use cases below.  (See attached image of the model)

* Data Set

  The statements below create a set of 7 nodes consisting of 4 circles, 2 squares, and 1 triangle.
  2 circles point to 2 squares, 1 circle points to the triangle, and another circle is dangling.

create (:circle{id:1})-[:uses]->(:square{id:1});
create (:circle{id:2})-[:uses]->(:square{id:2});
create (:circle{id:3})-[:uses]->(:triangle{id:3});
create (:circle{id:4});

* Verification

neo4j-sh (?)$ match (c:circle) optional match (c)-[r]->(n) return c, labels(c), r, n, labels(n);
+--------------------------------------------------------------------------+
| c              | labels(c)  | r          | n              | labels(n)    |
+--------------------------------------------------------------------------+
| Node[10]{id:1} | ["circle"] | :uses[7]{} | Node[11]{id:1} | ["square"]   |
| Node[12]{id:2} | ["circle"] | :uses[8]{} | Node[13]{id:2} | ["square"]   |
| Node[14]{id:3} | ["circle"] | :uses[9]{} | Node[15]{id:3} | ["triangle"] |
| Node[16]{id:4} | ["circle"] | <null>     | <null>         | <null>       |
+--------------------------------------------------------------------------+

* Queries

1) Circles that don't point to any Squares
    This is easy and can be supported with the current syntax.

neo4j-sh (?)$ match (c:circle) where not (c)-->(:square) return c;
+----------------+
| c              |
+----------------+
| Node[14]{id:3} |
| Node[16]{id:4} |
+----------------+

2) Circles that don't point to Square(1)
    This can also be accomplished using the current syntax.  So path patterns are already supported in predicates; the only thing missing is the ability to bind an identifier, which would allow filtering with additional predicate expressions.

neo4j-sh (?)$ match (c:circle) where not (c)-->(:square{id:1}) return c;
+----------------+
| c              |
+----------------+
| Node[12]{id:2} |
| Node[14]{id:3} |
| Node[16]{id:4} |
+----------------+

3) Circles that point to Square(1) or have id 4
    Here it's starting to get hairy.  Union queries may also become grossly inefficient if the result sets are large.  This is where identifier binding in predicates can help make the query more efficient and maybe easier to read as well.

neo4j-sh (?)$ match (c:circle{id:4}) return c
> union match (c:circle)-->(s:square{id:1}) return c;
+----------------+
| c              |
+----------------+
| Node[16]{id:4} |
| Node[10]{id:1} |
+----------------+

    Would like to say
        match (c:circle) where (c.id=4) or ( (c)-->(s:square{id:1}) ) return c
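
    To make the equivalence concrete, here is a minimal Python sketch that stands in for the 7-node data set and evaluates both forms of query 3.  The dictionary USES and the two function names are made up for this illustration; nothing here runs against Neo4j, it only mirrors the set logic of the two queries.

```python
# Each circle id maps to the (label, id) of the node it uses; circle 4 uses nothing.
USES = {1: ("square", 1), 2: ("square", 2), 3: ("triangle", 3), 4: None}


def union_form():
    # match (c:circle{id:4}) return c
    # union match (c:circle)-->(s:square{id:1}) return c
    first = {c for c in USES if c == 4}
    second = {c for c, target in USES.items() if target == ("square", 1)}
    return first | second


def predicate_form():
    # match (c:circle) where (c.id=4) or ( (c)-->(:square{id:1}) ) return c
    return {c for c, target in USES.items() if c == 4 or target == ("square", 1)}


print(sorted(union_form()), sorted(predicate_form()))
```

    Both forms select circles 1 and 4; the single-predicate form scans the circles once, while the union form repeats the circle match in each branch.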

4) (Circle, Square) where Circle is either id(4) or points to Squares with id < 4
    This is where the union query is beginning to deteriorate in comprehensibility.  One has to remember to use optional match.  And I don't know what it would look like, if the optional match is 2 or 3 levels deep.  Now imagine this is a portion of a larger query, where the 'c' nodes are found by matching in another pattern.  Using the union would require one to duplicate that code in all the subsets.  Having more shared predicates would have the same effect.

neo4j-sh (?)$ match (c:circle)-->(s:square) where s.id < 4 return c,s
> union match (c:circle{id:4}) optional match (c)-->(s) return c,s;
+---------------------------------+
| c              | s              |
+---------------------------------+
| Node[10]{id:1} | Node[11]{id:1} |
| Node[12]{id:2} | Node[13]{id:2} |
| Node[16]{id:4} | <null>         |
+---------------------------------+


Hope that was clear.  And sorry for the long post.

I also would be more than happy to help implement this if it's not too difficult and someone can point me to the right place to start -- it'd be a feature that I'd really use a lot.

Cheers,
Kai
7-nodes.png
7-nodes.svg

Michael Hunger

Oct 28, 2014, 9:54:34 PM
to ne...@googlegroups.com
You can also do number 3) because you actually don't need "s"

match (c:circle) where (c.id=4) or ( (c)-->(:square{id:1}) ) return c

Only #4 is trickier but still possible, only not so nice to read :)

I think union is still the better choice here as you combine 2 different use-cases:

match (c:circle)-->(s:square) where s.id < 4 return c,s
union match (c:circle{id:4}) optional match (c)-->(s) return c,s;

But you can use a path expression as a collection of paths, which you can then use in collection predicates (all, any, single, none), filter, extract, and reduce.

match (c:circle)
where any(p in (c)-->(:square) where last(nodes(p)).id < 4) return c



--
You received this message because you are subscribed to the Google Groups "Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email to neo4j+un...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

S. Kai Chen

Oct 29, 2014, 1:14:00 AM
to ne...@googlegroups.com
Hi, Michael,

Thanks for the quick response!  And sorry about my own late reply: I had to go to a meeting right after I sent the first post and have just gotten back right now.

Interesting suggestion, using path expressions and collection predicates.  How would you include an optional path here?  Is there a way to return the 4th circle, which won't match in the path?

I agree union is a clearer way to write it, especially when there are only 2 subsets.  However, I'm concerned about situations where one needs to include more subsets, sometimes maybe 4 or 5 or more.  With compound criteria, it's easier to read when all the conditions are in the where clause -- to me at least.

What makes it harder for me is that I'm generating the Cypher queries dynamically, based on a tiny query language that the user puts together using a web UI (originally through a lot of NOT|AND|OR dropdowns but now with a rich editor embellished with auto-complete).  So essentially the query comes in the form of such-and-such-class-with-these-properties; it gets compiled and then gets translated into Cypher after it passes validation.

It's definitely a lot easier to translate that user query into a Cypher that actually allows new identifiers in the predicate; a lot more work to do the translation in terms of union -- maybe even more than what I might spend implementing the identifier in the predicates in Cypher.
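
As a rough illustration of that translation step, here is a small Python sketch that renders a boolean expression tree into a single WHERE clause.  The classes Prop, PointsTo, Not and BinOp and the function to_where are hypothetical stand-ins for my AST, not anything in Cypher or Neo4j, and the literal rendering does no escaping.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Prop:
    """Comparison on the anchor node, e.g. c.id = 4."""
    key: str
    op: str
    value: object


@dataclass
class PointsTo:
    """The anchor node points to a node with this label and these exact properties."""
    label: str
    props: dict


@dataclass
class Not:
    operand: "Expr"


@dataclass
class BinOp:
    op: str  # "AND" or "OR"
    left: "Expr"
    right: "Expr"


Expr = Union[Prop, PointsTo, Not, BinOp]


def literal(value):
    # Naive rendering with no escaping (assumption: trusted input only);
    # a real generator should pass user values as query parameters.
    return f'"{value}"' if isinstance(value, str) else str(value)


def to_where(expr, anchor="c"):
    # Recursively render the tree; every binary node is parenthesised so that
    # nesting depth never changes operator precedence.
    if isinstance(expr, Prop):
        return f"{anchor}.{expr.key} {expr.op} {literal(expr.value)}"
    if isinstance(expr, PointsTo):
        props = ", ".join(f"{k}: {literal(v)}" for k, v in expr.props.items())
        target = f":{expr.label} {{{props}}}" if props else f":{expr.label}"
        return f"({anchor})-->({target})"
    if isinstance(expr, Not):
        return f"NOT ({to_where(expr.operand, anchor)})"
    return f"({to_where(expr.left, anchor)} {expr.op} {to_where(expr.right, anchor)})"


query = BinOp("OR", Prop("id", "=", 4), PointsTo("square", {"id": 1}))
cypher = f"MATCH (c:circle) WHERE {to_where(query)} RETURN c"
print(cypher)
```

This only works while the far node can stay anonymous, i.e. its conditions are plain property equalities.  A condition such as s.id < 4 needs a name for that node, which is exactly where the translation has to fall back to UNION or OPTIONAL MATCH today.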

Does that make sense or would some examples with the more complex type of queries help?

Cheers,
Kai



Wes Freeman

Oct 29, 2014, 1:28:34 AM
to Neo4J
How about this? Defer predicates until you match more than you need, then double check what you need to confirm in WHERE.

MATCH (c:circle)
OPTIONAL MATCH (c)-->(s:square)
WITH c, s
WHERE c.id = 4 
   OR ((c)-->(s) AND s.id < 4)
RETURN c,s
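
A minimal Python sketch of what that query does on Kai's data set, assuming a null s simply fails the second branch of the OR.  The names USES, optional_match_squares and deferred_filter are invented for the illustration; this only emulates the row logic and does not talk to Neo4j.

```python
# Each circle id maps to the (label, id) of the node it uses; circle 4 uses nothing.
USES = {1: ("square", 1), 2: ("square", 2), 3: ("triangle", 3), 4: None}


def optional_match_squares():
    # MATCH (c:circle) OPTIONAL MATCH (c)-->(s:square)
    # Every circle yields a row; s is None when it has no square neighbour.
    for c, target in USES.items():
        s = target[1] if target and target[0] == "square" else None
        yield c, s


def deferred_filter():
    # WITH c, s WHERE c.id = 4 OR ((c)-->(s) AND s.id < 4)
    return [(c, s) for c, s in optional_match_squares()
            if c == 4 or (s is not None and s < 4)]


print(deferred_filter())  # [(1, 1), (2, 2), (4, None)]
```

The rows match the output of the UNION version in query 4: circle 3 is dropped because it has no square and is not id 4.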

Wes

S. Kai Chen

Oct 29, 2014, 1:33:59 AM
to ne...@googlegroups.com
Yes, Wes, that solves the simple case where only 2 nodes are involved.

What about the situation where there are 4 or 5 nodes on the path, with a couple of ORs?

It's a bit late for me right now.  But I will try to come up with a more concrete example tomorrow.

Cheers,
Kai

Michael Hunger

Oct 29, 2014, 2:57:59 AM
to ne...@googlegroups.com
For an optional match, the path expression returns an empty collection if there is no match. So you could test for that.

You can return values by either returning the path, or by using head(extract(p in (c)-->() | last(nodes(p)))) to return "s".


I still think that with your UI, the UNIONs would represent "alternative" use-cases, i.e. alternative execution paths at the top level, which should lend themselves nicely to a UI.

Michael

Wes Freeman

Oct 29, 2014, 3:07:57 AM
to Neo4J
For generating cypher, it seems like stringing OR together is easier and more flexible than UNION. 

Back to your original proposal. While I can see how it might help some of your queries be simpler, I think there are other cases that might not be as desirable if we were to allow identifier binding in WHERE. That said, maybe there are some scoping rules and usage rules that would help mitigate (must have new identifiers connect to existing identifiers, and you can't use identifiers defined in WHERE in later parts of the query).

Consider this somewhat surprising example (hidden cartesian product; typo?):
MATCH (n)
WHERE NOT (m)--()
RETURN *;

Should this be allowed? Should it return both n and m, or just n (and should each n be repeated m times)?
MATCH (n)
WHERE (m)
RETURN *;

Wes

Andres Taylor

Oct 29, 2014, 6:01:07 AM
to neo4j
Hiya!

I think there are a lot of circumstances where being able to express things like this makes sense. Michael and Wes have useful workarounds, but longer term, I'm growing more and more convinced that we need better subquery support. Something similar to SQL's EXISTS would solve it without the oddity of having predicates introduce identifiers. Something like this would then be possible:

MATCH (a)
WHERE EXISTS (
  MATCH (a)-->(b:Label)
  WHERE a.prop = b.prop AND b.foo = "BAR"
)

This is not something we have planned for at the moment, but WDYT about this direction instead of introducing identifiers?
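
To show the difference in row behaviour, here is a small Python sketch, assuming the proposed EXISTS would act like SQL's, i.e. as a pure filter.  The EDGES data and both function names are hypothetical and are not the data set from earlier in the thread.

```python
# Hypothetical data: node "a1" points to two matching :Label nodes, "a2" to none.
EDGES = {"a1": ["b1", "b2"], "a2": []}


def match_join():
    # MATCH (a)-->(b:Label) RETURN a, b
    # One row per (a, b) pair, and b is bound for the rest of the query.
    return [(a, b) for a, bs in EDGES.items() for b in bs]


def exists_filter():
    # MATCH (a) WHERE EXISTS ( MATCH (a)-->(b:Label) ) RETURN a
    # At most one row per a; b never escapes the subquery.
    return [a for a, bs in EDGES.items() if len(bs) > 0]


print(match_join())
print(exists_filter())
```

The join form repeats a1 once per matching b, while the EXISTS form keeps a1 exactly once and introduces no new identifier in the outer query.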

Cheers,

Andrés

Javier de la Rosa

Oct 29, 2014, 9:52:15 AM
to ne...@googlegroups.com
+1 for subqueries.
Javier de la Rosa
http://versae.es

S. Kai Chen

Oct 29, 2014, 12:40:29 PM
to ne...@googlegroups.com
Wes,

I'd say YES to both examples: both have correct syntax, though probably incorrect semantics.

In the first case where one's searching for unconnected nodes, it would return a Cartesian of the set over itself, because '*' means return all identifiers.  In the second case, it's a Cartesian of the same kind, the set of all nodes over itself.

I'd treat the identifiers in predicates exactly the same way as they appear in MATCH or OPTIONAL.  Yes, it's easier to write pathological queries, but only when you try to.  Especially if the query compiler issues a warning when the predicate pattern does not connect to any previously defined graphs.
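
A quick Python sketch of the row counts this reading implies on the 7-node data set, assuming an identifier bound in a predicate multiplies rows the same way a MATCH identifier does.  NODES, CONNECTED and rows are names invented for the illustration, and the semantics are the hypothetical ones under discussion, not anything Cypher does today.

```python
from itertools import product

# Node ids from the Verification table; only Node[16] (circle 4) has no relationships.
NODES = [10, 11, 12, 13, 14, 15, 16]
CONNECTED = {10, 11, 12, 13, 14, 15}


def rows(m_predicate):
    # MATCH (n) WHERE <pattern binding m> RETURN *
    # m is unconnected to n, so every n pairs with every qualifying m.
    return [(n, m) for n, m in product(NODES, NODES) if m_predicate(m)]


unconnected = rows(lambda m: m not in CONNECTED)  # WHERE NOT (m)--()
everything = rows(lambda m: True)                 # WHERE (m)
print(len(unconnected), len(everything))
```

Under this reading the first query returns 7 rows (every n paired with the one dangling node) and the second returns 49, which is the kind of result a compiler warning could flag.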

S. Kai Chen

Oct 29, 2014, 12:49:06 PM
to ne...@googlegroups.com
Hiya Andres,

I love subqueries in SQL.  And in the particular scenario that I'm dealing with, it's certainly going to work, because I can wrap the predicate inside an EXISTS, whereas with UNION or OPTIONAL I'd have to rearrange branches in the AST (mine, not Cypher's).  It's still a bit more verbose than the inlined identifier version, in which the above query would be written as

MATCH (a)
WHERE (a)-->(b:Label) AND a.prop = b.prop AND b.foo = "BAR"

Here the whole graph portion has gone into the WHERE clause.  Incidentally, that's what went through my head when I first started writing Cypher queries about a year ago: why don't we just treat the pattern as a predicate? :)
