Multiple COLLECT, loses data

12 views
Skip to first unread message

Bastien Debret

unread,
Oct 16, 2016, 7:14:22 PM10/16/16
to Neo4j
Hi everyone,

i'm having kind of an issue there with multiple collects, here's the query:

MATCH (account:Account)
WHERE (account.Id = "myId")
WITH account
OPTIONAL MATCH (account)-[m:MANAGES]->(:Item)-[e:EQUIPS]->(v:Vest)
WHERE m.Start < "2016-08-23T22:00:00Z"
OPTIONAL MATCH (account)-[m1:MANAGES]->(:Item)-[e1:EQUIPED]->(v1:Vest)
WHERE m1.Start < e1.Start
AND m1.Start < "2016-08-23T22:00:00Z"
AND e1.Start < "2016-08-23T22:00:00Z"
AND e1.End > "2016-07-31T22:00:00Z"
OPTIONAL MATCH (account)-[m2:MANAGED]->(:Item)-[e2:EQUIPED]->(v2:Vest)
WHERE m2.Start < e.Start
AND m2.End > e.End
AND m2.Start < "2016-08-23T22:00:00Z"
AND m2.End > "2016-07-31T22:00:00Z"
AND e2.Start < "2016-08-23T22:00:00Z"
AND e2.End > "2016-07-31T22:00:00Z"
WITH account, collect(v) + collect(v1) + collect(v2) AS vList
UNWIND vList AS v
WITH account, collect(distinct(v)) AS vDistinct
UNWIND vDistinct AS v
OPTIONAL MATCH (v)-[swim:SWIM]-(sw:Swimmer)
WHERE swim.Start < "2016-08-23T22:00:00Z"
OPTIONAL MATCH (v)-[swam:SWAM]-(sw1:Swimmer)
WHERE swam.Start < "2016-08-23T22:00:00Z"
AND swam.End > "2016-07-31T22:00:00Z"
WITH account, vDistinct, collect(sw) + collect(sw1) AS swimmerList
UNWIND swimmerList AS s
WITH account, vDistinct, collect(DISTINCT s) AS swimmerDistinct
UNWIND vDistinct AS v
OPTIONAL MATCH (v)-[:HAS__CONTRACT|HAD_CONTRACT]-(c:Contract)
WHERE (c.StartDate < "2016-08-23T22:00:00Z")
AND (c.EndDate > "2016-07-31T22:00:00Z")
WITH account, vDistinct, swimmerList, collect(DISTINCT c) AS Contract
...

The idea is:

- i start from an account
- i retrieve my Vests, from several potential paths, based on equipment period
- i collect the Vests from every path, collect them in on collection, then unwind and recollect everything filtered by distinct, because i potentially got the same Vest several times if it was equiped several time on the period
- Let's say i got 60 Vests, now i want to get things assiocated with those elements, things that can or can not be there, and add them to the vest i found them from like so:

[Vest X] -- [Swimmer X] -- []
[Vest X] -- [Swimmer X] -- [Contract X]
[Vest X] -- [] -- [Contract X]
[Vest X] -- [] -- []
[Vest X] -- [Swimmer X] -- []
[Vest X] -- [] -- [Contract X]
[Vest X] -- [Swimmer X] -- []

The problem is that with the method i use to collect from several path only distinct elements, if from my vests, i try to optional match swimmers and that there's no swimmer associated, i lose my Vest collection for a reason i can't figure out..

I profiled my query and the section that makes me lose the Vests elements is: 

WITH account, vDistinct, collect(sw) + collect(sw1) AS swimmerList
UNWIND swimmerList AS s
WITH account, vDistinct, collect(DISTINCT s) AS swimmerDistinct

Do you have any ideas why ? If i'm not explaining well enough please ask your questions ;)

Thanks guys, Bastien.

EDIT: To be more specific i created a dataset explaining my use-case:

Query to create the dataset:
CREATE (a:Account {Id:'1'})
CREATE (i1:Item {Id:'item1'})
CREATE (i2:Item {Id:'item2'})
CREATE (i3:Item {Id:'item3'})
CREATE (i4:Item {Id:'item4'})
CREATE (a)-[:MANAGES {Start:'2016-07-31T23:00:00Z'}]->(i1)
CREATE (a)-[:MANAGES {Start:'2016-07-31T23:00:00Z'}]->(i2)
CREATE (a)-[:MANAGED {Start:'2016-07-31T22:00:00Z', End:'2016-08-23T22:00:00Z'}]->(i3)
CREATE (a)-[:MANAGED {Start:'2016-07-31T22:00:00Z', End:'2016-08-23T22:00:00Z'}]->(i4)
CREATE (v:Vest {Id:'vest1'})
CREATE (v2:Vest {Id:'vest2'})
CREATE (v3:Vest {Id:'vest3'})
CREATE (v4:Vest {Id:'vest4'})
CREATE (v5:Vest {Id:'vest5'})
CREATE (v6:Vest {Id:'vest6'})
CREATE (i1)-[:EQUIPS {Start:'2016-07-31T22:00:00Z'}]->(v)
CREATE (i2)-[:EQUIPS {Start:'2016-07-31T22:00:00Z'}]->(v2)
CREATE (i2)-[:EQUIPED {Start:'2016-07-31T22:00:00Z', End:'2016-08-23T22:00:00Z'}]->(v3)
CREATE (i3)-[:EQUIPS {Start:'2016-07-31T22:00:00Z'}]->(v4)
CREATE (i3)-[:EQUIPED {Start:'2016-07-31T22:00:00Z', End:'2016-08-23T22:00:00Z'}]->(v5)
CREATE (i4)-[:EQUIPED {Start:'2016-07-31T22:00:00Z', End:'2016-08-23T22:00:00Z'}]->(v6)
CREATE (s1:Swimmer {Id:'swimmer1'})
CREATE (s2:Swimmer {Id:'swimmer2'})
CREATE (s3:Swimmer {Id:'swimmer3'})
CREATE (v)-[:SWIM {Start:'2016-07-31T22:00:00Z'}]->(s1)
CREATE (v3)-[:SWIM {Start:'2016-07-31T22:00:00Z'}]->(s2)
CREATE (v4)-[:SWIM {Start:'2016-07-31T22:00:00Z'}]->(s3)

The query that returns vests because there's swimmers associated:

MATCH (account:Account)
WHERE (account.Id = "1")
WITH account
OPTIONAL MATCH (account)-[m:MANAGES]->(:Item)-[e:EQUIPS]->(v:Vest)
WHERE m.Start < "2016-08-23T22:00:00Z"
OPTIONAL MATCH (account)-[m1:MANAGES]->(:Item)-[e1:EQUIPED]->(v1:Vest)
WHERE m1.Start < e1.Start
AND m1.Start < "2016-08-23T22:00:00Z"
AND e1.Start < "2016-08-23T22:00:00Z"
AND e1.End > "2016-07-31T22:00:00Z"
OPTIONAL MATCH (account)-[m2:MANAGED]->(:Item)-[e2:EQUIPED]->(v2:Vest)
WHERE m2.Start < e.Start
AND m2.End > e.End
AND m2.Start < "2016-08-23T22:00:00Z"
AND m2.End > "2016-07-31T22:00:00Z"
AND e2.Start < "2016-08-23T22:00:00Z"
AND e2.End > "2016-07-31T22:00:00Z"
WITH account, collect(v) + collect(v1) + collect(v2) AS vList
UNWIND vList AS v
WITH account, collect(distinct(v)) AS vDistinct
UNWIND vDistinct AS v
OPTIONAL MATCH (v)-[swim:SWIM]-(sw:Swimmer)
WHERE swim.Start < "2016-08-23T22:00:00Z"
OPTIONAL MATCH (v)-[swam:SWAM]-(sw1:Swimmer)
WHERE swam.Start < "2016-08-23T22:00:00Z"
AND swam.End > "2016-07-31T22:00:00Z"
WITH account, vDistinct, collect(sw) + collect(sw1) AS swimmerList
UNWIND swimmerList AS s
WITH account, vDistinct, collect(DISTINCT s) AS swimmerDistinct
UNWIND vDistinct AS v
OPTIONAL MATCH (v)-[:HAS__CONTRACT|HAD_CONTRACT]-(c:Contract)
WHERE (c.StartDate < "2016-08-23T22:00:00Z")
AND (c.EndDate > "2016-07-31T22:00:00Z")
WITH account, vDistinct, swimmerDistinct, collect(DISTINCT c) AS Contract
RETURN account, vDistinct, swimmerDistinct, Contract

Now, let's unbind swimmers from Vests:

MATCH (v:Vest)-[r:SWIM]-(s:Swimmer)
DELETE r

If you try the previous query now, it doesn't return vests even though there's some.

Michael Hunger

unread,
Oct 16, 2016, 7:46:35 PM10/16/16
to ne...@googlegroups.com
The problem is, that your swimmerList is empty because aggregation functions don't collect null values.

As they are disconnected 

So the UNWIND results in 0 rows.

You could add a "dummy" value to cater for that.

UNWIND case when size(swimmerList) = 0 then ['none'] else swimmerList end AS s

or earlier: COLLECT(coalesce(sw,"none"))

+---------------------------------------------------------------------------+
| account         | swimmerList | vDistinct                                 |
+---------------------------------------------------------------------------+
| Node[0]{Id:"1"} | []          | [Node[6]{Id:"vest2"},Node[5]{Id:"vest1"}] |
+---------------------------------------------------------------------------+
1 row

--
You received this message because you are subscribed to the Google Groups "Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email to neo4j+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply all
Reply to author
Forward
0 new messages