:Protein ---> :is_a ---------------> :Enzyme ---> :activated_by|inhibited_by ---> :Compound
:Protein <--- :activated_by <--- :Enzyme
:Compound --> :consumed_by|:produced_by ---> :Transport
:Transport --> :catalyzed_by ---> :Enzyme
:Transport --> :part_of ---> :Pathway
(Yes, it's biology, and yes, it's from BioPAX.) I want to write a query that returns (Protein, Pathway) pairs by going through all of the paths/subgraphs above. Moreover, I would like to write it in a compact form, as in SQL or SPARQL; in informal terms:
select distinct protein, pathway {
(
protein is_a -> enzyme
union protein <- activated_by enzyme
)
// I expect matching enzymes to be joined downstream
(
( enzyme activated_by|inhibited_by -> compound - consumed_by | produced_by -> transport )
union ( enzyme <- catalyzed_by transport )
) // I expect tuples of protein/enzyme/transport, where transport is obtained from the two sub-branches
transport - part_of -> pathway
}
Is it possible to do this in Cypher? To be explicit: I already know that I could simplify it a lot by writing something like ( protein:Protein ) -- ( pathway:Pathway ), but suppose you need to select exactly those paths and not others that might be there. The general question is: is it possible to nest subqueries of arbitrary complexity? In particular, is it possible to write a union-based query that contains further union-based subqueries? Is there a simple way to do so?
I tried to follow the approach suggested by the documentation and by the answers in the StackOverflow link mentioned above, using WITH + COLLECT() + UNWIND. However, I get either a syntax error or a query plan in which I can see that things are not joined as I expect. For instance, this query:
PROFILE MATCH (prot:Protein) - [:is_a] -> (enz:Enzyme)
WITH prot, COLLECT ( enz ) AS enzs
UNWIND enzs AS enz
MATCH (enz) - [:ac_by|:in_by] -> (comp:Comp)
WITH prot, enzs, COLLECT (comp) AS comps
UNWIND enzs AS enz
MATCH (enz) <- [:ca_by] - (tns:Transport)
WITH prot, comps, COLLECT (tns) AS tnss
UNWIND comps AS comp
MATCH (tns:Transport) <- [:cs_by|:pd_by] - (comp)
WITH prot, tnss + COLLECT ( tns ) AS tnss
UNWIND tnss AS tns
MATCH (tns) - [:part_of] -> (path:Path)
RETURN prot, path LIMIT 25
shows me a linear plan in which, at some point, there are 0 resulting rows (and I'm quite sure that's wrong: single-path queries between protein and pathway return a few nodes). Even if it worked, frankly, I don't find it the easiest way to express this graph pattern; the nesting approach shown above, which other languages support (I've already tried the same with SPARQL), is much easier to write and read.
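To make the target shape concrete, here is roughly what I would hope to be able to write. This is an untested sketch, not something I have verified on my data: it assumes a Neo4j version with correlated CALL { ... UNION ... } subqueries (4.1 or later, as far as I can tell), and it uses the full label and relationship names from the schema at the top rather than the abbreviated ones in my attempt.

```cypher
// Sketch only: assumes correlated CALL subqueries are available (Neo4j 4.1+)
// and that labels/relationship types match the schema at the top of the post.
MATCH (prot:Protein)
CALL {
  // First union: reach an enzyme from the protein through either edge.
  WITH prot
  MATCH (prot)-[:is_a]->(enz:Enzyme)
  RETURN enz
  UNION
  WITH prot
  MATCH (prot)<-[:activated_by]-(enz:Enzyme)
  RETURN enz
}
CALL {
  // Second union: reach a transport from the enzyme, via a compound
  // or directly via catalysis.
  WITH enz
  MATCH (enz)-[:activated_by|inhibited_by]->(:Compound)
        -[:consumed_by|produced_by]->(tns:Transport)
  RETURN tns
  UNION
  WITH enz
  MATCH (enz)<-[:catalyzed_by]-(tns:Transport)
  RETURN tns
}
MATCH (tns)-[:part_of]->(path:Pathway)
RETURN DISTINCT prot, path
```

Each CALL block imports the variable bound so far, so the enzymes from the first union should be the ones joined into the second, which is the behaviour I could not get from WITH + COLLECT() + UNWIND.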
Thanks in advance for any help.