Hi
Here is what I have:
(PROTEIN) - [BELONGS_TO] -> (CLUSTER)
(PROTEIN) - [FOUND_IN] -> (ORGANISM)
(PROTEIN) - [CONTAINS] -> (PFAM)
Here is what I'm trying to do:
Given organisms that have a particular property, find the PFAMs they have in common.
Here is the query I came up with:
MATCH (o1:ORGANISM {gene_A: "Yes"})<--(p1:PROTEIN)-->(pf:PFAM)<--(p2:PROTEIN)-->(o2:ORGANISM {gene_B: "Yes"})
WHERE o1.name <> o2.name
RETURN DISTINCT pf.name, pf.description, count(DISTINCT o1.name) AS Weight
ORDER BY Weight DESC;
However, this query takes ~80,000 ms (about 80 seconds) to complete! Am I doing something wrong? How can I optimize it?
Some more details about the database:
# Nodes = 164 565
# Properties = 525 025
# Relations = 389 695
System RAM: 512GB
SCHEMA
==> Indexes
==> ON :CLUSTER(name) ONLINE
==> ON :ORGANISM(name) ONLINE
==> ON :PFAM(name) ONLINE
==> ON :PROTEIN(name) ONLINE
==>
==> No constraints
Thanks
Sunit
PS: Would you like me to post questions like these to the larger Neo4j group?