Optimizing a query

Sunit Jain
Jul 10, 2015, 8:53:16 AM
to neo4j-...@googlegroups.com
Hi

Here is what I have:
(PROTEIN) - [BELONGS_TO] -> (CLUSTER)
(PROTEIN) - [FOUND_IN]   -> (ORGANISM)
(PROTEIN) - [CONTAINS]   -> (PFAM)
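
For anyone who wants to reproduce the setup, the model above can be sketched as a tiny toy graph. Only the labels, the relationship types, and the gene_A/gene_B properties come from the actual database; every node name and the description text below are invented placeholders.

```cypher
// Toy data only: all names and the description are hypothetical placeholders.
CREATE (o1:ORGANISM {name: "org_1", gene_A: "Yes"}),
       (o2:ORGANISM {name: "org_2", gene_B: "Yes"}),
       (c:CLUSTER   {name: "cluster_1"}),
       (pf:PFAM     {name: "pfam_1", description: "placeholder description"}),
       (p1:PROTEIN  {name: "prot_1"}),
       (p2:PROTEIN  {name: "prot_2"}),
       // Each protein belongs to a cluster, is found in an organism,
       // and contains a PFAM, matching the three relationships listed above.
       (p1)-[:BELONGS_TO]->(c),
       (p2)-[:BELONGS_TO]->(c),
       (p1)-[:FOUND_IN]->(o1),
       (p2)-[:FOUND_IN]->(o2),
       (p1)-[:CONTAINS]->(pf),
       (p2)-[:CONTAINS]->(pf);
```

Here prot_1 and prot_2 share pfam_1 and sit in two different organisms, one with gene_A and one with gene_B, which is the shape the question is about.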

Here is what I'm trying to do:
Given organisms that have a particular property (for example, gene_A or gene_B set to "Yes"), find the PFAMs they have in common.

Here is the query I came up with:
MATCH (o1:ORGANISM {gene_A:"Yes"})<--(p1:PROTEIN)-->(pf:PFAM)<--(p2:PROTEIN)-->(o2:ORGANISM {gene_B:"Yes"})
WHERE o1.name <> o2.name
RETURN DISTINCT pf.name, pf.description, count(DISTINCT o1.name) AS Weight
ORDER BY Weight DESC;

However, this query takes ~80,000 ms (about 80 seconds) to complete! Am I doing something wrong? How can I optimize it?
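
One variant that might be worth comparing, as an untested sketch: the same pattern with the relationship types written out and a PROFILE prefix, so the execution plan and per-operator row counts are visible. It assumes a Neo4j version that supports PROFILE, and that PROTEIN nodes reach ORGANISM only through FOUND_IN and PFAM only through CONTAINS, as in the model above. Whether it runs any faster has not been measured.

```cypher
// Sketch only: same logic as the query above, with explicit relationship types.
// PROFILE runs the query and reports the plan with row counts per operator.
PROFILE
MATCH (o1:ORGANISM {gene_A: "Yes"})<-[:FOUND_IN]-(p1:PROTEIN)-[:CONTAINS]->(pf:PFAM)
      <-[:CONTAINS]-(p2:PROTEIN)-[:FOUND_IN]->(o2:ORGANISM {gene_B: "Yes"})
WHERE o1.name <> o2.name
// DISTINCT is left off the RETURN: the aggregation already groups rows
// by pf.name and pf.description, so each pair appears once.
RETURN pf.name, pf.description, count(DISTINCT o1.name) AS Weight
ORDER BY Weight DESC;
```

The operator with the largest row count in the profile output would be the first place to look for the cost.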

Some more details about the database:
# Nodes = 164 565
# Properties = 525 025
# Relations = 389 695 
System RAM: 512GB

SCHEMA
==> Indexes
==>   ON :CLUSTER(name)  ONLINE  
==>   ON :ORGANISM(name) ONLINE  
==>   ON :PFAM(name)     ONLINE  
==>   ON :PROTEIN(name)  ONLINE  
==>
==> No constraints
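
Note that the indexes listed above cover only the name property, while the query filters ORGANISM nodes on gene_A and gene_B. A possible experiment, sketched here under the assumption of the Neo4j 2.x/3.x index syntax (later versions use a different CREATE INDEX form), is to index those two properties as well. It is not known whether this would change the runtime.

```cypher
// Assumption: Neo4j 2.x/3.x schema index syntax.
// Adds indexes on the two properties the query filters on.
CREATE INDEX ON :ORGANISM(gene_A);
CREATE INDEX ON :ORGANISM(gene_B);
```

Running the schema command again afterwards should list the new indexes once they are ONLINE.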

Thanks
Sunit

PS: Would you like me to post questions like these to the larger Neo4j group?