Optimizing a query


Sunit Jain

Jul 10, 2015, 8:53:16 AM

to neo4j-...@googlegroups.com

Here is what I have:

(PROTEIN) - [BELONGS_TO] -> (CLUSTER)
(PROTEIN) - [FOUND_IN]   -> (ORGANISM)
(PROTEIN) - [CONTAINS]   -> (PFAM)
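
To make the model concrete, here is a minimal sketch of a tiny graph with this shape. All node names and property values are made up for illustration; the property keys (name, description, gene_A, gene_B) are the ones used in the query further down.

```cypher
// Hypothetical sample data: every name and value here is invented.
CREATE (o1:ORGANISM {name: "org_1", gene_A: "Yes"}),
       (o2:ORGANISM {name: "org_2", gene_B: "Yes"}),
       (pf:PFAM {name: "pfam_1", description: "example domain"}),
       (c:CLUSTER {name: "cluster_1"}),
       (p1:PROTEIN {name: "prot_1"}),
       (p2:PROTEIN {name: "prot_2"}),
       // Each protein belongs to a cluster, is found in an organism,
       // and contains a PFAM domain.
       (p1)-[:BELONGS_TO]->(c),
       (p2)-[:BELONGS_TO]->(c),
       (p1)-[:FOUND_IN]->(o1),
       (p2)-[:FOUND_IN]->(o2),
       (p1)-[:CONTAINS]->(pf),
       (p2)-[:CONTAINS]->(pf);
```

In this toy graph, pfam_1 is the one PFAM shared between an organism with gene_A and a different organism with gene_B.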

Here is what I'm trying to do:

Given organisms that have a particular property (for example, gene_A or gene_B set to "Yes"), find the PFAMs they have in common.

Here is the query I came up with:

MATCH (o1:ORGANISM {gene_A: "Yes"})<--(p1:PROTEIN)-->(pf:PFAM)<--(p2:PROTEIN)-->(o2:ORGANISM {gene_B: "Yes"})
WHERE o1.name <> o2.name
RETURN DISTINCT pf.name, pf.description, count(DISTINCT o1.name) AS Weight
ORDER BY Weight DESC;

However, this query takes ~80,000 ms (about 80 seconds) to complete! Am I doing something wrong? How can I optimize it?
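
For reference, the same pattern can be written with the relationship types from the model above spelled out and PROFILE in front, so that the execution plan shows where the time goes. This is only a sketch: it assumes a Neo4j version that supports PROFILE, and whether naming the types changes the runtime has not been measured here.

```cypher
// Sketch: same query, with relationship types taken from the model above.
// PROFILE runs the query and reports the execution plan with row counts.
PROFILE
MATCH (o1:ORGANISM {gene_A: "Yes"})<-[:FOUND_IN]-(p1:PROTEIN)-[:CONTAINS]->(pf:PFAM)
      <-[:CONTAINS]-(p2:PROTEIN)-[:FOUND_IN]->(o2:ORGANISM {gene_B: "Yes"})
WHERE o1.name <> o2.name
RETURN DISTINCT pf.name, pf.description, count(DISTINCT o1.name) AS Weight
ORDER BY Weight DESC;
```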

Some more details about the database:

# Nodes = 164 565

# Properties = 525 025

# Relations = 389 695

System RAM: 512GB

SCHEMA
==> Indexes
==>   ON :CLUSTER(name)  ONLINE  
==>   ON :ORGANISM(name) ONLINE  
==>   ON :PFAM(name)     ONLINE  
==>   ON :PROTEIN(name)  ONLINE  
==> 
==> No constraints

Thanks

Sunit

PS: Would you like me to post questions like these to the larger Neo4j group?
