And so on. There are 4 levels of sub research areas.
What I want to do is take the top-level research areas and the second-level research areas, and show the sales of the products that sit underneath them.
I've indexed the top-level research areas to speed up that lookup, but I'm struggling to get the rest of the query to run quickly. It seems like it should be possible, so I must be doing something wrong. This is what I have:
    private Map<String, BigDecimal> calculateTotalSales(List<Node> subResearchAreas) {
        final Map<String, BigDecimal> totalSales = new HashMap<String, BigDecimal>();
        org.neo4j.graphdb.traversal.Traverser traverse = Traversal.description()
            .depthFirst()
            .relationships(DynamicRelationshipType.withName("has_child"), Direction.OUTGOING)
            .evaluator(Evaluators.toDepth(4))
            .evaluator(new Evaluator() {
                public Evaluation evaluate(Path path) {
                    if (path.length() == 0) {
                        return Evaluation.EXCLUDE_AND_CONTINUE;
                    }
                    Node subResearchArea = path.endNode();
                    Iterable<Relationship> researchAreaToProducts = subResearchArea.getRelationships(DynamicRelationshipType.withName("primary_research_area"));
                    for (Relationship researchAreaToProduct : researchAreaToProducts) {
                        Node product = researchAreaToProduct.getEndNode();
                        Iterable<Relationship> productsToSales = product.getRelationships(DynamicRelationshipType.withName("sold"));
                        for (Relationship productToSale : productsToSales) {
                            Node sales = productToSale.getEndNode();
                            BigDecimal monthlySales = new BigDecimal((Double) sales.getProperty("sales"));
                            String name = (String) path.startNode().getProperty("display_name");
                            if (totalSales.containsKey(name)) {
                                totalSales.put(name, totalSales.get(name).add(monthlySales));
                            } else {
                                totalSales.put(name, monthlySales);
                            }
                        }
                    }
                    return Evaluation.INCLUDE_AND_CONTINUE;
                }
            })
            .traverse(subResearchAreas.toArray(new Node[]{}));
        for (Path forceEvaluation : traverse) { }
        return totalSales;
    }
This traversal runs 34 times, once for each of the 34 top-level research areas, because it is called inside a loop. Overall this part of the code takes about 8.5 seconds, which works out to roughly 0.25 seconds per traversal.
If there's a better way to solve this problem and I'm going about it totally the wrong way, please let me know as well! I've tried two variations so far. First, I dropped the traversal API and did the relationship lookups directly on the nodes inside for loops, but that actually takes longer than the traversal API. Second, I pushed the 'primary_research_area' and 'sold' relationships into the 'relationships' method call on the traversal description, but that made it slower as well.
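To make the result I'm after concrete, here is a minimal plain-Java sketch of the same roll-up with no Neo4j involved. Everything in it is a hypothetical stand-in: the `SalesRollup` class, its method names, and the maps `children`, `productsByArea` and `salesByProduct` (in-memory substitutes for the `has_child`, `primary_research_area` and `sold` relationships) are not part of my real code. It assumes the research areas form a tree, and it mirrors the evaluator above by skipping products attached directly to the start area. It also uses `BigDecimal.valueOf` rather than `new BigDecimal(double)`, which is a small deviation from the code above.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the roll-up; not Neo4j code.
final class SalesRollup {

    // Sum of all monthly sales for products attached directly to one area.
    static BigDecimal salesAt(String area,
                              Map<String, List<String>> productsByArea,
                              Map<String, List<Double>> salesByProduct) {
        BigDecimal sum = BigDecimal.ZERO;
        for (String product : productsByArea.getOrDefault(area, Collections.<String>emptyList())) {
            for (Double monthly : salesByProduct.getOrDefault(product, Collections.<Double>emptyList())) {
                sum = sum.add(BigDecimal.valueOf(monthly));
            }
        }
        return sum;
    }

    // Sum of sales for every descendant of 'area', at most 'levelsLeft' levels down.
    // The area itself is not counted, matching the path.length() == 0 check above.
    static BigDecimal salesBelow(String area, int levelsLeft,
                                 Map<String, List<String>> children,
                                 Map<String, List<String>> productsByArea,
                                 Map<String, List<Double>> salesByProduct) {
        if (levelsLeft == 0) {
            return BigDecimal.ZERO;
        }
        BigDecimal sum = BigDecimal.ZERO;
        for (String child : children.getOrDefault(area, Collections.<String>emptyList())) {
            sum = sum.add(salesAt(child, productsByArea, salesByProduct))
                     .add(salesBelow(child, levelsLeft - 1, children, productsByArea, salesByProduct));
        }
        return sum;
    }

    // One total per start area. Unlike the evaluator above, areas with no sales get a zero entry.
    static Map<String, BigDecimal> totalSalesByArea(List<String> startAreas, int maxDepth,
                                                    Map<String, List<String>> children,
                                                    Map<String, List<String>> productsByArea,
                                                    Map<String, List<Double>> salesByProduct) {
        Map<String, BigDecimal> totals = new HashMap<>();
        for (String start : startAreas) {
            totals.put(start, salesBelow(start, maxDepth, children, productsByArea, salesByProduct));
        }
        return totals;
    }

    public static void main(String[] args) {
        // Made-up sample data: area A has children A1 and A2, and A1 has child A1a.
        Map<String, List<String>> children = new HashMap<>();
        children.put("A", Arrays.asList("A1", "A2"));
        children.put("A1", Arrays.asList("A1a"));
        Map<String, List<String>> productsByArea = new HashMap<>();
        productsByArea.put("A1", Arrays.asList("p1"));
        productsByArea.put("A1a", Arrays.asList("p2"));
        Map<String, List<Double>> salesByProduct = new HashMap<>();
        salesByProduct.put("p1", Arrays.asList(10.5, 4.5));
        salesByProduct.put("p2", Arrays.asList(1.25));

        System.out.println(totalSalesByArea(Arrays.asList("A"), 4,
                children, productsByArea, salesByProduct));
    }
}
```

The shape of the work is the same as in the traversal: for each start area, visit every descendant down to depth 4, then each product, then each sale. So the cost is driven by how many product and sale hops sit under each area, not by the 34 calls themselves.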