Research Area 1 -> Research Area B -> Research Area X -> Product A -> Sales 123 Research Area 1 -> Research Area B -> Product B -> Sales 456 Research Area 1 -> Research Area C -> Product Z -> Sales 789
And so on. There are 4 levels of sub research areas.
What I want to do is get the top level research areas and the 2nd from top research areas and show the sales of products which come underneath them.
So the final result would be something like this:
Research Area 1 -- Research Area B 579 -- Research Area C 789
I've indexed the top level research areas to speed up that lookup but I'm struggling to get the rest of the query to run quickly but it seems like it should be possible so I must be doing something wrong. This is what I have:
private Map<String, BigDecimal> calculateTotalSales(List<Node> subResearchAreas) { final Map<String, BigDecimal> totalSales = new HashMap<String, BigDecimal>(); org.neo4j.graphdb.traversal.Traverser traverse = Traversal.description() .depthFirst()
This traversal gets run 34 times since there are 34 top level research areas and it gets called inside a loop. Overall this bit of the code takes about 8.5 seconds to run so that's around 0.3 seconds per traversal.
If there's a better way to solve this problem and I'm going about it totally the wrong way please let me know as well! I've tried changing the code to not use the traversal API and just do the relationship lookups directly on the nodes inside for loops but that actually takes longer than using the traversal API. I also tried pushing the 'primary_research_area' and 'sold' relationships into the 'relationships' method call on the API but that made it slower as well.
Mark,
how big is the dataset, how many pathes are you touching (just count
the invocations of the Evaluators)? Have the database somewhere? I am
vaguely thinking that from your code, "looking ahead" in your
evaluators might be the thing that is causing the slowdown. Would like
to test it with some data.
On Fri, Oct 5, 2012 at 11:20 PM, Mark Needham <m.h.need...@gmail.com> wrote:
> Hey,
> I have a graph which is modelled like this:
> Research Area 1 -> Research Area B -> Research Area X -> Product A -> Sales
> 123
> Research Area 1 -> Research Area B -> Product B -> Sales 456
> Research Area 1 -> Research Area C -> Product Z -> Sales 789
> And so on. There are 4 levels of sub research areas.
> What I want to do is get the top level research areas and the 2nd from top
> research areas and show the sales of products which come underneath them.
> So the final result would be something like this:
> Research Area 1
> -- Research Area B 579
> -- Research Area C 789
> I've indexed the top level research areas to speed up that lookup but I'm
> struggling to get the rest of the query to run quickly but it seems like it
> should be possible so I must be doing something wrong. This is what I have:
> This traversal gets run 34 times since there are 34 top level research areas
> and it gets called inside a loop. Overall this bit of the code takes about
> 8.5 seconds to run so that's around 0.3 seconds per traversal.
> If there's a better way to solve this problem and I'm going about it totally
> the wrong way please let me know as well! I've tried changing the code to
> not use the traversal API and just do the relationship lookups directly on
> the nodes inside for loops but that actually takes longer than using the
> traversal API. I also tried pushing the 'primary_research_area' and 'sold'
> relationships into the 'relationships' method call on the API but that made
> it slower as well.
I put a count into that custom evaluator that I've got and then printed out its value just before the return line of the method and this is what it printed out:
I'm sure I also tried passing in all the subResearchAreas in one go instead of doing 34 calls to the traversal API but that didn't have any impact on the running time.
On Saturday, 6 October 2012 04:27:16 UTC+1, Peter Neubauer wrote:
> Mark, > how big is the dataset, how many pathes are you touching (just count > the invocations of the Evaluators)? Have the database somewhere? I am > vaguely thinking that from your code, "looking ahead" in your > evaluators might be the thing that is causing the slowdown. Would like > to test it with some data.
> On Fri, Oct 5, 2012 at 11:20 PM, Mark Needham <m.h.n...@gmail.com<javascript:>> > wrote: > > Hey,
> > I have a graph which is modelled like this:
> > Research Area 1 -> Research Area B -> Research Area X -> Product A -> > Sales > > 123 > > Research Area 1 -> Research Area B -> Product B -> Sales 456 > > Research Area 1 -> Research Area C -> Product Z -> Sales 789
> > And so on. There are 4 levels of sub research areas.
> > What I want to do is get the top level research areas and the 2nd from > top > > research areas and show the sales of products which come underneath > them.
> > So the final result would be something like this:
> > Research Area 1 > > -- Research Area B 579 > > -- Research Area C 789
> > I've indexed the top level research areas to speed up that lookup but > I'm > > struggling to get the rest of the query to run quickly but it seems like > it > > should be possible so I must be doing something wrong. This is what I > have:
> > This traversal gets run 34 times since there are 34 top level research > areas > > and it gets called inside a loop. Overall this bit of the code takes > about > > 8.5 seconds to run so that's around 0.3 seconds per traversal.
> > If there's a better way to solve this problem and I'm going about it > totally > > the wrong way please let me know as well! I've tried changing the code > to > > not use the traversal API and just do the relationship lookups directly > on > > the nodes inside for loops but that actually takes longer than using the > > traversal API. I also tried pushing the 'primary_research_area' and > 'sold' > > relationships into the 'relationships' method call on the API but that > made > > it slower as well.
> Research Area 1 -> Research Area B -> Research Area X -> Product A -> Sales 123
> Research Area 1 -> Research Area B -> Product B -> Sales 456
> Research Area 1 -> Research Area C -> Product Z -> Sales 789
> And so on. There are 4 levels of sub research areas.
> What I want to do is get the top level research areas and the 2nd from top research areas and show the sales of products which come underneath them.
> So the final result would be something like this:
> Research Area 1
> -- Research Area B 579
> -- Research Area C 789
> I've indexed the top level research areas to speed up that lookup but I'm struggling to get the rest of the query to run quickly but it seems like it should be possible so I must be doing something wrong. This is what I have:
> This traversal gets run 34 times since there are 34 top level research areas and it gets called inside a loop. Overall this bit of the code takes about 8.5 seconds to run so that's around 0.3 seconds per traversal.
> If there's a better way to solve this problem and I'm going about it totally the wrong way please let me know as well! I've tried changing the code to not use the traversal API and just do the relationship lookups directly on the nodes inside for loops but that actually takes longer than using the traversal API. I also tried pushing the 'primary_research_area' and 'sold' relationships into the 'relationships' method call on the API but that made it slower as well.
When you say profile it what do you mean? I have jvisualvm open but all I was really looking for there was whether I had the heap space large enough and the JVM wasn't running GC because it couldn't keep the whole thing in memory. I think I've now got the heap size large enough that it's fine. Is there something else/some other tool you'd suggest?
Do I achieve the same outcome as 'move rel types into an enum' by putting the DynamicRelationshipType.withName calls as fields? I tried that and didn't see any difference.
I tried "Use exclude and continue as your not interested in the nodes anyway?" and didn't see any difference.
I didn't quite understand what you meant by "Imho your traversal tries to aggregate on every level not just on the last? Check e.g for path length"
Are you saying that it runs the code in my custom evaluator even when the last node on the path doesn't have a product hanging of it? Or something else?
> "Perhaps just return the paths (research area to product to sale) from
the traversal and do the aggregation on the returned paths?"
I haven't tried this yet but will try now.
> "What about introducing a top level node and just running one big traversal? > Instead of 34?"
On Saturday, 6 October 2012 07:44:37 UTC+1, Michael Hunger wrote:
> What about introducing a top level node and just running one big traversal? > Instead of 34?
> Did you try to profile it? Is it on cold caches?
> Some ideas
> Move the rel types into an enum > Use exclude and continue as your not interested in the nodes anyway?
> Imho your traversal tries to aggregate on every level not just on the > last? Check e.g for path length
> Perhaps just return the paths (research area to product to sale) from the > traversal and do the aggregation on the returned paths?
> HTH > Michael
> Sent from mobile device
> Am 05.10.2012 um 23:20 schrieb Mark Needham <m.h.n...@gmail.com<javascript:> > >:
> Hey,
> I have a graph which is modelled like this:
> Research Area 1 -> Research Area B -> Research Area X -> Product A -> > Sales 123 > Research Area 1 -> Research Area B -> Product B -> Sales 456 > Research Area 1 -> Research Area C -> Product Z -> Sales 789
> And so on. There are 4 levels of sub research areas.
> What I want to do is get the top level research areas and the 2nd from top > research areas and show the sales of products which come underneath them.
> So the final result would be something like this:
> Research Area 1 > -- Research Area B 579 > -- Research Area C 789
> I've indexed the top level research areas to speed up that lookup but I'm > struggling to get the rest of the query to run quickly but it seems like it > should be possible so I must be doing something wrong. This is what I have:
> This traversal gets run 34 times since there are 34 top level research > areas and it gets called inside a loop. Overall this bit of the code takes > about 8.5 seconds to run so that's around 0.3 seconds per traversal.
> If there's a better way to solve this problem and I'm going about it > totally the wrong way please let me know as well! I've tried changing the > code to not use the traversal API and just do the relationship lookups > directly on the nodes inside for loops but that actually takes longer than > using the traversal API. I also tried pushing the 'primary_research_area' > and 'sold' relationships into the 'relationships' method call on the API > but that made it slower as well.
> When you say profile it what do you mean? I have jvisualvm open but all I was really looking for there was whether I had the heap space large enough and the JVM wasn't running GC because it couldn't keep the whole thing in memory. I think I've now got the heap size large enough that it's fine. Is there something else/some other tool you'd suggest?
> Do I achieve the same outcome as 'move rel types into an enum' by putting the DynamicRelationshipType.withName calls as fields? I tried that and didn't see any difference.
> I tried "Use exclude and continue as your not interested in the nodes anyway?" and didn't see any difference.
> I didn't quite understand what you meant by "Imho your traversal tries to aggregate on every level not just on the last? Check e.g for path length"
> Are you saying that it runs the code in my custom evaluator even when the last node on the path doesn't have a product hanging of it? Or something else?
> > "Perhaps just return the paths (research area to product to sale) from the traversal and do the aggregation on the returned paths?"
> I haven't tried this yet but will try now.
> > "What about introducing a top level node and just running one big traversal?
> > Instead of 34?"
> Ditto.
> On Saturday, 6 October 2012 07:44:37 UTC+1, Michael Hunger wrote:
>> What about introducing a top level node and just running one big traversal?
>> Instead of 34?
>> Did you try to profile it? Is it on cold caches?
>> Some ideas
>> Move the rel types into an enum
>> Use exclude and continue as your not interested in the nodes anyway?
>> Imho your traversal tries to aggregate on every level not just on the last? Check e.g for path length
>> Perhaps just return the paths (research area to product to sale) from the traversal and do the aggregation on the returned paths?
>> HTH
>> Michael
>> Sent from mobile device
>> Am 05.10.2012 um 23:20 schrieb Mark Needham <m.h.n...@gmail.com>:
>>> Hey,
>>> I have a graph which is modelled like this:
>>> Research Area 1 -> Research Area B -> Research Area X -> Product A -> Sales 123
>>> Research Area 1 -> Research Area B -> Product B -> Sales 456
>>> Research Area 1 -> Research Area C -> Product Z -> Sales 789
>>> And so on. There are 4 levels of sub research areas.
>>> What I want to do is get the top level research areas and the 2nd from top research areas and show the sales of products which come underneath them.
>>> So the final result would be something like this:
>>> Research Area 1
>>> -- Research Area B 579
>>> -- Research Area C 789
>>> I've indexed the top level research areas to speed up that lookup but I'm struggling to get the rest of the query to run quickly but it seems like it should be possible so I must be doing something wrong. This is what I have:
>>> This traversal gets run 34 times since there are 34 top level research areas and it gets called inside a loop. Overall this bit of the code takes about 8.5 seconds to run so that's around 0.3 seconds per traversal.
>>> If there's a better way to solve this problem and I'm going about it totally the wrong way please let me know as well! I've tried changing the code to not use the traversal API and just do the relationship lookups directly on the nodes inside for loops but that actually takes longer than using the traversal API. I also tried pushing the 'primary_research_area' and 'sold' relationships into the 'relationships' method call on the API but that made it slower as well.
BigDecimal monthlySales = new BigDecimal((Double) sales.getProperty("sales")); String name = (String) path.startNode().getProperty("display_name"); if (totalSales.containsKey(name)) { totalSales.put(name, totalSales.get(name).add(monthlySales)); } else { totalSales.put(name, monthlySales); } }
return totalSales; }
Is that what you had in mind?
It takes 23 seconds to execute that vs about 10 seconds for the original version. I changed the depth of the search to 6 because it now needs to also traverse the 'primary_research_area' and 'sold' edges.
> The evaluator is called for every node in the traversal (imho)
> But you want to run the code only for the leaf nodes
> Sent from mobile device
> Am 06.10.2012 um 09:32 schrieb Mark Needham <m.h.n...@gmail.com<javascript:> > >:
> When you say profile it what do you mean? I have jvisualvm open but all I > was really looking for there was whether I had the heap space large enough > and the JVM wasn't running GC because it couldn't keep the whole thing in > memory. I think I've now got the heap size large enough that it's fine. Is > there something else/some other tool you'd suggest?
> Do I achieve the same outcome as 'move rel types into an enum' by putting > the DynamicRelationshipType.withName calls as fields? I tried that and > didn't see any difference.
> I tried "Use exclude and continue as your not interested in the nodes > anyway?" and didn't see any difference.
> I didn't quite understand what you meant by "Imho your traversal tries to > aggregate on every level not just on the last? Check e.g for path length"
> Are you saying that it runs the code in my custom evaluator even when the > last node on the path doesn't have a product hanging of it? Or something > else?
> > "Perhaps just return the paths (research area to product to sale) from > the traversal and do the aggregation on the returned paths?"
> I haven't tried this yet but will try now.
> > "What about introducing a top level node and just running one big > traversal? > > Instead of 34?"
> Ditto.
> On Saturday, 6 October 2012 07:44:37 UTC+1, Michael Hunger wrote:
>> What about introducing a top level node and just running one big >> traversal? >> Instead of 34?
>> Did you try to profile it? Is it on cold caches?
>> Some ideas
>> Move the rel types into an enum >> Use exclude and continue as your not interested in the nodes anyway?
>> Imho your traversal tries to aggregate on every level not just on the >> last? Check e.g for path length
>> Perhaps just return the paths (research area to product to sale) from the >> traversal and do the aggregation on the returned paths?
>> HTH >> Michael
>> Sent from mobile device
>> Am 05.10.2012 um 23:20 schrieb Mark Needham <m.h.n...@gmail.com>:
>> Hey,
>> I have a graph which is modelled like this:
>> Research Area 1 -> Research Area B -> Research Area X -> Product A -> >> Sales 123 >> Research Area 1 -> Research Area B -> Product B -> Sales 456 >> Research Area 1 -> Research Area C -> Product Z -> Sales 789
>> And so on. There are 4 levels of sub research areas.
>> What I want to do is get the top level research areas and the 2nd from >> top research areas and show the sales of products which come underneath >> them.
>> So the final result would be something like this:
>> Research Area 1 >> -- Research Area B 579 >> -- Research Area C 789
>> I've indexed the top level research areas to speed up that lookup but I'm >> struggling to get the rest of the query to run quickly but it seems like it >> should be possible so I must be doing something wrong. This is what I have:
>> This traversal gets run 34 times since there are 34 top level research >> areas and it gets called inside a loop. Overall this bit of the code takes >> about 8.5 seconds to run so that's around 0.3 seconds per traversal.
>> If there's a better way to solve this problem and I'm going about it >> totally the wrong way please let me know as well! I've tried changing the >> code to not use the traversal API and just do the relationship lookups >> directly on the nodes inside for loops but that actually takes longer than >> using the traversal API. I also tried pushing the 'primary_research_area' >> and 'sold' relationships into the 'relationships' method call on the API >> but that made it slower as well.
> It takes 23 seconds to execute that vs about 10 seconds for the original version. I changed the depth of the search to 6 because it now needs to also traverse the 'primary_research_area' and 'sold' edges.
> On Saturday, 6 October 2012 08:42:46 UTC+1, Michael Hunger wrote:
>> The evaluator is called for every node in the traversal (imho)
>> But you want to run the code only for the leaf nodes
>> Sent from mobile device
>> Am 06.10.2012 um 09:32 schrieb Mark Needham <m.h.n...@gmail.com>:
>>> When you say profile it what do you mean? I have jvisualvm open but all I was really looking for there was whether I had the heap space large enough and the JVM wasn't running GC because it couldn't keep the whole thing in memory. I think I've now got the heap size large enough that it's fine. Is there something else/some other tool you'd suggest?
>>> Do I achieve the same outcome as 'move rel types into an enum' by putting the DynamicRelationshipType.withName calls as fields? I tried that and didn't see any difference.
>>> I tried "Use exclude and continue as your not interested in the nodes anyway?" and didn't see any difference.
>>> I didn't quite understand what you meant by "Imho your traversal tries to aggregate on every level not just on the last? Check e.g for path length"
>>> Are you saying that it runs the code in my custom evaluator even when the last node on the path doesn't have a product hanging of it? Or something else?
>>> > "Perhaps just return the paths (research area to product to sale) from the traversal and do the aggregation on the returned paths?"
>>> I haven't tried this yet but will try now.
>>> > "What about introducing a top level node and just running one big traversal?
>>> > Instead of 34?"
>>> Ditto.
>>> On Saturday, 6 October 2012 07:44:37 UTC+1, Michael Hunger wrote:
>>>> What about introducing a top level node and just running one big traversal?
>>>> Instead of 34?
>>>> Did you try to profile it? Is it on cold caches?
>>>> Some ideas
>>>> Move the rel types into an enum
>>>> Use exclude and continue as your not interested in the nodes anyway?
>>>> Imho your traversal tries to aggregate on every level not just on the last? Check e.g for path length
>>>> Perhaps just return the paths (research area to product to sale) from the traversal and do the aggregation on the returned paths?
>>>> HTH
>>>> Michael
>>>> Sent from mobile device
>>>> Am 05.10.2012 um 23:20 schrieb Mark Needham <m.h.n...@gmail.com>:
>>>>> Hey,
>>>>> I have a graph which is modelled like this:
>>>>> Research Area 1 -> Research Area B -> Research Area X -> Product A -> Sales 123
>>>>> Research Area 1 -> Research Area B -> Product B -> Sales 456
>>>>> Research Area 1 -> Research Area C -> Product Z -> Sales 789
>>>>> And so on. There are 4 levels of sub research areas.
>>>>> What I want to do is get the top level research areas and the 2nd from top research areas and show the sales of products which come underneath them.
>>>>> So the final result would be something like this:
>>>>> Research Area 1
>>>>> -- Research Area B 579
>>>>> -- Research Area C 789
>>>>> I've indexed the top level research areas to speed up that lookup but I'm struggling to get the rest of the query to run quickly but it seems like it should be possible so I must be doing something wrong. This is what I have:
>>>>> This traversal gets run 34 times since there are 34 top level research areas and it gets called inside a loop. Overall this bit of the code takes about 8.5 seconds to run so that's around 0.3 seconds per traversal.
>>>>> If there's a better way to solve this problem and I'm going about it totally the wrong way please let me know as well! I've tried changing the code to not use the traversal API and just do the relationship lookups directly on the nodes inside for loops but that actually takes longer than using the traversal API. I also tried pushing the 'primary_research_area' and 'sold' relationships into the 'relationships' method call on the API but that made it slower as well.
There are 11,000 calls to org.neo4j.kernel.impl.traversal.TraverserIterator.fetchNextOrNull() and then 9,000 calls to org.neo4j.kernel.impl.traversal.TraversalBranchImpl.expandRelationships() - I'm not sure exactly what I should be looking for to try and work out what's causing it to run so slowly though?
> It takes 23 seconds to execute that vs about 10 seconds for the original > version. I changed the depth of the search to 6 because it now needs to > also traverse the 'primary_research_area' and 'sold' edges.
> On Saturday, 6 October 2012 08:42:46 UTC+1, Michael Hunger wrote:
>> The evaluator is called for every node in the traversal (imho)
>> But you want to run the code only for the leaf nodes
>> Sent from mobile device
>> Am 06.10.2012 um 09:32 schrieb Mark Needham <m.h.n...@gmail.com>:
>> When you say profile it what do you mean? I have jvisualvm open but all I >> was really looking for there was whether I had the heap space large enough >> and the JVM wasn't running GC because it couldn't keep the whole thing in >> memory. I think I've now got the heap size large enough that it's fine. Is >> there something else/some other tool you'd suggest?
>> Do I achieve the same outcome as 'move rel types into an enum' by >> putting the DynamicRelationshipType.withName calls as fields? I tried that >> and didn't see any difference.
>> I tried "Use exclude and continue as your not interested in the nodes >> anyway?" and didn't see any difference.
>> I didn't quite understand what you meant by "Imho your traversal tries to >> aggregate on every level not just on the last? Check e.g for path length"
>> Are you saying that it runs the code in my custom evaluator even when the >> last node on the path doesn't have a product hanging of it? Or something >> else?
>> > "Perhaps just return the paths (research area to product to sale) from >> the traversal and do the aggregation on the returned paths?"
>> I haven't tried this yet but will try now.
>> > "What about introducing a top level node and just running one big >> traversal? >> > Instead of 34?"
>> Ditto.
>> On Saturday, 6 October 2012 07:44:37 UTC+1, Michael Hunger wrote:
>>> What about introducing a top level node and just running one big >>> traversal? >>> Instead of 34?
>>> Did you try to profile it? Is it on cold caches?
>>> Some ideas
>>> Move the rel types into an enum >>> Use exclude and continue as your not interested in the nodes anyway?
>>> Imho your traversal tries to aggregate on every level not just on the >>> last? Check e.g for path length
>>> Perhaps just return the paths (research area to product to sale) from >>> the traversal and do the aggregation on the returned paths?
>>> HTH >>> Michael
>>> Sent from mobile device
>>> Am 05.10.2012 um 23:20 schrieb Mark Needham <m.h.n...@gmail.com>:
>>> Hey,
>>> I have a graph which is modelled like this:
>>> Research Area 1 -> Research Area B -> Research Area X -> Product A -> >>> Sales 123 >>> Research Area 1 -> Research Area B -> Product B -> Sales 456 >>> Research Area 1 -> Research Area C -> Product Z -> Sales 789
>>> And so on. There are 4 levels of sub research areas.
>>> What I want to do is get the top level research areas and the 2nd from >>> top research areas and show the sales of products which come underneath >>> them.
>>> So the final result would be something like this:
>>> Research Area 1 >>> -- Research Area B 579 >>> -- Research Area C 789
>>> I've indexed the top level research areas to speed up that lookup but >>> I'm struggling to get the rest of the query to run quickly but it seems >>> like it should be possible so I must be doing something wrong. This is what >>> I have:
>>> This traversal gets run 34 times since there are 34 top level research >>> areas and it gets called inside a loop. Overall this bit of the code takes >>> about 8.5 seconds to run so that's around 0.3 seconds per traversal.
>>> If there's a better way to solve this problem and I'm going about it >>> totally the wrong way please let me know as well! I've tried changing the >>> code to not use the traversal API and just do the relationship lookups >>> directly on the nodes inside for loops but that actually takes longer than >>> using the traversal API. I also tried pushing the 'primary_research_area' >>> and 'sold' relationships into the 'relationships' method call on the API >>> but that made it slower as well.
On Saturday, 6 October 2012 13:33:18 UTC+1, Mark Needham wrote:
> I tried profiling what was going on with yourkit:
> There are 11,000 calls to > org.neo4j.kernel.impl.traversal.TraverserIterator.fetchNextOrNull() and > then 9,000 calls to > org.neo4j.kernel.impl.traversal.TraversalBranchImpl.expandRelationships() - > I'm not sure exactly what I should be looking for to try and work out > what's causing it to run so slowly though?
> On Saturday, 6 October 2012 10:02:38 UTC+1, Michael Hunger wrote:
>> Interesting. >> Do you have a dataset/generator?
# try to use cache_type strong db = new GraphDatabaseFactory().newEmbeddedDatabaseBuilder(path).setConfig(GraphData baseSettings.cache_type, GraphDatabaseSettings.CacheTypeSetting.strong).newGraphDatabase();
# the bigdecimal addition takes quite a bit of time (perhaps it is possible to store the sales as whole number (money * 100)
The traversal framework seems to add a bit of overhead, esp with multiple given rel-types (I tried a custom-rel-expander but that didn't perform much better (esp. b/c of your variable length path in between)
I tried to connect the top-level research areas to a root node and it worked out quite well, so if you change that part it might help too.
Am 06.10.2012 um 14:42 schrieb Mark Needham <m.h.need...@gmail.com>:
> Attaching a snapshot of the yourkit trace - that might be more useful than my summary.
> On Saturday, 6 October 2012 13:33:18 UTC+1, Mark Needham wrote:
> I tried profiling what was going on with yourkit:
> There are 11,000 calls to org.neo4j.kernel.impl.traversal.TraverserIterator.fetchNextOrNull() and then 9,000 calls to org.neo4j.kernel.impl.traversal.TraversalBranchImpl.expandRelationships() - I'm not sure exactly what I should be looking for to try and work out what's causing it to run so slowly though?
> On Saturday, 6 October 2012 10:02:38 UTC+1, Michael Hunger wrote:
> Interesting.
> Do you have a dataset/generator?
I like your second approach better. A couple of things:
1) You could probably have just one evaluator:
Evaluators.lastRelationshipIs( INCLUDE_AND_PRUNE, EXCLUDE_AND_CONTINUE,
sold );
this way you tell the traversal to not even try to expand the product
nodes. Will probably spare the traversal lots of getRelationships calls.
2) For every returned path, just key on the Node and when you got it all
"convert" the key into the name by doing the getProperty on them.
3) Can a product be sold by many research areas? In that case you'd
probably need uniqueness level NODE_PATH
4) Counts could instead be kept as a property per research levels, just the
number of directly related products so that the traversal can just traverse
through the research areas and accumulate that count.
2012/10/7 Michael Hunger <michael.hun...@neotechnology.com>
> # try to use cache_type strong
> db = new
> GraphDatabaseFactory().newEmbeddedDatabaseBuilder(path).setConfig(GraphData baseSettings.cache_type,
> GraphDatabaseSettings.CacheTypeSetting.strong).newGraphDatabase();
> # the bigdecimal addition takes quite a bit of time (perhaps it is
> possible to store the sales as whole number (money * 100)
> The traversal framework seems to add a bit of overhead, esp with multiple
> given rel-types (I tried a custom-rel-expander but that didn't perform much
> better (esp. b/c of your variable length path in between)
> I tried to connect the top-level research areas to a root node and it
> worked out quite well, so if you change that part it might help too.
> Am 06.10.2012 um 14:42 schrieb Mark Needham <m.h.need...@gmail.com>:
> Attaching a snapshot of the yourkit trace - that might be more useful than
> my summary.
> On Saturday, 6 October 2012 13:33:18 UTC+1, Mark Needham wrote:
>> I tried profiling what was going on with yourkit:
>> There are 11,000 calls to org.neo4j.kernel.impl.**
>> traversal.TraverserIterator.**fetchNextOrNull() and then 9,000 calls to
>> org.neo4j.kernel.impl.**traversal.TraversalBranchImpl.**expandRelationships ()
>> - I'm not sure exactly what I should be looking for to try and work out
>> what's causing it to run so slowly though?
>> On Saturday, 6 October 2012 10:02:38 UTC+1, Michael Hunger wrote:
>>> Interesting.
>>> Do you have a dataset/generator?