Account Options

  1. Sign in
The old Google Groups will be going away soon, but your browser is incompatible with the new version.
Google Groups Home
« Groups Home
Speeding up the performance of a query on the traversal API
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  12 messages - Collapse all  -  Translate all to Translated (View all originals)
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
Mark Needham  
View profile  
 More options Oct 5 2012, 5:20 pm
From: Mark Needham <m.h.need...@gmail.com>
Date: Fri, 5 Oct 2012 14:20:42 -0700 (PDT)
Local: Fri, Oct 5 2012 5:20 pm
Subject: Speeding up the performance of a query on the traversal API

Hey,

I have a graph which is modelled like this:

Research Area 1 -> Research Area B -> Research Area X -> Product A -> Sales
123
Research Area 1 -> Research Area B -> Product B -> Sales 456
Research Area 1 -> Research Area C -> Product Z -> Sales 789

And so on. There are 4 levels of sub research areas.

What I want to do is get the top level research areas and the 2nd from top
research areas and show the sales of products which come underneath them.

So the final result would be something like this:

Research Area 1
-- Research Area B 579
-- Research Area C 789

I've indexed the top level research areas to speed up that lookup but I'm
struggling to get the rest of the query to run quickly but it seems like it
should be possible so I must be doing something wrong. This is what I have:

    private Map<String, BigDecimal> calculateTotalSales(List<Node>
subResearchAreas) {
        final Map<String, BigDecimal> totalSales = new HashMap<String,
BigDecimal>();
        org.neo4j.graphdb.traversal.Traverser traverse =
Traversal.description()
                .depthFirst()

.relationships(DynamicRelationshipType.withName("has_child"),
Direction.OUTGOING)
                .evaluator(Evaluators.toDepth(4))
                .evaluator(new Evaluator() {
                    public Evaluation evaluate(Path path) {
                        if(path.length() == 0){
                            return Evaluation.EXCLUDE_AND_CONTINUE;
                        }

                        Node subResearchArea = path.endNode();
                        Iterable<Relationship> researchAreaToProducts =
subResearchArea.getRelationships(DynamicRelationshipType.withName("primary_ research_area"));
                        for (Relationship researchAreaToProduct :
researchAreaToProducts) {
                            Node product =
researchAreaToProduct.getEndNode();
                            Iterable<Relationship> productsToSales =
product.getRelationships(DynamicRelationshipType.withName("sold"));
                            for (Relationship productToSale :
productsToSales) {
                                Node sales = productToSale.getEndNode();
                                BigDecimal monthlySales = new
BigDecimal((Double) sales.getProperty("sales"));
                                String name = (String)
path.startNode().getProperty("display_name");
                                if (totalSales.containsKey(name)) {
                                    totalSales.put(name,
totalSales.get(name).add(monthlySales));
                                } else {
                                    totalSales.put(name, monthlySales);
                                }
                            }
                        }

                        return Evaluation.INCLUDE_AND_CONTINUE;
                    }
                })
                .traverse(subResearchAreas.toArray(new Node[]{}));
        for (Path forceEvaluation : traverse) { }
        return totalSales;
    }

This traversal gets run 34 times since there are 34 top level research
areas and it gets called inside a loop. Overall this bit of the code takes
about 8.5 seconds to run so that's around 0.3 seconds per traversal.

If there's a better way to solve this problem and I'm going about it
totally the wrong way please let me know as well! I've tried changing the
code to not use the traversal API and just do the relationship lookups
directly on the nodes inside for loops but that actually takes longer than
using the traversal API. I also tried pushing the 'primary_research_area'
and 'sold' relationships into the 'relationships' method call on the API
but that made it slower as well.

Thanks in advance,
Mark


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Peter Neubauer  
View profile  
 More options Oct 5 2012, 11:27 pm
From: Peter Neubauer <peter.neuba...@neotechnology.com>
Date: Sat, 6 Oct 2012 05:26:52 +0200
Local: Fri, Oct 5 2012 11:26 pm
Subject: Re: [Neo4j] Speeding up the performance of a query on the traversal API
Mark,
how big is the dataset, how many pathes are you touching (just count
the invocations of the Evaluators)? Have the database somewhere? I am
vaguely thinking that from your code, "looking ahead" in your
evaluators might be the thing that is causing the slowdown. Would like
to test it with some data.

Cheers,

/peter neubauer

G:  neubauer.peter
S:  peter.neubauer
P:  +46 704 106975
L:   http://www.linkedin.com/in/neubauer
T:   @peterneubauer

Neo4j 1.8 GA - http://www.dzone.com/links/neo4j_18_release_fluent_graph_literacy.html


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Mark Needham  
View profile  
 More options Oct 6 2012, 1:39 am
From: Mark Needham <m.h.need...@gmail.com>
Date: Fri, 5 Oct 2012 22:39:19 -0700 (PDT)
Local: Sat, Oct 6 2012 1:39 am
Subject: Re: [Neo4j] Speeding up the performance of a query on the traversal API

Peter,

I put a count into that custom evaluator that I've got and then printed out
its value just before the return line of the method and this is what it
printed out:

invocations = 113
invocations = 166
invocations = 75
invocations = 270
invocations = 81
invocations = 142
invocations = 146
invocations = 19
invocations = 2
invocations = 2458
invocations = 188
invocations = 681
invocations = 164
invocations = 4
invocations = 3
invocations = 97
invocations = 13
invocations = 4
invocations = 1
invocations = 1
invocations = 34
invocations = 2839
invocations = 1
invocations = 52
invocations = 2
invocations = 1
invocations = 1
invocations = 1
invocations = 1
invocations = 1
invocations = 1
invocations = 1
invocations = 1
invocations = 1

I did try putting the 'primary_research_area' and 'sold' traversals as
separate lines on the call to the traversal API i.e.

org.neo4j.graphdb.traversal.Traverser traverse = Traversal.description()
                .depthFirst()

.relationships(DynamicRelationshipType.withName("has_child"),
Direction.OUTGOING)

.relationships(DynamicRelationshipType.withName("primary_research_area"),
Direction.OUTGOING)
                .relationships(DynamicRelationshipType.withName("sold"),
Direction.OUTGOING)
....

But that actually seemed to make things slower.

I'm sure I also tried passing in all the subResearchAreas in one go instead
of doing 34 calls to the traversal API but that didn't have any impact on
the running time.

Mark


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Michael Hunger  
View profile  
 More options Oct 6 2012, 2:44 am
From: Michael Hunger <michael.hun...@neopersistence.com>
Date: Sat, 6 Oct 2012 08:44:22 +0200
Local: Sat, Oct 6 2012 2:44 am
Subject: Re: [Neo4j] Speeding up the performance of a query on the traversal API

What about introducing a top level node and just running one big traversal?
Instead of 34?

Did you try to profile it? Is it on cold caches?

Some ideas

Move the rel types into an enum
Use exclude and continue as your not interested in the nodes anyway?

Imho your traversal tries to aggregate on every level not just on the last? Check e.g for path length

Perhaps just return the paths (research area to product to sale) from the traversal and do the aggregation on the returned paths?

HTH
Michael

Sent from mobile device

Am 05.10.2012 um 23:20 schrieb Mark Needham <m.h.need...@gmail.com>:


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Mark Needham  
View profile  
 More options Oct 6 2012, 3:32 am
From: Mark Needham <m.h.need...@gmail.com>
Date: Sat, 6 Oct 2012 00:32:19 -0700 (PDT)
Local: Sat, Oct 6 2012 3:32 am
Subject: Re: [Neo4j] Speeding up the performance of a query on the traversal API

When you say profile it what do you mean? I have jvisualvm open but all I
was really looking for there was whether I had the heap space large enough
and the JVM wasn't running GC because it couldn't keep the whole thing in
memory. I think I've now got the heap size large enough that it's fine. Is
there something else/some other tool you'd suggest?

Do I achieve the same outcome as  'move rel types into an enum' by putting
the DynamicRelationshipType.withName calls as fields? I tried that and
didn't see any difference.

I tried "Use exclude and continue as your not interested in the nodes
anyway?" and didn't see any difference.

I didn't quite understand what you meant by "Imho your traversal tries to
aggregate on every level not just on the last? Check e.g for path length"

Are you saying that it runs the code in my custom evaluator even when the
last node on the path doesn't have a product hanging of it? Or something
else?

> "Perhaps just return the paths (research area to product to sale) from

the traversal and do the aggregation on the returned paths?"

I haven't tried this yet but will try now.

> "What about introducing a top level node and just running one big
traversal?
> Instead of 34?"

Ditto.


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Michael Hunger  
View profile  
 More options Oct 6 2012, 3:42 am
From: Michael Hunger <michael.hun...@neopersistence.com>
Date: Sat, 6 Oct 2012 09:42:38 +0200
Local: Sat, Oct 6 2012 3:42 am
Subject: Re: [Neo4j] Speeding up the performance of a query on the traversal API

I mean method time profiling,
http://docs.oracle.com/javase/6/docs/technotes/guides/visualvm/profil...

The evaluator is called for every node in the traversal (imho)

But you want to run the code only for the leaf nodes

Sent from mobile device

Am 06.10.2012 um 09:32 schrieb Mark Needham <m.h.need...@gmail.com>:


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Mark Needham  
View profile  
 More options Oct 6 2012, 4:28 am
From: Mark Needham <m.h.need...@gmail.com>
Date: Sat, 6 Oct 2012 01:28:01 -0700 (PDT)
Local: Sat, Oct 6 2012 4:28 am
Subject: Re: [Neo4j] Speeding up the performance of a query on the traversal API

I tried this suggestion:

"Perhaps just return the paths (research area to product to sale) from the
traversal and do the aggregation on the returned paths?"

This is what code I have:

   DynamicRelationshipType has_child =
DynamicRelationshipType.withName("has_child");

    private Map<String, BigDecimal> calculateTotalSales(List<Node>
subResearchAreas) {
        final Map<String, BigDecimal> totalSales = new HashMap<String,
BigDecimal>();

        org.neo4j.graphdb.traversal.Traverser traverse =
Traversal.description()
                .depthFirst()
                .relationships(has_child, Direction.OUTGOING)
                .relationships(primary_research_area, Direction.OUTGOING)
                .relationships(sold, Direction.OUTGOING)

.evaluator(Evaluators.includeWhereLastRelationshipTypeIs(sold))
                .evaluator(Evaluators.toDepth(6))
                .traverse(subResearchAreas.toArray(new Node[]{}));
        for (Path path : traverse) {
            Node sales = path.endNode();

            BigDecimal monthlySales = new BigDecimal((Double)
sales.getProperty("sales"));
            String name = (String)
path.startNode().getProperty("display_name");
            if (totalSales.containsKey(name)) {
                totalSales.put(name,
totalSales.get(name).add(monthlySales));
            } else {
                totalSales.put(name, monthlySales);
            }
        }

        return totalSales;
    }

Is that what you had in mind?

It takes 23 seconds to execute that vs about 10 seconds for the original
version. I changed the depth of the search to 6 because it now needs to
also traverse the 'primary_research_area' and 'sold' edges.


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Michael Hunger  
View profile  
 More options Oct 6 2012, 5:02 am
From: Michael Hunger <michael.hun...@neopersistence.com>
Date: Sat, 6 Oct 2012 11:02:28 +0200
Local: Sat, Oct 6 2012 5:02 am
Subject: Re: [Neo4j] Speeding up the performance of a query on the traversal API

Interesting.
Do you have a dataset/generator?

Sent from mobile device

Am 06.10.2012 um 10:28 schrieb Mark Needham <m.h.need...@gmail.com>:


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Mark Needham  
View profile  
 More options Oct 6 2012, 8:33 am
From: Mark Needham <m.h.need...@gmail.com>
Date: Sat, 6 Oct 2012 05:33:18 -0700 (PDT)
Local: Sat, Oct 6 2012 8:33 am
Subject: Re: [Neo4j] Speeding up the performance of a query on the traversal API

I tried profiling what was going on with yourkit:

There are 11,000 calls to
org.neo4j.kernel.impl.traversal.TraverserIterator.fetchNextOrNull()  and
then 9,000 calls to
org.neo4j.kernel.impl.traversal.TraversalBranchImpl.expandRelationships() -
I'm not sure exactly what I should be looking for to try and work out
what's causing it to run so slowly though?


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Mark Needham  
View profile  
 More options Oct 6 2012, 8:42 am
From: Mark Needham <m.h.need...@gmail.com>
Date: Sat, 6 Oct 2012 05:42:46 -0700 (PDT)
Local: Sat, Oct 6 2012 8:42 am
Subject: Re: [Neo4j] Speeding up the performance of a query on the traversal API

Attaching a snapshot of the yourkit trace - that might be more useful than
my summary.

  neo-snapshot.html
97K Download

 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Michael Hunger  
View profile  
 More options Oct 6 2012, 9:06 pm
From: Michael Hunger <michael.hun...@neotechnology.com>
Date: Sun, 7 Oct 2012 03:09:27 +0200
Local: Sat, Oct 6 2012 9:09 pm
Subject: Re: [Neo4j] Speeding up the performance of a query on the traversal API

Two more things:

# try to use cache_type strong
db = new GraphDatabaseFactory().newEmbeddedDatabaseBuilder(path).setConfig(GraphData baseSettings.cache_type, GraphDatabaseSettings.CacheTypeSetting.strong).newGraphDatabase();
# the bigdecimal addition takes quite a bit of time (perhaps it is possible to store the sales as whole number (money * 100)

The traversal framework seems to add a bit of overhead, esp with multiple given rel-types (I tried a custom-rel-expander but that didn't perform much better (esp. b/c of your variable length path in between)

I tried to connect the top-level research areas to a root node and it worked out quite well, so if you change that part it might help too.

Am 06.10.2012 um 14:42 schrieb Mark Needham <m.h.need...@gmail.com>:


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Mattias Persson  
View profile  
 More options Oct 10 2012, 8:01 am
From: Mattias Persson <matt...@neotechnology.com>
Date: Wed, 10 Oct 2012 14:01:43 +0200
Local: Wed, Oct 10 2012 8:01 am
Subject: Re: [Neo4j] Speeding up the performance of a query on the traversal API

I like your second approach better. A couple of things:

1) You could probably have just one evaluator:

    Evaluators.lastRelationshipIs( INCLUDE_AND_PRUNE, EXCLUDE_AND_CONTINUE,
sold );

this way you tell the traversal to not even try to expand the product
nodes. Will probably spare the traversal lots of getRelationships calls.

2) For every returned path, just key on the Node and when you got it all
"convert" the key into the name by doing the getProperty on them.

3) Can a product be sold by many research areas? In that case you'd
probably need uniqueness level NODE_PATH

4) Counts could instead be kept as a property per research levels, just the
number of directly related products so that the traversal can just traverse
through the research areas and accumulate that count.

2012/10/7 Michael Hunger <michael.hun...@neotechnology.com>

--
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com

 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
End of messages
« Back to Discussions « Newer topic     Older topic »