Optimizing ShortestPath query for a better performances

27 views
Skip to first unread message

idor

unread,
Jun 25, 2016, 4:36:03 AM6/25/16
to Neo4j
I have in my graph ~1M nodes and ~1M relations.

My motivation for query is to retrieve all related nodes(by 3 hops) of specific source node + aggregate properties within the response.

I understood the shortestPath taking your sourceNode and query all other nodes(not necessarily related ones) in the graph to match the relevant results.
My latency is around 2-4 seconds when I have load (~4000 concurrent connections). it's too high. 

Any idea how could I optimize my query for better performances?


 MATCH p=shortestPath((user:User{userId:<someUser>})-[r:relation_type1|relation_type2|relation_3*1..3]-(user:User))
        WHERE f <> user 
        RETURN (f.userId) as userId,
        reduce(base = '', rel in r | base + ' ' + rel.dist) as dist 


* notes: userId and relation_type1/2/3 are auto indexed

* I am using the rest client and recently thought to upgrade into the new BOLT driver(not sure it will helps)

Thank you.


Michael Hunger

unread,
Jun 25, 2016, 3:46:37 PM6/25/16
to ne...@googlegroups.com
Hi,

for what you do shortest Path is not the right solution, as you don't wan the path between two nodes but the neighborhood of one node.

Also having that many connections when you don't have the CPUs to process it doesn't make sense, scale it out on a scluster.

Why do you string-concatinate the dist property?

I presume you have an index / constraint on :User(userId) ?

Try this instead:


MATCH (user:User{userId:<someUser>})
 MATCH p=(user)-[r:relation_type1|relation_type2|relation_3*1..3]-(friend:User)
        WHERE friend <> user 
        RETURN friend.userId as userId, reduce(base = '', rel in r | base + ' ' + rel.dist) as dist 

But you also know that this can potentially return a lot of data? e.g. of you have 100 friends on average this returns 100^3 aka 1M results !

Michael


--
You received this message because you are subscribed to the Google Groups "Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email to neo4j+un...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

idor

unread,
Jun 25, 2016, 4:23:05 PM6/25/16
to Neo4j
Hi Michael,
Thanks for replying. 

scale it out on a scluster.

It requiresNeo4j entreprise license isnt it? 

Why do you string-concatinate the dist property?

On each relation we have a numeric property that we want to aggregate later on. if A connected to D this way (A->B->C>D) we need the properties of each relation between two nodes within 3  hops (A->B,B->C,C->D).

The expected result would be:
B,C,D and the relation between. 


But you also know that this can potentially return a lot of data? e.g. of you have 100 friends on average this returns 100^3 aka 1M results !

The max "friends" of each node will be ~10. not more. 

for what you do shortest Path is not the right solution
 
How this query will overcome ciruclar iterations ? let's assume following relation ships.. 

A->B
B->C->D 
B->F->D
B->H->D
...
now if I apply it on node A. expected results is (B,C,D,F,H). but as u can see there are many paths to get into D. I might have lots of paths between one source to another(within 3 hops) till I iterate on all relevant nodes. 
Wouldnt you query fall for this with performances?

Thanks.

Michael Hunger

unread,
Jun 25, 2016, 5:46:43 PM6/25/16
to ne...@googlegroups.com
On Sat, Jun 25, 2016 at 10:23 PM, idor <idan...@gmail.com> wrote:
Hi Michael,
Thanks for replying. 

scale it out on a scluster.

It requiresNeo4j entreprise license isnt it? 

It will require neo4j enterprise. Which license depends on your use-case and company.
 

Why do you string-concatinate the dist property?

On each relation we have a numeric property that we want to aggregate later on. if A connected to D this way (A->B->C>D) we need the properties of each relation between two nodes within 3  hops (A->B,B->C,C->D).

The expected result would be:
B,C,D and the relation between. 

I don't understand what you're saying. 


But you also know that this can potentially return a lot of data? e.g. of you have 100 friends on average this returns 100^3 aka 1M results !

The max "friends" of each node will be ~10. not more. 

for what you do shortest Path is not the right solution
 
How this query will overcome ciruclar iterations ? let's assume following relation ships.. 

Each relationship will appear only once in a path. So no circles.

If you mean that there are multiple paths between your start node and it's 3rd degree neighbors, sure that can happen.

You, could either aggregate on the end node and select one of the paths.

In APOC procedures there is expandPath operation which allows you to define the uniqueness of nodes and relationships. There you could define node-global as uniqueness.

idor

unread,
Jun 26, 2016, 6:30:01 AM6/26/16
to Neo4j
Hi Michael,
I did experiments with your suggested query. with shortest path I got much lesser DB hits.


my query (with shortest path)

PROFILE MATCH p=shortestPath((user:User{userId:'userId123'})-[r*1..3]-(f:User))
        WHERE f
<> user
        RETURN
*

result:


and using your query (return let's of paths with the same nodes which was redundant)

PROFILE MATCH (user:User{userId:'userId123'})
 MATCH p
=(user)-[r*1..3]-(friend:User)

        WHERE
friend <> user
        RETURN
friend.userId as userId, reduce(base = '', rel in r | base + ' ' + rel.dist) as dist


result:

idor

unread,
Jun 28, 2016, 3:59:22 AM6/28/16
to Neo4j
Hi,
I was wondering if u had a chance to take a look on my results?

Thank you.
Reply all
Reply to author
Forward
0 new messages