Poor performance Cypher traversal
From: Sean Timm <sean.t...@teamaol.com>
Date: Tue, 24 Apr 2012 14:11:41 -0700 (PDT)
Subject: Poor performance Cypher traversal

neo4j-sh (0)$ CYPHER 1.7 START user=node:node(userId = "378531")
> MATCH user-[:FOLLOW]->friend-[v?:VIEW]->friend_viewed,
>       user-[r?:VIEW]->friend_viewed
> WHERE r IS NULL AND v.date > "2012-04-10"
> return v.date,friend_viewed

This query takes about 40 minutes! The equivalent query in MySQL (admittedly
on faster hardware) takes under 7 seconds. I expected Neo4j to be at worst
on par with MySQL for this query, and ideally faster.

The user referenced above has a large number of friends: 3000. Assuming
each of those friends had 1000 views (I think it is much less than that)
and a node traversal takes 1 µs, that is 3 million traversals, or about 3
seconds. I feel like I must be doing something horribly wrong.
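A quick sanity check of this estimate, as a hypothetical Python sketch: it assumes each of the 3000 friends is visited once and each of their (at most) 1000 VIEW relationships costs one hop. At 1 µs per hop the total is about 3 seconds; at 1 ms per hop it would be about 3000 seconds, roughly 50 minutes.

```python
def estimate_traversal_seconds(friends: int, views_per_friend: int,
                               seconds_per_hop: float) -> float:
    """Rough cost if every friend's VIEW relationships are each visited exactly once."""
    hops = friends * views_per_friend
    return hops * seconds_per_hop


# Figures from the post: 3000 friends, at most ~1000 views each.
at_1us = estimate_traversal_seconds(3000, 1000, 1e-6)
at_1ms = estimate_traversal_seconds(3000, 1000, 1e-3)
print(f"{at_1us:.1f} s at 1 µs per hop")
print(f"{at_1ms / 60:.0f} min at 1 ms per hop")
```

The per-hop cost is the only free parameter here; the thread does not measure it.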

According to JMX, the NodeCache hit rate is very good (close to 100%) and
the RelationshipCache hit rate is ~98%.

I have a 4 GB heap on an 8 GB, 4-core Linux machine, with 5 disks in RAID 0.

nodes: ~3MM
PropertyIds: ~13MM
Relationships: ~38MM
RelationshipTypes: 4

25M     neostore.nodestore.db
520M    neostore.propertystore.db
128     neostore.propertystore.db.arrays
1.1K    neostore.propertystore.db.index
1.1K    neostore.propertystore.db.index.keys
144M    neostore.propertystore.db.strings
1.2G    neostore.relationshipstore.db

# Default values for the low-level graph engine
neostore.nodestore.db.mapped_memory=26M
neostore.relationshipstore.db.mapped_memory=1300M
neostore.propertystore.db.mapped_memory=130M
neostore.propertystore.db.strings.mapped_memory=150M
neostore.propertystore.db.arrays.mapped_memory=0M
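A hypothetical Python helper for comparing each store file with its mapped_memory setting, using the numbers above. It assumes `du -h`-style binary units (M = MiB, G = GiB) and that the bare "128" for the arrays store is bytes. The general Neo4j 1.x tuning advice was to memory-map as much of each store file as available RAM allows; by that measure only the property store (130M mapped for a 520M file) is noticeably short. The helper only does the arithmetic; it says nothing about which stores this particular query reads.

```python
# Store-file sizes and mapped_memory settings from the post, in MiB.
# Assumption: "1.2G" is 1.2 GiB and the bare "128" for the arrays store is bytes.
GIB_IN_MIB = 1024.0

stores = {
    # store name: (file size MiB, mapped_memory MiB)
    "nodestore": (25.0, 26.0),
    "relationshipstore": (1.2 * GIB_IN_MIB, 1300.0),
    "propertystore": (520.0, 130.0),
    "propertystore.strings": (144.0, 150.0),
    "propertystore.arrays": (128 / (1024 * 1024), 0.0),
}


def coverage(file_mib: float, mapped_mib: float) -> float:
    """Fraction of a store file that fits in its mapped window, capped at 100%."""
    if file_mib <= 0:
        return 1.0
    return min(1.0, mapped_mib / file_mib)


for name, (size, mapped) in stores.items():
    print(f"{name:22s} file {size:7.1f} MiB  mapped {mapped:6.0f} MiB  "
          f"coverage {coverage(size, mapped):4.0%}")

total_mapped = sum(mapped for _, mapped in stores.values())
print(f"total mapped: {total_mapped:.0f} MiB (alongside a 4 GB heap on an 8 GB machine)")
```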


 
From: Michael Hunger <michael.hun...@neotechnology.com>
Date: Tue, 24 Apr 2012 23:24:06 +0200
Subject: Re: [Neo4j] Poor performance Cypher traversal
Sean,

thanks for reporting this; real-world use cases are very helpful for improving the product.

Please note that Cypher is still under heavy development, and little time has been spent so far on performance optimization.

It would be great if you could share your dataset (offline) with me to allow some analysis (or a generator that can reproduce it).

If you're returning v.date and friend_viewed, why is v optional in the first place?

You might try the following.

CYPHER 1.7 START user=node:node(userId = "378531")
MATCH user-[:FOLLOW]->friend-[v:VIEW]->friend_viewed
WHERE v.date > "2012-04-10"
AND not (user-[:VIEW]->friend_viewed)
return v.date,friend_viewed
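The rewritten query visits only the friends' VIEW relationships that actually exist and pass the date filter, then drops targets the user has already viewed (an anti-join). A rough Python analogue over a hypothetical in-memory adjacency map shows the shape of that work; `follows`, `views`, and `friends_recent_views` are illustrative names, not Neo4j APIs, and this is not how Cypher executes the query internally.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy data: who each user FOLLOWs, and VIEW relationships as (target, date).
follows: Dict[str, List[str]] = {"u": ["f1", "f2"]}
views: Dict[str, List[Tuple[str, str]]] = {
    "u": [("a", "2012-04-01")],
    "f1": [("a", "2012-04-12"), ("b", "2012-04-15")],
    "f2": [("c", "2012-04-05"), ("d", "2012-04-20")],
}


def friends_recent_views(user: str, since: str) -> List[Tuple[str, str]]:
    """Return (date, target) for friends' views after `since` that `user` has not viewed."""
    # Targets the user already viewed; plays the role of NOT (user-[:VIEW]->friend_viewed).
    already_seen: Set[str] = {target for target, _ in views.get(user, [])}
    rows: List[Tuple[str, str]] = []
    for friend in follows.get(user, []):
        # Only existing VIEW relationships are visited (v is no longer optional).
        for target, date in views.get(friend, []):
            # ISO 'YYYY-MM-DD' strings compare chronologically, as in v.date > "2012-04-10".
            if date > since and target not in already_seen:
                rows.append((date, target))
    return rows


print(friends_recent_views("u", "2012-04-10"))
```

Like the Cypher query, this returns one row per qualifying VIEW relationship, so a target viewed by two friends appears twice.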

Cheers

Michael

On 24.04.2012 at 23:11, Sean Timm wrote:


 
From: Sean Timm <sean.t...@teamaol.com>
Date: Tue, 24 Apr 2012 19:54:42 -0700 (PDT)
Subject: Re: [Neo4j] Poor performance Cypher traversal

Thanks for noticing that; v should not be optional. With your improvements
and a warm cache, the result is "9290 rows, 568812 ms" (about 9.5 minutes).
Still not speedy, but better. Would you recommend trying the native API at
this point?


 
From: Michael Hunger <michael.hun...@neopersistence.com>
Date: Wed, 25 Apr 2012 07:53:24 +0200
Subject: Re: [Neo4j] Poor performance Cypher traversal

Was this the first run? If so, please try several runs.
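Separating the first (typically cold-cache) run from later runs is easy to script. A minimal Python sketch; it only measures wall-clock time around whatever callable it is given and knows nothing about Neo4j.

```python
import time
from typing import Callable, List


def time_runs(fn: Callable[[], object], runs: int = 6) -> List[float]:
    """Call `fn` `runs` times; return each wall-clock duration in milliseconds."""
    durations: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        durations.append((time.perf_counter() - start) * 1000.0)
    return durations


def summarize(durations_ms: List[float]) -> str:
    """Report the first run separately from the best of the later runs."""
    if len(durations_ms) < 2:
        raise ValueError("need at least two runs to compare first and later timings")
    first, later = durations_ms[0], durations_ms[1:]
    return f"first run {first:.0f} ms, best later run {min(later):.0f} ms"
```

Usage would look like `summarize(time_runs(lambda: run_query(query)))`, where `run_query` is a hypothetical function that submits the Cypher statement through whichever client is in use.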

It should be much faster now. Of course, you can always use the Core API or Gremlin instead.

Still, I would love to profile your case.

Michael


On 25.04.2012 at 04:54, Sean Timm <sean.t...@teamaol.com> wrote:


 
From: Peter Neubauer <peter.neuba...@neotechnology.com>
Date: Wed, 25 Apr 2012 08:57:05 +0200
Subject: Re: [Neo4j] Poor performance Cypher traversal
Sean,
look at this query: you are including every friend_viewed where r and v
are both NULL, where only one of them exists, or where both exist. In
essence, you are forcing Cypher to examine ALL friend_viewed nodes in the
whole dataset. How many are there? I think you might be running into a
full graph scan, so an index would probably be a better option here,
maybe on the date, which you could also use as a starting point.
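The idea of starting from a date index rather than from the user can be sketched with a sorted list and binary search. This is a hypothetical in-memory analogue (`DateIndex` and `recent_friend_views` are illustrative, not a Neo4j index API): the range lookup touches only VIEW entries newer than the cutoff, and each hit is then checked against the user's friends and already-viewed targets.

```python
import bisect
from typing import List, Set, Tuple

Entry = Tuple[str, str, str]  # (date, viewer, target) for one VIEW relationship


class DateIndex:
    """Hypothetical sorted index over VIEW relationships, keyed by ISO date string."""

    def __init__(self, entries: List[Entry]):
        self._entries = sorted(entries)
        self._dates = [date for date, _, _ in self._entries]

    def after(self, date: str) -> List[Entry]:
        """Return every entry whose date is strictly greater than `date`."""
        start = bisect.bisect_right(self._dates, date)
        return self._entries[start:]


def recent_friend_views(index: DateIndex, friends: Set[str],
                        user_seen: Set[str], since: str) -> List[Tuple[str, str]]:
    """Start from the date range, then keep friends' views of targets the user hasn't seen."""
    return [(date, target) for date, viewer, target in index.after(since)
            if viewer in friends and target not in user_seen]


index = DateIndex([
    ("2012-04-12", "f1", "a"), ("2012-04-15", "f1", "b"),
    ("2012-04-05", "f2", "c"), ("2012-04-20", "f2", "d"),
    ("2012-04-11", "x", "e"),
])
print(recent_friend_views(index, {"f1", "f2"}, {"a"}, "2012-04-10"))
```

Whether this beats starting from the user depends on how many views fall after the cutoff compared with how many are reachable from the user's friends, which the thread does not say.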

Cheers,

/peter neubauer

G:  neubauer.peter
S:  peter.neubauer
P:  +46 704 106975
L:   http://www.linkedin.com/in/neubauer
T:   @peterneubauer

If you can write, you can code - @coderdojomalmo
If you can sketch, you can use a graph database - @neo4j


 
From: Peter Neubauer <peter.neuba...@neotechnology.com>
Date: Wed, 25 Apr 2012 11:10:08 +0200
Subject: Re: [Neo4j] Poor performance Cypher traversal
Sorry,
please disregard this; I missed Michael's answer.

Cheers,

/peter neubauer


On Wed, Apr 25, 2012 at 8:57 AM, Peter Neubauer


 
From: Sean Timm <sean.t...@teamaol.com>
Date: Wed, 25 Apr 2012 06:53:18 -0700 (PDT)
Subject: Re: [Neo4j] Poor performance Cypher traversal

Restarted server.

neo4j-sh (0)$ CYPHER 1.7 START user=node:node(userId = "378531")
MATCH user-[:FOLLOW]->friend-[v:VIEW]->friend_viewed
WHERE v.date > "2012-04-10" AND not (user-[:VIEW]->friend_viewed)
RETURN v.date,friend_viewed LIMIT 20

Run   Time (ms)
1        42747
2         7315
3         5565
4         5527
5         5537
6         5377

After the 6 runs, I removed the LIMIT: 9290 rows, 5525 ms.
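The timings in this thread can be summarized with a few lines of Python. The numbers are copied from the messages above; treating runs 3 to 6 as the warmed-up steady state is an assumption.

```python
from statistics import mean

# Timings reported in the thread, in milliseconds.
limit_runs_ms = [42747, 7315, 5565, 5527, 5537, 5377]  # runs 1-6, with LIMIT 20
no_limit_ms = 5525          # same query without LIMIT, after the 6 runs
earlier_report_ms = 568812  # earlier report in the thread, no LIMIT, warm cache

# Assumption: runs 3-6 represent the warmed-up steady state.
steady_ms = mean(limit_runs_ms[2:])

print(f"steady state with LIMIT: {steady_ms:.1f} ms")
print(f"first run vs steady state: {limit_runs_ms[0] / steady_ms:.1f}x slower")
print(f"earlier report vs final no-LIMIT run: {earlier_report_ms / no_limit_ms:.0f}x slower")
```

The no-LIMIT run (5525 ms) lands within the range of the LIMIT 20 runs, so the LIMIT does not appear to change the cost much here.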

Much better.  Thanks!

Thanks,
Sean


 
From: Peter Neubauer <peter.neuba...@neotechnology.com>
Date: Wed, 25 Apr 2012 15:59:35 +0200
Subject: Re: [Neo4j] Poor performance Cypher traversal
Nice!
Still, that sounds like a long time. Have you tried storing the dates
as longs instead of strings? Strings lead to a lot of string
comparisons, which are much more expensive than comparing simple longs.
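A sketch of the conversion suggested here, assuming the stored strings are ISO `YYYY-MM-DD` dates interpreted as midnight UTC (the thread does not say which time zone applies). ISO date strings already sort chronologically, so the change keeps the same rows and only swaps string comparisons for integer ones; the WHERE clause would then compare v.date against a number such as `cutoff` instead of the string "2012-04-10".

```python
from datetime import datetime, timezone


def to_epoch_millis(iso_date: str) -> int:
    """Convert 'YYYY-MM-DD' (assumed to be midnight UTC) to milliseconds since the epoch."""
    dt = datetime.strptime(iso_date, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp()) * 1000


cutoff = to_epoch_millis("2012-04-10")
print(cutoff)  # 1334016000000

# The numeric filter agrees with the string filter v.date > "2012-04-10".
dates = ["2012-04-09", "2012-04-10", "2012-04-11"]
print([to_epoch_millis(d) > cutoff for d in dates])  # [False, False, True]
```

The conversion would happen once, when the data is loaded or migrated, not per query.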

Cheers,

/peter neubauer



 