Poor performance Cypher traversal


Sean Timm

Apr 24, 2012, 5:11:41 PM
to ne...@googlegroups.com
neo4j-sh (0)$ CYPHER 1.7 START user=node:node(userId = "378531")
> MATCH user-[:FOLLOW]->friend-[v?:VIEW]->friend_viewed, user-[r?:VIEW]->friend_viewed
> WHERE r IS NULL AND v.date > "2012-04-10"
> return v.date,friend_viewed

This query takes about 40 minutes! The equivalent query in MySQL (though on unfairly faster hardware) takes < 7 seconds. I expected Neo4j to be at worst on par with MySQL for this query, and ideally faster.

The user referenced above has a large number of friends: 3000. Assuming each of those friends had 1000 views (I think it is much less than that), and a node traversal takes 1 µs, that is 3 seconds. I feel like I must be doing something horribly wrong.
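The estimate is very sensitive to the assumed per-hop cost. A minimal sketch of the arithmetic, using the figures above (3000 friends, an assumed upper bound of 1000 views each); the function name and the two per-hop costs are illustrative assumptions, not measurements:

```python
# Back-of-envelope estimate of the traversal cost for the query above.
friends = 3000            # friends of the start user (from the post)
views_per_friend = 1000   # assumed upper bound on VIEW relationships per friend

traversals = friends * views_per_friend  # total relationship hops to examine


def estimate_seconds(hops, microseconds_per_hop):
    """Total time in seconds for `hops` traversals at a fixed per-hop cost."""
    return hops * microseconds_per_hop / 1_000_000


print("hops:", traversals)
print("at 1 us per hop:", estimate_seconds(traversals, 1), "s")
print("at 1 ms per hop:", estimate_seconds(traversals, 1000), "s")
```

Three million hops take about 3 seconds at 1 µs per hop, but 3000 seconds (50 minutes) at 1 ms per hop.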

From JMX, NodeCache hit rate is very good (close to 100%), RelationshipCache is ~98%.

I have a 4GB heap on an 8GB 4 core Linux machine.  5 disks RAID 0.

nodes: ~3MM
PropertyIds: ~13MM
Relationships: ~38MM
RelationshipTypes: 4

25M     neostore.nodestore.db
520M    neostore.propertystore.db
128     neostore.propertystore.db.arrays
1.1K    neostore.propertystore.db.index
1.1K    neostore.propertystore.db.index.keys
144M    neostore.propertystore.db.strings
1.2G    neostore.relationshipstore.db

# Default values for the low-level graph engine
neostore.nodestore.db.mapped_memory=26M
neostore.relationshipstore.db.mapped_memory=1300M
neostore.propertystore.db.mapped_memory=130M
neostore.propertystore.db.strings.mapped_memory=150M
neostore.propertystore.db.arrays.mapped_memory=0M

Michael Hunger

Apr 24, 2012, 5:24:06 PM
to ne...@googlegroups.com
Sean,

Thanks for getting back to us with that; real-world use cases are very helpful for improving the product.

Please note that Cypher is still under heavy development, with little time spent so far on performance optimization.

It would be great if you could share your dataset (offline) with me to allow some analysis (or a generator that can generate your dataset).

If you're returning v.date and friend_viewed, why is v optional in the first place?

You might try the following.

CYPHER 1.7 START user=node:node(userId = "378531")
MATCH user-[:FOLLOW]->friend-[v:VIEW]->friend_viewed
WHERE v.date > "2012-04-10"
AND not (user-[:VIEW]->friend_viewed)
return v.date,friend_viewed
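To make the intent of the rewritten query concrete, here is a small in-memory sketch of what it selects: items viewed by a friend after a cutoff date that the user has not viewed. The dictionaries, names, and toy data are hypothetical, and this says nothing about how Cypher actually executes the query:

```python
# Toy graph: who follows whom, and who viewed what on which date.
follows = {"user": ["friend_a", "friend_b"]}
views = {
    "user": [("2012-04-11", "item_z")],
    "friend_a": [("2012-04-12", "item_x"), ("2012-04-01", "item_y")],
    "friend_b": [("2012-04-15", "item_z")],
}


def friend_views_not_seen(user, since):
    """Return (date, item) pairs viewed by friends after `since`
    that `user` has not viewed (the `not (user-[:VIEW]->...)` filter)."""
    seen = {item for _, item in views.get(user, [])}
    return [
        (date, item)
        for friend in follows.get(user, [])
        for date, item in views.get(friend, [])
        if date > since and item not in seen
    ]


result = friend_views_not_seen("user", "2012-04-10")
print(result)
```

Here item_y is dropped by the date filter and item_z is dropped because the user already viewed it, so only item_x remains.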

Cheers

Michael

Sean Timm

Apr 24, 2012, 10:54:42 PM
to ne...@googlegroups.com
Thanks for noticing that.  v is not optional.  With your improvements and the cache warm, the result is "9290 rows, 568812 ms".  Still not speedy, but better.  Would your recommendation be to try the native API at this point?

Michael Hunger

Apr 25, 2012, 1:53:24 AM
to ne...@googlegroups.com
Was this for the first run? If not, please try multiple runs.

It should be much faster now. Of course you always have the option of using the core API or Gremlin.

Still I would love to profile your case.

Michael

Sent from mobile device

Peter Neubauer

Apr 25, 2012, 2:57:05 AM
to ne...@googlegroups.com
Sean,
look at this query: with both relationships optional, you are including
every friend_viewed where r and v are both NULL, where only one of them
exists, or where both exist. In essence you are forcing Cypher to examine
ALL friend_viewed nodes in the whole dataset. How many are there? I think
you might be running into a full graph scan, so an index would be a better
option here, maybe on the date, used as a starting point as well.

Cheers,

/peter neubauer

G:  neubauer.peter
S:  peter.neubauer
P:  +46 704 106975
L:   http://www.linkedin.com/in/neubauer
T:   @peterneubauer

If you can write, you can code - @coderdojomalmo
If you can sketch, you can use a graph database - @neo4j

Peter Neubauer

Apr 25, 2012, 5:10:08 AM
to ne...@googlegroups.com
Sorry,
disregard this, I missed Michael's answer on this.

Cheers,

/peter neubauer


Sean Timm

Apr 25, 2012, 9:53:18 AM
to ne...@googlegroups.com
Restarted server.

neo4j-sh (0)$ CYPHER 1.7 START user=node:node(userId = "378531") MATCH user-[:FOLLOW]->friend-[v:VIEW]->friend_viewed WHERE v.date > "2012-04-10" AND not (user-[:VIEW]->friend_viewed) RETURN v.date,friend_viewed LIMIT 20

Run   Time
1       42747 ms
2        7315 ms
3        5565 ms
4        5527 ms
5        5537 ms
6        5377 ms

Removed limit after the 6 runs: 9290 rows, 5525 ms
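A quick summary of the timings above, as a minimal sketch. The numbers are copied from the table; treating the first run as the cold-cache run is an assumption based on the server restart:

```python
from statistics import median

# Timings in milliseconds for runs 1-6 (LIMIT 20, after the server restart).
run_ms = [42747, 7315, 5565, 5527, 5537, 5377]

cold_ms = run_ms[0]    # first run, presumably with cold caches
warm_ms = run_ms[1:]   # later runs, presumably with warm caches
warm_median_ms = median(warm_ms)

print(f"cold run:        {cold_ms} ms")
print(f"warm median:     {warm_median_ms} ms")
print(f"cold/warm ratio: {cold_ms / warm_median_ms:.1f}x")
```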

Much better.  Thanks!

Thanks,
Sean


Peter Neubauer

Apr 25, 2012, 9:59:35 AM
to ne...@googlegroups.com
Nice!
Still, it sounds like a long time. Have you tried converting the dates
to longs, instead of storing strings in the DB? Storing strings leads to
a lot of string comparisons, which are much more expensive than comparing
simple longs.
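One possible way to do that conversion before loading the data, as a minimal sketch. It assumes the dates are stored as "YYYY-MM-DD" strings (as in the query) and are interpreted as UTC midnight; the function name is hypothetical:

```python
import calendar
import time


def date_to_epoch_millis(date_str):
    """Convert a 'YYYY-MM-DD' string to milliseconds since the Unix epoch (UTC)."""
    parsed = time.strptime(date_str, "%Y-%m-%d")
    return calendar.timegm(parsed) * 1000


cutoff = date_to_epoch_millis("2012-04-10")
view_date = date_to_epoch_millis("2012-04-12")

# The numeric comparison gives the same ordering as the string comparison did.
print(cutoff, view_date, view_date > cutoff)
```

The WHERE clause would then compare v.date against the numeric cutoff instead of the string "2012-04-10".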

Cheers,

/peter neubauer

