> MATCH user-[:FOLLOW]->friend-[v?:VIEW]->friend_viewed,
> user-[r?:VIEW]->friend_viewed
> WHERE r IS NULL AND v.date > "2012-04-10"
> return v.date,friend_viewed
This query takes about 40 minutes! The equivalent query in MySQL (admittedly on faster hardware) takes < 7 seconds. I expected Neo4j to be at worst on par with MySQL for this query, and ideally faster.
The user referenced above has a large number of friends: 3000. Assuming each of those friends had 1000 views (I think it is much less than that), and a node traversal takes 1 µs, that is 3 seconds. I feel like I must be doing something horribly wrong.
From JMX, NodeCache hit rate is very good (close to 100%), RelationshipCache is ~98%.
I have a 4GB heap on an 8GB 4 core Linux machine. 5 disks RAID 0.
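The estimate above can be checked with a quick calculation. This is a minimal Python sketch using only the figures from the post; the function name `estimated_seconds` is made up for illustration. 3000 friends times 1000 views is 3 million hops, which is about 3 seconds at 1 µs per hop and about 3000 seconds (50 minutes) at 1 ms per hop, so the assumed per-hop cost makes a thousandfold difference.

```python
# Back-of-envelope traversal estimate using the figures from the post.
FRIENDS = 3000            # friends followed by the user (from the post)
VIEWS_PER_FRIEND = 1000   # assumed upper bound per friend (from the post)


def estimated_seconds(hop_cost_us: int) -> float:
    """Total seconds if every friend->view hop costs hop_cost_us microseconds."""
    hops = FRIENDS * VIEWS_PER_FRIEND      # 3,000,000 hops in total
    return hops * hop_cost_us / 1_000_000  # microseconds -> seconds


if __name__ == "__main__":
    for label, cost_us in (("1 microsecond", 1), ("1 millisecond", 1000)):
        print(f"{label} per hop: {estimated_seconds(cost_us):,.0f} s")
```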
Thanks for noticing that. v is not optional. With your improvements and the cache warm, the result is "9290 rows, 568812 ms". Still not speedy, but better. Would your recommendation be to try the native API at this point?
On Tuesday, April 24, 2012 5:24:06 PM UTC-4, Michael Hunger wrote:
> Sean,
> thanks for getting back to us with that, real world use-cases are very helpful to improve the product.
> Please note that cypher is still under heavy development, with little time spent so far on performance optimization.
> It would be great if you could share your dataset (offline) with me to allow some analysis (or a generator that can generate your dataset).
> If you're returning v.date and friend_viewed, why is it optional in the first place?
> You might try the following.
> > CYPHER 1.7 START user=node:node(userId = "378531")
> > MATCH user-[:FOLLOW]->friend-[v:VIEW]->friend_viewed
> > WHERE v.date > "2012-04-10"
> > AND not (user-[:VIEW]->friend_viewed)
> > return v.date,friend_viewed
> Cheers
> Michael
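To make the intent of the rewritten query concrete, here is a toy in-memory Python sketch of the same logic: views by followed friends after a cutoff date, minus anything the user has already viewed. It is not how Cypher executes the query; the dictionaries and the function name `unseen_friend_views` are hypothetical stand-ins for the graph.

```python
from typing import Dict, List, Set, Tuple


def unseen_friend_views(
    user: str,
    follows: Dict[str, Set[str]],
    views: Dict[str, List[Tuple[str, str]]],
    since: str,
) -> List[Tuple[str, str]]:
    """Return (date, item) for each friend view after `since` of an item
    the user has not viewed. `views` maps a user to (item, date) pairs."""
    # Items the user already viewed; this plays the role of the
    # `not (user-[:VIEW]->friend_viewed)` filter.
    already_seen = {item for item, _ in views.get(user, [])}
    results = []
    for friend in follows.get(user, set()):
        for item, date in views.get(friend, []):
            # ISO yyyy-mm-dd strings sort chronologically, which is why the
            # string comparison in the query works at all.
            if date > since and item not in already_seen:
                results.append((date, item))
    return results


if __name__ == "__main__":
    follows = {"u": {"a"}}
    views = {"u": [("x", "2012-04-11")],
             "a": [("x", "2012-04-12"), ("y", "2012-04-15")]}
    print(unseen_friend_views("u", follows, views, "2012-04-10"))
```

Computing the "already viewed" set once per user, rather than re-checking an optional relationship for every candidate row, is the design point the sketch is meant to illustrate.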
Sean,
look at this query: because both v and r are optional, you are including
every friend_viewed where r is NULL and v is NULL, where only one of them
exists, or where both exist. In essence you are forcing Cypher to examine
ALL friend_viewed nodes in the whole dataset. How many are these? I think
you might be running into a full graph scan, so some index would be a
better option here, maybe on the date, using that as a starting point as well?
On Wednesday, April 25, 2012 1:53:24 AM UTC-4, Michael Hunger wrote:
> Was this for the first run? If not please try multiple runs.
> It should be much faster now. Of course you always have the means of using the core API or Gremlin.
> Still I would love to profile your case.
> Michael
> Sent from mobile device
Nice! Still, it sounds like a long time. Have you tried converting the dates to longs instead of storing them as strings in the DB? String dates lead to a lot of string comparisons, which are much more expensive than comparing simple longs.
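As an illustration of the date-to-long suggestion, here is a minimal Python sketch that turns a date string into epoch milliseconds. It assumes the stored dates are in `YYYY-MM-DD` form and can be treated as midnight UTC; the function name `date_to_epoch_millis` is hypothetical.

```python
import calendar
import time


def date_to_epoch_millis(date_str: str) -> int:
    """Convert a 'YYYY-MM-DD' string to milliseconds since the Unix epoch.

    Assumption: the date is interpreted as midnight UTC.
    """
    parsed = time.strptime(date_str, "%Y-%m-%d")
    return calendar.timegm(parsed) * 1000


if __name__ == "__main__":
    cutoff = date_to_epoch_millis("2012-04-10")
    viewed = date_to_epoch_millis("2012-04-24")
    # Integer comparison replaces the string comparison on v.date.
    print(cutoff, viewed > cutoff)
```

The converted value would be stored on the relationship at import time, so the filter on `v.date` would compare numbers instead of strings.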
On Wed, Apr 25, 2012 at 3:53 PM, Sean Timm <sean.t...@teamaol.com> wrote:
> Restarted server.
> neo4j-sh (0)$ CYPHER 1.7 START user=node:node(userId = "378531") MATCH
> user-[:FOLLOW]->friend-[v:VIEW]->friend_viewed WHERE v.date >
> "2012-04-10" AND not (user-[:VIEW]->friend_viewed) RETURN
> v.date,friend_viewed LIMIT 20
> Run  Time
> 1    42747 ms
> 2    7315 ms
> 3    5565 ms
> 4    5527 ms
> 5    5537 ms
> 6    5377 ms
> Removed limit after the 6 runs: 9290 rows, 5525 ms
> Much better. Thanks!
> Thanks,
> Sean
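The run table above shows why repeated runs matter: the first run after a restart pays for cold caches. A small Python sketch of that kind of measurement follows; `run_query` is a hypothetical callable standing in for whatever executes the query, and `time_runs` and `warm_average` are names invented for this example.

```python
import time
from typing import Callable, List


def time_runs(run_query: Callable[[], object], runs: int = 6) -> List[float]:
    """Call run_query `runs` times; return each wall-clock duration in ms."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()
        durations.append((time.perf_counter() - start) * 1000.0)
    return durations


def warm_average(durations: List[float], skip: int = 1) -> float:
    """Average of the runs after the first `skip` cold runs."""
    warm = durations[skip:]
    return sum(warm) / len(warm)


if __name__ == "__main__":
    # Timings reported in the thread, in milliseconds.
    reported = [42747, 7315, 5565, 5527, 5537, 5377]
    print(f"warm average: {warm_average(reported):.1f} ms")
```

Applied to the six reported timings, skipping the first run gives a warm average of about 5864 ms against 42747 ms for the cold run.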