Cypher: Aggregation functions in WHERE (WHERE count(*) > X)

3,575 views
Skip to first unread message

Elad O

unread,
Jan 2, 2012, 3:30:25 AM1/2/12
to ne...@googlegroups.com
(1.6.M02)

Is it possible?

I'm trying to return all liked items by users who categorized with category 11, and that these items were liked at least two times by those categorized users.

START category=node(11) MATCH category<-[:categorized]-user-[:likes]->item WHERE count(*)>2 RETURN item

I get 

SyntaxException: string matching regex ``(``|[^`])*`' expected but `*' found

When I do RETURN p, count(*) I see correct results.

Michael Hunger

unread,
Jan 2, 2012, 4:56:42 PM1/2/12
to ne...@googlegroups.com
What you try is a having clause which does not exist (yet) in cypher.

Can you do by filtering in your application so far?

You might even order by count(*) desc and just stop processing when the count(*) field contains a value < 3.

THanks

Michael

Elad O

unread,
Jan 2, 2012, 7:57:26 PM1/2/12
to ne...@googlegroups.com
Thanks for your answer!

I'd filter in the app side but it can be a lot of data...
Can I achieve something similar with Gremlin? Some kind of hack? Nested Cypher queries?

Elad O

unread,
Jan 2, 2012, 9:31:29 PM1/2/12
to ne...@googlegroups.com
What I could also do is to have an index/property of all users with X items or more and do a simple where. Hope that feature will be implemented soon.

Thanks.

Peter Neubauer

unread,
Jan 3, 2012, 2:30:58 AM1/3/12
to ne...@googlegroups.com
Elad,
Yes, that should work. Let us know how it works out!


On Tuesday, January 3, 2012, Elad O <ela...@gmail.com> wrote:
> What I could also do is to have an index/property of all users with X items or more and do a simple where. Hope that feature will be implemented soon.
>
> Thanks.

--
Sent from a crappy keyboard device with autocorrect enabled.

Pablo Pareja

unread,
Jan 3, 2012, 12:13:52 PM1/3/12
to ne...@googlegroups.com
+1 to adding that feature soon ;)

Pablo

Marko Rodriguez

unread,
Jan 3, 2012, 12:52:34 PM1/3/12
to ne...@googlegroups.com
Hi,

> Can I achieve something similar with Gremlin?

What is the problem you are trying to solve. In Gremlin, there is the aggregate step and you can conditions pre-aggregation and projections/selections post-aggregateion. If you provide me, perhaps in human language what you are trying to express, I can tell you the respective Gremlin query.

Thanks,
Marko.

http://markorodriguez.com

Elad O

unread,
Jan 3, 2012, 4:58:06 PM1/3/12
to ne...@googlegroups.com
I have users (node), each of which is categorized (relationship) with a category (node) (one or more)
They like (relationship) items (node)

cat<-[:categorized]-u-[:like]->item

1.

I want to have a list of items of users of a certain category, but each user must have at least 3 likes (no matter what items in the system).
I wanted to do WHERE count(*) >= 3 but I can't.
My solution: I added having like_count property on a user node and I used a simple WHERE

START c=node(some_category_id)
MATCH cat<-[:categorized]-u-[:like]->item
WHERE u.like_count >= 3
RETURN distinct u


2.

I want to have a list of items of users with a category, and I want only items that were liked at least 3 times by users with a specific category (so here there's not only like_count for item but it's product-category (from the users)-like_count

I'd like to do something like:

START c=node(some_category_id)
MATCH c<-[:categorized]-u-[:like]->item
WHERE count(*) >= 3
RETURN item, count(*)
ORDER BY count(*) DESC

The WHERE will filter out items that were liked less than 3 times by users of the specific category.

Here it's more complex to have the extra property/index. I could filter it on the app side but there could be many items.

What would be a Gremlin solution?

Thank you so much!

Andres Taylor

unread,
Jan 3, 2012, 5:45:23 PM1/3/12
to ne...@googlegroups.com
On Tue, Jan 3, 2012 at 10:58 PM, Elad O <ela...@gmail.com> wrote:
START c=node(some_category_id)
MATCH c<-[:categorized]-u-[:like]->item
WHERE count(*) >= 3
RETURN item, count(*)
ORDER BY count(*) DESC

Just a comment before Marko shows the way - my plan is to add this to Cypher, as HAVING. Your query would be: 
START c=node(some_category_id)
MATCH c<-[:categorized]-u-[:like]->item
RETURN item, count(*)
ORDER BY count(*) DESC
HAVING count(*) >= 3

Andrés

Elad O

unread,
Jan 3, 2012, 6:00:52 PM1/3/12
to ne...@googlegroups.com
HAVING is definitely the way :) Thanks! Looking forward to using that.

Elad O

unread,
Jan 10, 2012, 1:58:13 PM1/10/12
to ne...@googlegroups.com
Solved it with Gremlin, something like:

g.v(11).in('categorized').filter{ it.out('likes').count() > 2 }

Does anyone know how will this perform? Does Neo4j caches the counts?

Marko Rodriguez

unread,
Jan 10, 2012, 2:05:34 PM1/10/12
to ne...@googlegroups.com
Hey,

You can make this faster by not fully iterating. For example:

g.v(11).in('categorized').filter{ it.out('likes').next(3).size() == 3 }

This will only iterate 3 objects. If those 3 objects exist, then you know the count is greater than 2. If not, then its less than 2.

Its an optimization trick,
Marko.
Message has been deleted

Elad O

unread,
Jan 10, 2012, 2:49:01 PM1/10/12
to ne...@googlegroups.com
Awesome, thanks.

What happens if I do a sort on that field as well? I want the users that have most likes.

.sort { -it.out('likes').count() }

will iterate on all?

Marko Rodriguez

unread,
Jan 10, 2012, 3:56:42 PM1/10/12
to ne...@googlegroups.com
Hi,

What happens if I do a sort on that field as well? I want the users that have most likes.

.sort { -it.out('likes').count() }

will iterate on all?

Yes, count() is equivalent to this:

c = 0;
while(pipe.hasNext()) {
pipe.next();
c++;
}
return c;

Thus, count() iterates through the entire result set to get a count.

HTH,
Marko.


Elad O

unread,
Jan 10, 2012, 6:46:27 PM1/10/12
to ne...@googlegroups.com
Thanks.
Is there a more efficient way for this, except caching the counts on my own as a property of the node?

Marko Rodriguez

unread,
Jan 10, 2012, 6:48:36 PM1/10/12
to ne...@googlegroups.com
Hi,

> Is there a more efficient way for this, except caching the counts on my own as a property of the node?

A more efficient way to do what? Please leave the thread of communication so I can remember what we talked about. Also, provide your query so I can see how to optimize it.

Thanks,
Marko.

http://markorodriguez.com

Reply all
Reply to author
Forward
0 new messages