Cypher: Aggregation functions in WHERE (WHERE count(*)

Elad O

unread,

Jan 2, 2012, 3:30:25 AM1/2/12

to ne...@googlegroups.com

(1.6.M02)

Is it possible?

I'm trying to return all liked items by users who categorized with category 11, and that these items were liked at least two times by those categorized users.

START category=node(11) MATCH category<-[:categorized]-user-[:likes]->item WHERE count(*)>2 RETURN item

I get

SyntaxException: string matching regex ``(``|[^`])*`' expected but `*' found

When I do RETURN p, count(*) I see correct results.

Michael Hunger

unread,

Jan 2, 2012, 4:56:42 PM1/2/12

to ne...@googlegroups.com

What you try is a having clause which does not exist (yet) in cypher.

Can you do by filtering in your application so far?

You might even order by count(*) desc and just stop processing when the count(*) field contains a value < 3.

THanks

Michael

Elad O

unread,

Jan 2, 2012, 7:57:26 PM1/2/12

to ne...@googlegroups.com

Thanks for your answer!

I'd filter in the app side but it can be a lot of data...

Can I achieve something similar with Gremlin? Some kind of hack? Nested Cypher queries?

Elad O

unread,

Jan 2, 2012, 9:31:29 PM1/2/12

to ne...@googlegroups.com

What I could also do is to have an index/property of all users with X items or more and do a simple where. Hope that feature will be implemented soon.

Thanks.

Peter Neubauer

unread,

Jan 3, 2012, 2:30:58 AM1/3/12

to ne...@googlegroups.com

Elad,
Yes, that should work. Let us know how it works out!

On Tuesday, January 3, 2012, Elad O <ela...@gmail.com> wrote:
> What I could also do is to have an index/property of all users with X items or more and do a simple where. Hope that feature will be implemented soon.
>
> Thanks.

--
Sent from a crappy keyboard device with autocorrect enabled.

Pablo Pareja

unread,

Jan 3, 2012, 12:13:52 PM1/3/12

to ne...@googlegroups.com

+1 to adding that feature soon ;)

Pablo

--
Pablo Pareja Tobes

My site http://about.me/pablopareja

LinkedIn http://www.linkedin.com/in/pabloparejatobes

Twitter http://www.twitter.com/pablopareja

Creator of Bio4j --> http://www.bio4j.com

http://www.ohnosequences.com

Marko Rodriguez

unread,

Jan 3, 2012, 12:52:34 PM1/3/12

to ne...@googlegroups.com

Hi,

> Can I achieve something similar with Gremlin?

What is the problem you are trying to solve. In Gremlin, there is the aggregate step and you can conditions pre-aggregation and projections/selections post-aggregateion. If you provide me, perhaps in human language what you are trying to express, I can tell you the respective Gremlin query.

Thanks,
Marko.

http://markorodriguez.com

Elad O

unread,

Jan 3, 2012, 4:58:06 PM1/3/12

to ne...@googlegroups.com

I have users (node), each of which is categorized (relationship) with a category (node) (one or more)

They like (relationship) items (node)

cat<-[:categorized]-u-[:like]->item

1.

I want to have a list of items of users of a certain category, but each user must have at least 3 likes (no matter what items in the system).

I wanted to do WHERE count(*) >= 3 but I can't.

My solution: I added having like_count property on a user node and I used a simple WHERE

START c=node(some_category_id)

MATCH cat<-[:categorized]-u-[:like]->item

WHERE u.like_count >= 3

RETURN distinct u

2.

I want to have a list of items of users with a category, and I want only items that were liked at least 3 times by users with a specific category (so here there's not only like_count for item but it's product-category (from the users)-like_count

I'd like to do something like:

START c=node(some_category_id)

MATCH c<-[:categorized]-u-[:like]->item

WHERE count(*) >= 3

RETURN item, count(*)

ORDER BY count(*) DESC

The WHERE will filter out items that were liked less than 3 times by users of the specific category.

Here it's more complex to have the extra property/index. I could filter it on the app side but there could be many items.

What would be a Gremlin solution?

Thank you so much!

Andres Taylor

unread,

Jan 3, 2012, 5:45:23 PM1/3/12

to ne...@googlegroups.com

On Tue, Jan 3, 2012 at 10:58 PM, Elad O <ela...@gmail.com> wrote:

START c=node(some_category_id)
MATCH c<-[:categorized]-u-[:like]->item
WHERE count(*) >= 3

RETURN item, count(*)
ORDER BY count(*) DESC

Just a comment before Marko shows the way - my plan is to add this to Cypher, as HAVING. Your query would be:

START c=node(some_category_id)

MATCH c<-[:categorized]-u-[:like]->item

RETURN item, count(*)

ORDER BY count(*) DESC

HAVING count(*) >= 3

Andrés

Elad O

unread,

Jan 3, 2012, 6:00:52 PM1/3/12

to ne...@googlegroups.com

HAVING is definitely the way :) Thanks! Looking forward to using that.

Elad O

unread,

Jan 10, 2012, 1:58:13 PM1/10/12

to ne...@googlegroups.com

Solved it with Gremlin, something like:

g.v(11).in('categorized').filter{ it.out('likes').count() > 2 }

Does anyone know how will this perform? Does Neo4j caches the counts?

Marko Rodriguez

unread,

Jan 10, 2012, 2:05:34 PM1/10/12

to ne...@googlegroups.com

Hey,

You can make this faster by not fully iterating. For example:

g.v(11).in('categorized').filter{ it.out('likes').next(3).size() == 3 }

This will only iterate 3 objects. If those 3 objects exist, then you know the count is greater than 2. If not, then its less than 2.

Its an optimization trick,

Marko.

http://markorodriguez.com

Message has been deleted

Elad O

unread,

Jan 10, 2012, 2:49:01 PM1/10/12

to ne...@googlegroups.com

Awesome, thanks.

What happens if I do a sort on that field as well? I want the users that have most likes.

.sort { -it.out('likes').count() }

will iterate on all?

Marko Rodriguez

unread,

Jan 10, 2012, 3:56:42 PM1/10/12

to ne...@googlegroups.com

Hi,

What happens if I do a sort on that field as well? I want the users that have most likes.

.sort { -it.out('likes').count() }

will iterate on all?

Yes, count() is equivalent to this:

c = 0;

while(pipe.hasNext()) {

pipe.next();

c++;

}

return c;

Thus, count() iterates through the entire result set to get a count.

HTH,

Marko.

http://markorodriguez.com

Elad O

unread,

Jan 10, 2012, 6:46:27 PM1/10/12

to ne...@googlegroups.com

Thanks.

Is there a more efficient way for this, except caching the counts on my own as a property of the node?

Marko Rodriguez

unread,

Jan 10, 2012, 6:48:36 PM1/10/12

to ne...@googlegroups.com

Hi,

> Is there a more efficient way for this, except caching the counts on my own as a property of the node?

A more efficient way to do what? Please leave the thread of communication so I can remember what we talked about. Also, provide your query so I can see how to optimize it.

Cypher: Aggregation functions in WHERE (WHERE count(*) > X)

Elad O

Michael Hunger

Elad O

Elad O

Peter Neubauer

Pablo Pareja

Marko Rodriguez

Elad O

Andres Taylor

Elad O

Elad O

Marko Rodriguez

Elad O

Marko Rodriguez

Elad O

Marko Rodriguez