Efficient way to groupBy and filter


jstorm

Nov 8, 2012, 8:11:52 PM
to gremli...@googlegroups.com
I have a few simple vertices with some properties I would like to groupBy on.
 
These vertices also have a property called date, which holds a value generated using new Date().
 
What I would like to do is groupBy on a property called target. If the date of any of the vertices being grouped does not meet a certain condition, I want to skip the grouping for them entirely so that they do not appear in the final grouped results.
 
Looking at the wiki page for groupBy, I can see that the key and value functions determine what to group on and what gets pushed into the final collection output by groupBy.
 
Besides iterating through the output of groupBy and removing results whose child vertices do not meet the date requirements (I think this could be quite expensive), is there a way to do this filtering during the groupBy process?
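For reference, here is a minimal sketch in plain Groovy (ordinary collections, not Gremlin) of doing the filtering during the grouping pass itself: a group is dropped as soon as one of its members fails the date check, and later members with the same key are skipped. The event maps, the `target`/`name`/`date` keys and the cutoff are hypothetical illustration data.

```groovy
import java.text.SimpleDateFormat

def fmt = new SimpleDateFormat('yyyy-MM-dd HH:mm')
def cutoff = fmt.parse('2012-11-12 21:00')   // keep only events strictly before this

// Hypothetical events standing in for vertices: 'target' is the grouping key.
def events = [
    [target: 'a', name: 'e1', date: fmt.parse('2012-11-10 09:00')],
    [target: 'a', name: 'e2', date: fmt.parse('2012-11-11 10:00')],
    [target: 'b', name: 'e3', date: fmt.parse('2012-11-11 21:00')],
    [target: 'b', name: 'e4', date: fmt.parse('2012-11-15 20:00')]
]

def groups = [:]            // target -> names of in-range events
def rejected = [] as Set    // targets that had at least one out-of-range event

events.each { e ->
    if (e.target in rejected) return           // group already discarded; skip
    if (e.date.before(cutoff)) {
        groups.get(e.target, []) << e.name      // GDK get(key, default) stores the default if absent
    } else {
        rejected << e.target                    // disqualify the whole group...
        groups.remove(e.target)                 // ...and drop anything collected so far
    }
}

println groups   // [a:[e1, e2]]
```

The only extra bookkeeping is the set of rejected keys. Grouping first and then discarding bad groups costs just one more linear pass over the grouped map, so it is not necessarily as expensive as it might seem.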
 
Cheers :)

jstorm

Nov 8, 2012, 8:41:29 PM
to gremli...@googlegroups.com
P.S. I understand that one possible solution is to filter out the vertices that do not meet the date/time requirements before doing the groupBy, but let's say I have the following:
 
date: 11 November 9:00 PM
object: v[1]
 
date: 15 November 8:00 PM
object: v[1]
 
Let's say I only want events before 12 November 9:00 PM.
 
The above two will be grouped together, but because one of them falls outside the time range, all vertices grouped under v[1] should be discarded rather than grouped. If I do the filtering before the grouping, I cannot implement this.
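The difference between filtering before and after grouping can be seen with the two events above in plain Groovy. This is a sketch using ordinary maps rather than vertices; the year 2012 and the string 'v[1]' as a stand-in grouping key are assumptions.

```groovy
import java.text.SimpleDateFormat

def fmt = new SimpleDateFormat('yyyy-MM-dd HH:mm')
def cutoff = fmt.parse('2012-11-12 21:00')   // "before 12 November 9:00 PM"

// The two events from the post (year assumed to be 2012), both pointing at v[1].
def events = [
    [object: 'v[1]', date: fmt.parse('2012-11-11 21:00')],
    [object: 'v[1]', date: fmt.parse('2012-11-15 20:00')]
]

// Filter first, then group: the 11 November event survives, so v[1] still gets a group.
def preFiltered = events.findAll { it.date.before(cutoff) }.groupBy { it.object }

// Group first, then keep only groups in which every event is in range.
def grouped = events.groupBy { it.object }
def groupLevel = grouped.findAll { obj, evs -> evs.every { it.date.before(cutoff) } }

println preFiltered.keySet()   // [v[1]]
println groupLevel.keySet()    // []
```

Only the group-level check (`every` over each group) expresses "discard the whole group if any member is out of range".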
 
Any input appreciated :)

Stephen Mallette

Nov 9, 2012, 9:58:44 AM
to gremli...@googlegroups.com
You can filter in the reduce closure pretty easily so that you don't
have to post-process the outputted map from the groupBy:

gremlin> g.V.out.groupBy{it.name}{it.in}{it.unique().findAll{i -> i.age > 30}.name}.cap
==>{lop=[josh, peter], ripple=[josh], josh=[], vadas=[]}

In this way you can evaluate each item extracted into the grouped
value of the map. I guess you could do some filtering in the value
closure as well:

gremlin> g.V.out.groupBy{it.name}{it.in.filter{i -> i.age > 30}.name}{it.toList().unique()}.cap
==>{lop=[josh, peter], ripple=[josh], josh=[], vadas=[]}
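To see where each of the three closures acts, the key/value/reduce pattern can be mimicked with plain Groovy collections. This is a simplified sketch, not Gremlin's implementation: the helper `groupByKVR` is hypothetical, the toy-graph data is hard-coded, and flattening each value-closure result into its group is an assumption made to reproduce the output shown above.

```groovy
// Hypothetical helper mimicking the key/value/reduce closures: group items by
// keyFn, append valueFn's results (a list, flattened into the group), then
// apply reduceFn to each group's collected list.
def groupByKVR(List items, Closure keyFn, Closure valueFn, Closure reduceFn) {
    def groups = [:]
    items.each { item -> groups.get(keyFn(item), []).addAll(valueFn(item)) }
    groups.collectEntries { k, vs -> [k, reduceFn(vs)] }
}

// Simplified stand-in for the toy graph: ages, in-neighbours of each target,
// and the target vertices that g.V.out would emit (order here is arbitrary).
def age = [marko: 29, vadas: 27, josh: 32, peter: 35]
def inOf = [lop: ['marko', 'josh', 'peter'], vadas: ['marko'], josh: ['marko'], ripple: ['josh']]
def outStream = ['lop', 'vadas', 'josh', 'lop', 'ripple', 'lop']

// Filtering in the reduce closure (first example).
def viaReduce = groupByKVR(outStream, { it }, { inOf[it] },
        { vs -> vs.unique().findAll { age[it] > 30 } })

// Filtering in the value closure (second example).
def viaValue = groupByKVR(outStream, { it }, { t -> inOf[t].findAll { age[it] > 30 } },
        { vs -> vs.unique() })

println viaReduce   // [lop:[josh, peter], vadas:[], josh:[], ripple:[josh]]
assert viaReduce == viaValue
```

Filtering in the value closure discards items before they are collected; filtering in the reduce closure sees each group's full list at once, which is the place where a whole-group condition such as "every member is in range" can be evaluated.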

I think you should spend some time understanding the underpinnings of
Groovy and closures, and how they relate back to Gremlin. It
sounds like you're attempting some non-trivial things in
your work. Consider stepping back from these very complex
problems and focusing on just getting things to work. Once they work,
you can refine and improve. Just a suggestion :)

Stephen

jstorm

Nov 10, 2012, 8:03:43 PM
to gremli...@googlegroups.com
Thanks for your help Stephen :)
 
Point taken. I am actually learning Groovy and Gremlin (with no experience in Java at all) and trying to apply them to problems I want to solve, but perhaps I have bitten off a bit more than I can chew at the moment.
 
I am currently going through Groovy in Action (an excellent book, by the way, for those who want to get started with Groovy but have no Java experience). Previously I was reading Programming Groovy (Pragmatic Programmers), but I felt that it put too much emphasis on knowing Java (Groovy in Action is much more neutral).
 
I do wish there were a book for Gremlin, though (perhaps one of you guys could write one :) ). Currently I am relying on the wiki on GitHub and gremlindocs.com, but for a beginner there isn't really a way to ease into Gremlin, so to speak.
 
Cheers :)