Message from discussion
Efficient way to groupBy and filter
Date: Sat, 10 Nov 2012 17:03:43 -0800 (PST)
From: jstorm <infected...@gmail.com>
To: gremlin-users@googlegroups.com
Message-Id: <4e2bb987-d7b6-4c2f-8e03-34c5049e2ff0@googlegroups.com>
In-Reply-To: <CAA-H4394hFUjq1LC42bdLSyLzmE2ecdLud=KCd2Jo53-++ukgg@mail.gmail.com>
References: <85437ca9-8a02-439d-bb53-120d8bd93192@googlegroups.com>
<19dee1d1-fbd9-494a-84bc-d989b8ac2830@googlegroups.com>
<CAA-H4394hFUjq1LC42bdLSyLzmE2ecdLud=KCd2Jo53-++ukgg@mail.gmail.com>
Subject: Re: [TinkerPop] Re: Efficient way to groupBy and filter
MIME-Version: 1.0
Content-Type: multipart/mixed;
boundary="----=_Part_7_28791508.1352595823402"
------=_Part_7_28791508.1352595823402
Content-Type: multipart/alternative;
boundary="----=_Part_8_10412622.1352595823403"
------=_Part_8_10412622.1352595823403
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Thanks for your help Stephen :)
Point taken. I am actually learning Groovy and Gremlin (with no Java
experience at all) and trying to apply them to problems I am trying
to solve, but perhaps I have bitten off a bit more than I can chew at the
moment.
I am currently going through Groovy in Action (an excellent book, by the way,
for those who want to get started with Groovy but have no Java
experience). Previously I was reading Programming Groovy (Pragmatic
Programmers), but I felt that it puts too much emphasis on knowing Java
(Groovy in Action is much more neutral).
I do wish there were a book for Gremlin, though (perhaps one of you guys
could write one :) ). Currently I am relying on the wiki on GitHub and
gremlindocs.com, but for a beginner there isn't really a way to ease
into Gremlin, so to speak.
Cheers :)
On Saturday, November 10, 2012 1:58:48 AM UTC+11, Stephen Mallette wrote:
> You can filter in the reduce closure pretty easily so that you don't
> have to post-process the outputted map from the groupBy:
>
> gremlin> g.V.out.groupBy{it.name}{it.in}{it.unique().findAll{i -> i.age > 30}.name}.cap
> ==>{lop=[josh, peter], ripple=[josh], josh=[], vadas=[]}
>
> In this way you can evaluate each item extracted into the grouped
> value of the map. I guess you could do some filtering in the value
> closure as well:
>
> gremlin> g.V.out.groupBy{it.name}{it.in.filter{i -> i.age > 30}.name}{it.toList().unique()}.cap
> ==>{lop=[josh, peter], ripple=[josh], josh=[], vadas=[]}
>
> I think you should spend some time understanding the underpinnings of
> Groovy and closures a bit and how they relate back to Gremlin. It
> sounds like you're attempting to do some less than trivial things in
> your work. Consider taking some time away from these very complex
> problems and focusing on just getting things to work. Once they work,
> you can refine and improve. Just a suggestion :)
>
> Stephen
>
> On Thu, Nov 8, 2012 at 8:41 PM, jstorm <infec...@gmail.com> wrote:
> > P.S. I understand that one possible solution is to filter out the vertices
> > that do not match the datetime requirements before we do the group by, but
> > let's say I have the following:
> >
> > date: 11 November 9:00 PM
> > object: v[1]
> >
> > date: 15 November 8:00 PM
> > object: v[1]
> >
> > Let's say I only want events before 12 November 9:00 PM.
> >
> > The above two will be grouped together, but because one of the vertices
> > falls outside the time range, all vertices that can be grouped due to v[1]
> > should be discarded and not grouped. If I do the filtering before the
> > grouping, then I am unable to implement this.
> >
> > Any input appreciated :)
> >
> >
> > On Friday, November 9, 2012 12:11:53 PM UTC+11, jstorm wrote:
> >>
> >> I have a few simple vertices with some properties I would like to
> >> groupBy on.
> >>
> >> These vertices also have a property called date which contains a date
> >> generated using new Date().
> >>
> >> What I would like to do is to groupBy on a property called target, and
> >> if the date of any of the vertices that we are grouping on does not meet a
> >> certain condition, we do not do the grouping for them at all and they
> >> will not appear in the final grouped results.
> >>
> >> Looking at the wiki for groupBy, I can use the key and value functions
> >> to groupBy and to determine what will get pushed to the final collection
> >> outputted by groupBy.
> >>
> >> Besides iterating through the output of groupBy and removing results
> >> where the child vertices do not match the date requirements (I think this
> >> could be quite expensive), is there a way to do this filtering during the
> >> groupBy process?
> >>
> >> Cheers :)
> >
>
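
For the all-or-nothing case raised in the P.S. above (drop a whole group if any one member falls outside the time range), here is a minimal sketch in plain Groovy rather than Gremlin. It is only an illustration under stated assumptions: plain maps stand in for vertices, the second target 'v2' is hypothetical, and the year 2012 is assumed for the dates, since none of these appear in the thread. It post-processes the grouped map, so it shows the straightforward version rather than an in-pipeline one.

```groovy
import java.text.SimpleDateFormat

// Assumption: plain maps stand in for graph vertices; 'v2' and the
// year 2012 are made up for illustration and are not from the thread.
def fmt = new SimpleDateFormat('yyyy-MM-dd HH:mm')
def events = [
    [target: 'v1', date: fmt.parse('2012-11-11 21:00')],
    [target: 'v1', date: fmt.parse('2012-11-15 20:00')],
    [target: 'v2', date: fmt.parse('2012-11-10 09:00')],
]
def cutoff = fmt.parse('2012-11-12 21:00')

// Group on the target property (plain Groovy groupBy, not the Gremlin step).
def grouped = events.groupBy { it.target }

// Keep a group only if every member falls before the cutoff;
// one late member discards the whole group.
def kept = grouped.findAll { target, members ->
    members.every { it.date.before(cutoff) }
}

println kept.keySet()
```

Here the 'v1' group is dropped because its 15 November event is past the cutoff, while 'v2' survives. The same `every` check could presumably be used inside the reduce closure of the Gremlin groupBy shown earlier, though the key would then likely remain in the map with an empty value (as with josh=[] above) rather than disappear.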
------=_Part_8_10412622.1352595823403--
------=_Part_7_28791508.1352595823402--