Converting lists and maps into pipes


Paul Jackson

Jul 6, 2011, 6:42:00 PM
to Gremlin-users
I've been playing with the new Groovy-based Gremlin and I like it a
lot. One thing I've learned is that it is very important to know the
class of the object you are acting upon. As an exercise, I am
attempting to write a script that takes the example2 graph and finds
pairs of songs sung_by the same artist that follow each other. I have
not gotten to that step yet. In the example below, I am collecting statistics
on how many songs each artist performed (sung_by). While composing
this e-mail I ended up answering my own question, but I figured it
may be of general interest, so I am posting anyway.

// Who are the artists
g.V[['type':'artist']].name

// Who sang the most songs
m=[:]
g.V[['type':'song']].out('sung_by').name.groupCount(m)
m.sort{a,b -> b.value <=> a.value}

// Filter out one-hit wonders - note the sort returns a map, not a pipe
m.sort{a,b -> b.value <=> a.value}.findAll{it.value > 1} // The Groovy way, which returns another map
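Here is a plain-Groovy sketch of the types involved. It needs no Gremlin and uses a hypothetical count map in place of the real groupCount output. On a Map, both sort and findAll return a Map, while collect returns a List:

```groovy
// Hypothetical artist -> song-count map, standing in for the groupCount(m) result
def m = ['Artist A': 3, 'Artist B': 1, 'Artist C': 5]

// Map.sort with a two-argument closure compares Map.Entry objects and
// returns a new ordered Map (a LinkedHashMap), not a pipe
def sorted = m.sort { a, b -> b.value <=> a.value }

// findAll on a Map keeps the Map type
def repeatArtists = sorted.findAll { it.value > 1 }

// collect on a Map returns a plain List
def counts = sorted.collect { it.value }
```

A List has no Gremlin steps of its own, which explains why the later `.collect{...}.groupCount(i)` calls fail.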

// I discovered that I could accomplish the same thing with a filter if I used the '_' step
m.sort{a,b -> b.value <=> a.value}._{it.value > 1} // The Gremlin way, which returns a pipe

// While I am at it, how about a histogram of the values in map m
i=[:]
// If I have a pipe, it is easy
m.sort{a,b -> b.value <=> a.value}._{it.value > 2}.transform{it.value}.groupCount(i)
// But if I do not perform the filter using the Gremlin method, then I need a way to transform my List to a pipe
m.sort{a,b -> b.value <=> a.value}.findAll{it.value > 2}.collect{it.value}.groupCount(i) // throws exception
// And since I want the full histogram, I don't want the findAll
m.sort{a,b -> b.value <=> a.value}.collect{it.value}.groupCount(i) // throws exception
// The simplest way I could find to do this was:
m.sort{a,b -> b.value <=> a.value}.collect{it.value}._{true}.groupCount(i)
// A closure that returns 'it' works as well
m.sort{a,b -> b.value <=> a.value}.collect{it.value}._{it}.groupCount(i)
// However, both of these solutions insert an extra filter pipe into the flow:
println m.sort{a,b -> b.value <=> a.value}.collect{it.value}._{it}.groupCount(i)
[IdentityPipe, ClosureFilterPipe, GroupCountClosurePipe]
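If you only need the histogram and not a pipeline, plain Groovy can build it directly from the collected List without any pipe. This is a minimal sketch that uses a hypothetical count map in place of the real groupCount result:

```groovy
// Hypothetical artist -> song-count map (stand-in for the groupCount(m) result)
def m = ['Artist A': 3, 'Artist B': 1, 'Artist C': 3, 'Artist D': 2]

// collect gives a plain List of counts; tally how many artists share each count
def i = [:]
m.collect { it.value }.each { count ->
    i[count] = (i[count] ?: 0) + 1
}
```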

// I do not understand why I must add the filter
m.sort{a,b -> b.value <=> a.value}.collect{it.value}._.groupCount(i) // Why does this throw an exception?
Exception evaluating property '_' for java.util.ArrayList, Reason: groovy.lang.MissingPropertyException: No such property

// I found the answer; the identity pipe step ('_'), if not followed by a pair of braces, must be followed by a pair of parentheses.
println m.sort{a,b -> b.value <=> a.value}.collect{it.value}._().groupCount(i)
[IdentityPipe, GroupCountClosurePipe]

// FWIW, you can optionally perform the identity earlier and then use the transform step to retrieve the map values - maybe this could be faster if pipes were executed in parallel?
m.sort{a,b -> b.value <=> a.value}._().transform{it.value}.groupCount(i)

Some final questions: Would it be possible to teach gremlin that a
lone _ is really a _() (perhaps through the dynamic programming
mechanism in Groovy that allows methods to look like properties)?
Also, is there a better way to do what I was doing? Care to take a
stab at the artist-sings-two-songs-in-a-row problem?

Thanks,
Paul Jackson

Marko Rodriguez

Jul 6, 2011, 8:06:53 PM
to gremli...@googlegroups.com
Hey Paul,

> I've been playing with the new Groovy-based Gremlin and I like it a
> lot.

Nice. Yeah, old Gremlin seems like ages ago now. Glad you made the switch.

> One thing I've learned is that it is very important to know the
> class of the object you are acting upon.

Super important. In general, _() is a way to turn any object into the start of a Pipeline.

> // Who are the artists
> g.V[['type':'artist']].name

Cool. Though, if you have an index, you can do:

g.idx(T.v)[[type:'artist']].name // much faster than a linear run through all vertices

Moreover, you don't need [['type':...]]; you can write [[type:...]]. Groovy treats an unquoted map key as a String.
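For anyone new to the idea, here is a plain-Groovy sketch of why an index lookup beats a linear scan. The "vertices" are ordinary maps with hypothetical names. A findAll over all of them stands in for g.V with a filter, and a groupBy map built once stands in for the index. None of this is Gremlin's actual index implementation. The sketch also checks that an unquoted map key is a String:

```groovy
// Hypothetical stand-ins for vertices: plain maps, not real graph elements
def vertices = [
    [name: 'Artist A', type: 'artist'],
    [name: 'Song 1', type: 'song'],
    [name: 'Song 2', type: 'song'],
    [name: 'Artist B', type: 'artist']
]

// Linear scan: every vertex is examined on every query
def artistsByScan = vertices.findAll { it.type == 'artist' }*.name

// "Index": built once, mapping each type value to the vertices that have it
def byType = vertices.groupBy { it.type }

// A lookup is now a single hash access instead of a scan
def artistsByIndex = byType['artist']*.name

// Unquoted map keys are Strings, so both literals are the same map
assert [type: 'artist'] == ['type': 'artist']
```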

> // Who sang the most songs
> m=[:]
> g.V[['type':'song']].out('sung_by').name.groupCount(m)
> m.sort{a,b -> b.value <=> a.value}

Cool. Again, g.idx() would be faster.

> // Filter out one-hit wonders - note the sort returns a map, not a pipe
> m.sort{a,b -> b.value <=> a.value}.findAll{it.value > 1} // The Groovy way, which returns another map

That works....

> // I discovered that I could accomplish the same thing with a filter if I used the '_' step
> m.sort{a,b -> b.value <=> a.value}._{it.value > 1} // The Gremlin way, which returns a pipe

You could do that, or you could do the "real" Gremlin way:

g.idx(T.v)[[type:'song']].out('sung_by'){it.in('sung_by').count() > 1}.name.groupCount(m)

In short, don't insert anyone with fewer than 2 songs into the map. However, this is probably less efficient, since it walks 'sung_by' a second time (in reverse) for every song. You can be verbose/explicit with:

g.idx(T.v)[[type:'song']].out('sung_by').filter{it.in('sung_by').count() > 1}.name.groupCount(m)
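The source of the "double check" cost is easier to see in a plain-Groovy sketch of the same filter over hypothetical in-memory edges. Here a song-to-artist map stands in for 'sung_by', and a hypothetical helper plays the role of it.in('sung_by'). Because the filter asks about the artist again for every song, the reverse lookup is repeated once per song:

```groovy
// Hypothetical song -> artist pairs, standing in for the 'sung_by' edges
def sungBy = ['Song 1': 'Artist A', 'Song 2': 'Artist B', 'Song 3': 'Artist A']

// Hypothetical helper: every song an artist sang (the in('sung_by') direction)
def songsOf = { artist -> sungBy.findAll { it.value == artist }.keySet() }

// For every song: go to its artist, keep the artist only if they sang more than
// one song, then count the artist into m (like .filter{...}.name.groupCount(m))
def m = [:]
sungBy.each { song, artist ->
    if (songsOf(artist).size() > 1) {   // re-scans sungBy once per song
        m[artist] = (m[artist] ?: 0) + 1
    }
}
```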

> // While I am at it, how about a histogram of the values in map m
> i=[:]
> // If I have a pipe, it is easy
> m.sort{a,b -> b.value <=> a.value}._{it.value > 2}.transform{it.value}.groupCount(i)
> // But if I do not perform the filter using the Gremlin method, then I need a way to transform my List to a pipe
> m.sort{a,b -> b.value <=> a.value}.findAll{it.value > 2}.collect{it.value}.groupCount(i) // throws exception
> // And since I want the full histogram, I don't want the findAll
> m.sort{a,b -> b.value <=> a.value}.collect{it.value}.groupCount(i) // throws exception
> // The simplest way I could find to do this was:
> m.sort{a,b -> b.value <=> a.value}.collect{it.value}._{true}.groupCount(i)

Simply do _(). If it's not a pipe (some other object), then _() is required. If it is already a pipe, then a bare _ can be used.

> // A closure that returns 'it' works as well
> m.sort{a,b -> b.value <=> a.value}.collect{it.value}._{it}.groupCount(i)

It's not passing 'it' through. The filter evaluates whatever the closure returns using "Groovy Truth", and a non-null count (always at least 1 here) evaluates to true. Same as {true}.
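Here is a quick plain-Groovy check of Groovy Truth. Null is not the only value it treats as false. It also treats 0, empty strings, and empty collections and maps as false. So {it} behaves like {true} in this thread only because every count is at least 1:

```groovy
// A mix of values that Groovy Truth treats as true or false
def values = [5, 0, null, '', 'abc', [], [1], [:]]

// findAll keeps the elements whose own value is "true" under Groovy Truth
def truthy = values.findAll { it }
```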

> // However, both of these solutions insert an extra filter pipe into the flow:
> println m.sort{a,b -> b.value <=> a.value}.collect{it.value}._{it}.groupCount(i)
> [IdentityPipe, ClosureFilterPipe, GroupCountClosurePipe]

Again, just do _().

> // I do not understand why I must add the filter
> m.sort{a,b -> b.value <=> a.value}.collect{it.value}._.groupCount(i) // Why does this throw an exception?
> Exception evaluating property '_' for java.util.ArrayList, Reason: groovy.lang.MissingPropertyException: No such property
>
> // I found the answer; the identity pipe step ('_'), if not followed by a pair of braces, must be followed by a pair of parentheses.
> println m.sort{a,b -> b.value <=> a.value}.collect{it.value}._().groupCount(i)
> [IdentityPipe, GroupCountClosurePipe]
>
> // FWIW, you can optionally perform the identity earlier and then use the transform step to retrieve the map values - maybe this could be faster if pipes were executed in parallel?
> m.sort{a,b -> b.value <=> a.value}._().transform{it.value}.groupCount(i)

There you go. _() basically does this:

IdentityPipe.setStarts(previousObject.iterator()) // Object.iterator() is a metaMethod provided by Groovy to coerce any object into an iterator.
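You can see the iterator() coercion in plain Groovy. This sketch shows what different kinds of object yield when iterated. It also suggests why _() on a Map hands the pipeline Map.Entry objects, so transform{it.value} is needed to get the counts:

```groovy
// Groovy makes iterator() available on every object
def fromList    = [1, 2, 3].iterator().toList()    // a List iterates its elements
def fromMap     = [a: 1, b: 2].iterator().toList() // a Map iterates its Map.Entry objects
def fromInteger = 42.iterator().toList()           // other objects iterate as a single element
```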

> Some final questions: Would it be possible to teach gremlin that a
> lone _ is really a _() (perhaps through the dynamic programming
> mechanism in Groovy that allows methods to look like properties)?

Adding methods/properties to classes can have terrible rippling effects on other Groovy libraries the user might be using. As such, I've only overloaded java.lang.Object with one metaMethod: Object._(). It's the way of turning any object into a pipeline.

> Also, is there a better way to do what I was doing? Care to take a
> stab at the artist-sings-two-songs-in-a-row problem?

g.idx(T.v)[[type:'song']].out('sung_by').sideEffect{x = it}.back(2).out('followed_by').out('sung_by'){x == it}.name.groupCount(m)

That says:
1. For every song vertex:
2. Determine who sang it.
3. Save that person to variable x.
4. Jump back to the song (as we were at the person).
5. Find the songs that follow that song.
6. Filter out those songs not sung by x.
7. Count the person's name into m.

Thus, each count in m is the number of times that person sang two songs in a row.
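The seven steps can be mirrored in plain Groovy over hypothetical in-memory edges, which may make the traversal easier to follow. The song and artist names and both maps are invented for illustration. This is not how Gremlin executes the query:

```groovy
// Hypothetical stand-ins for the graph's edges:
// 'sung_by' (song -> artist) and 'followed_by' (song -> songs that follow it)
def sungBy     = ['Song 1': 'Artist A', 'Song 2': 'Artist A', 'Song 3': 'Artist B', 'Song 4': 'Artist A']
def followedBy = ['Song 1': ['Song 2', 'Song 3'], 'Song 2': ['Song 4'], 'Song 3': ['Song 4']]

def m = [:]
sungBy.keySet().each { song ->                // 1. for every song
    def x = sungBy[song]                      // 2-3. who sang it, saved as x
    (followedBy[song] ?: []).each { next ->   // 4-5. back at the song, take each follower
        if (sungBy[next] == x) {              // 6. keep followers also sung by x
            m[x] = (m[x] ?: 0) + 1            // 7. count x into m
        }
    }
}
```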

Here is a variation of the same computation:

Assuming Gremlin 1.1:
g.idx(T.v)[[type:'song']].as('song').out('sung_by').sideEffect{x = it}.back('song').out('followed_by').out('sung_by'){x == it}.name.groupCount(m)

Assuming Gremlin 1.2-SNAPSHOT:
g.idx(T.v)[[type:'song']].as('song').out('sung_by').sideEffect{x = it}.back('song').out('followed_by').out('sung_by'){x == it}.groupCount(m){it.name}

Assuming Gremlin 1.2-SNAPSHOT and being "filter explicit":
g.idx(T.v)[[type:'song']].as('song').out('sung_by').sideEffect{x = it}.back('song').out('followed_by').out('sung_by').filter{x == it}.groupCount(m){it.name}

Hope that helps. That was fun. Thanks for your mind dump.

Marko.

http://markorodriguez.com


>
> Thanks,
> Paul Jackson

Paul Jackson

Jul 6, 2011, 9:45:44 PM
to Gremlin-users
Thanks, Marko. This was very helpful. I attempted to take this a
couple of steps further:
First, I put the results into a Table
t=new Table()
g.idx(T.v)[[type:'song']].name.as('previous').back(2).out('sung_by').sideEffect{x = it}.name.as('artist').back(3).out('followed_by').name.as('subsequent').back(2).out('sung_by'){x == it}.name.table(t).filter{false}
or
g.idx(T.v)[[type:'song']].name.as('previous').back(2).out('sung_by').sideEffect{x = it}.name.as('artist').back(3).out('followed_by').name.as('subsequent').back(2).out('sung_by'){x == it}.name.table(new Table()).cap()
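For comparison, the same (previous, artist, subsequent) rows can be built in plain Groovy over hypothetical in-memory edges. Here a List of Maps stands in for the Table, and the data is invented:

```groovy
// Hypothetical stand-ins for 'sung_by' and 'followed_by' edges
def sungBy     = ['Song 1': 'Artist A', 'Song 2': 'Artist A', 'Song 3': 'Artist B', 'Song 4': 'Artist A']
def followedBy = ['Song 1': ['Song 2', 'Song 3'], 'Song 2': ['Song 4'], 'Song 3': ['Song 4']]

// One row per consecutive pair of songs sung by the same artist
def rows = []
sungBy.each { previous, artist ->
    (followedBy[previous] ?: []).each { subsequent ->
        if (sungBy[subsequent] == artist) {
            rows << [previous: previous, artist: artist, subsequent: subsequent]
        }
    }
}
```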

Next I referred to nodes by their name property. This required the use
of both the as() pipe and the back() pipe in combination. I was
surprised to see that I only needed to back trace one step. I would
have expected 3 (one for back(), one for as() and one for
property('name')).
g.idx(T.v)[[type:'song']].name.as('previous').back(1).out('sung_by').sideEffect{x = it}.back(2).out('followed_by').name.as('subsequent').back(1).out('sung_by'){x == it}.name.as('artist').table(new Table()).cap()

I guess this is because the as() pipe wraps the prior step, so one
step back pops you out of the inner pipe?
gremlin> println g.idx(T.v)[[type:'song']].name.as('previous').back(1).out('sung_by').sideEffect{x = it}.back(2).out('followed_by').name.as('subsequent').back(1).out('sung_by'){x == it}.name.as('artist').table(new Table()).cap()
[IdentityPipe, BackFilterPipe[[AsPipe(previous)[PropertyPipe(name)]]],
BackFilterPipe[[OutPipe(sung_by), ClosureSideEffectPipe]],
OutPipe(followed_by), BackFilterPipe[[AsPipe(subsequent)[PropertyPipe(name)]]],
OutPipe(sung_by), ClosureFilterPipe,
AsPipe(artist)[PropertyPipe(name)], SideEffectCapPipe[TablePipe]]
==>null

Thanks again for the solution.