Google Groups

Re: gremlin transitive closure 2nd


Marko A. Rodriguez Jul 6, 2011 7:20 AM
Posted in group: Gremlin-users
Hi Pierre,

You have stumbled upon another dirty secret. LoopPipe, when complete has an ordering of depth-first, but when in loop the ordering is breadth-first. Here is why.

Loop maintains two iterators --- the incoming pipe and an internal "loop queue." A random example below:

gremlin> g.v(1).both.in.out.loop(2){true}.toString() ==>[BothPipe, LoopPipe[[InPipe, OutPipe]]]

The BothPipe iterator and a Queue (LinkedList) are the Iterators going into LoopPipe. When the last OutPipe emits its value and the loop closure says "loop it" (i.e. returns true), the object is taken from the OutPipe and put into the Queue. The Queue has priority over the BothPipe (a desire to keep it lazy/memory efficient). However, that queued object simply can't be the next thing operated on by InPipe/OutPipe as InPipe/OutPipe still have some elements in them. So the LoopPipe segment needs to flush the remaining objects (filling the queue as appropriate). Finally, at which point, the Queue is popped for its next object (not BothPipe). In essence, this creates breadth-first behavior in the loop line, but depth-first outside the loop.

gremlin> g.v(1).out.loop(1){it.loops < 3} == g.v(1).out.out ==>true
 
What would be the more natural, but (at the time) hard way to code it would be to have the queue immediately run without a "flush" of the LoopPipe. ... Code-wise, it just got overly complicated and as such, LoopPipe is the way it is.

For traversals where there is no state saved internal to the LoopPipe, all is good. But when you use sideEffect{x.add(it}} within a LoopPipe, then this behavior becomes apparent.

In short, there is an impurity in LoopPipe. AggregatorPipe has been rectified, but still does not work correctly with LoopPipe because of this stateful-ness aspect you bring up. There is more to the story with LoopPipe "bundles" (metadata saved in the Queue with the object -- it.object, it.loops, it.path) that complicate things further.

Anywho, a purer (depth-first all the way) LoopPipe that is efficient would be an excellent contribution to Gremlin.

Thanks for the honest eyes Pierre,
Marko.


On Jul 6, 2011, at 5:56 AM, Pierre De Wilde wrote:

Hi Marko,

Thank you for this update.

I modified your example a little bit  (using HashMap instead of Set) to see exactly how the collection was filled:

gremlin> x = []
gremlin> c = 0
==>0
gremlin> g.v(1).out.aggregate(x).loop(2){c++ < 3} >> -1
gremlin> x
==>v[2]
==>v[3]
==>v[4]
==>v[5]
==>v[3]

Breadth-first or depth-first? Cannot figure out. So, rewrote the example like this:

gremlin> x = []                                        
gremlin> c = 0                                         
==>0
gremlin> g.v(1).out('knows','created').aggregate(x).loop(2){c++ < 3} >> -1
gremlin> x
==>v[2]
==>v[4]
==>v[3]
==>v[5]
==>v[3]

OK, it's breadth-first traversal. So, what about depth-first traversal? Tried this:

gremlin> x = []                                                           
gremlin> c = 0                                                            
==>0
gremlin> g.v(1).out('knows','created').sideEffect{x.add(it)}.loop(2){c++ < 3} >> -1
gremlin> x
==>v[2]
==>v[4]
==>v[3]
==>v[5]
==>v[3]

Still breadth-first traversal. Tried this:

gremlin> x = []                                                                    
gremlin> g.v(1).out('knows','created').sideEffect{x.add(it)}.out('knows','created').sideEffect{x.add(it)} >> -1
gremlin> x
==>v[2]
==>v[4]
==>v[5]
==>v[3]
==>v[3]

Now depth-first traversal.

? I don't understand the difference between the last 2 examples.

Thanks,
Pierre


2011/7/6 Marko Rodriguez <okram...@gmail.com>
Hi,

So, I've pushed/deployed a new version of AggregatorPipe to GitHub/TinkerPop Maven2. This version is truly a SideEffectPipe in that the incoming objects are the same as the outgoing objects with not alteration to their order, number, etc. As such, the internal aggregation collection is NOT the iterator of the pipe. There are not two internal collections--the user provided collection (aggregate) and a queue that is drained on next(). Moreover, getPath() works correctly as there is yet another queue for the paths.

Finally, AggregatorPipe no longer "locks" when its incoming objects have been drained. The reason this is good is so that when a loop() throws an object back to the beginning, there are still more incoming objects. This works great now save one small caveat that I haven't found a solution to: it.loops in the closure of loop() does not behave correctly. I need to think on this still to find a solution.

As Daniel (and others) have realized, AggregatorPipe is a different kind of beast as it turns a depth-first engine into a breadth-first engine. Moreover, it really breaks from the Pipes model and causes a whole lot of problems that make them self apparent in various situations -- looping, back tracking, paths, etc. We are close to getting it squared away.

I've pushed what I have to Gremlin 1.2-SNAPSHOT. Please feel free to git pull and play.

Thanks,
Marko.


On Jul 5, 2011, at 5:53 PM, Daniel wrote:

Hi Marko,
many thanks for your explanations. After having a look at the
implementation of AggregatorPipe [0] I think I can see now what
happens. 'aggregate' is much more powerful and different from what I
expected after reading its step description [1] on the github wiki. 

You may want to adjust the description from:
'emits input, but adds input to a collection'
to something along the line:
'adds input to collection, emits collection'

Regarding your side note. It definitely matters if I put the 'back(1)'
after 'aggregate' and now it even seems to make sense to me somehow.
Although 'aggregate' is locked from the second iteration I don't care
by just jumping one step back and go on with the previous iterator..
Okay, I have no clue how back(1) and loop works :) but I will
definitely have a look at it.

Here is a sample session with rexter (cloned today) on the
gratefulgraph:

// using back(1)
visited = [] as Set
result = [] as Set
g.V[['type':'song', 'name':'TELL
MAMA']].as('X').aggregate(result).back(1).outE('followed_by').except(visited).aggregate(visited).back(1).inV.loop('X')
{true}
result.size()
==>338

// without back(1)
visited = [] as Set
result = [] as Set
g.V[['type':'song', 'name':'TELL
MAMA']].as('X').aggregate(result).outE('followed_by').except(visited).aggregate(visited).back(1).inV.loop('X')
{true}
result.size()
==>1


Now after having realized that aggregate behaves different I tried
this:

visited = [] as Set
result = [] as Set
oldCnt = 0
g.V[['type':'song', 'name':'TELL
MAMA']].as('X').sideEffect{result.add(it)}.outE('followed_by').except(visited).sideEffect{visited.add(it)}.inV.loop('X')
{r = (oldCnt != visited.size()); oldCnt = visited.size(); r}
result.size()
==>338

Does it look like a reasonable approach?

Thanks
Daniel


[0] https://github.com/tinkerpop/pipes/blob/master/src/main/java/com/tinkerpop/pipes/sideeffect/AggregatorPipe.java
[1] https://github.com/tinkerpop/gremlin/wiki/Gremlin-Steps