[Pipes/Gremlin] Handling Trees... RFC.

74 views
Skip to first unread message

Marko Rodriguez

unread,
Jul 4, 2011, 10:07:28 AM7/4/11
to gremli...@googlegroups.com
Hi,

I believe it was James Thornton that had the idea of supporting tree data structures.

The work that Pangloss did to add Pipe.getPath() to Pipes has opened up a lot of potential. However, turning a linear path into a tree is not something easily done without some trickery. As such, a tree data structure (like how table was birthed from paths) might be a good thing.

Is something like this what is desired?

gremlin> g.v(1).out('created').in('created').tree
==>(v[1],((v[3], (v[4],v[6])),(v[4],(v[5],v[6])))

?uggle. Super gross.

In short:

[root, children*] in a recursive fashion.


http://markorodriguez.com

Pierre De Wilde

unread,
Jul 4, 2011, 11:34:06 AM7/4/11
to gremli...@googlegroups.com
Hi Marko,

Recursive [root, children*] seems OK, but I don't understand your example:

gremlin> g.v(1).out('created').in('created').tree
==>(v[1],((v[3], (v[4],v[6])),(v[4],(v[5],v[6])))

I am expecting:

gremlin> g.v(1).out('created').in('created').tree
==>[v[1],[v[3],[v[6],v[4]]]]

Right? Closures could also be used:

gremlin> g.v(1).out('created').in('created').tree{it.name}{it.name}{it.name}
==>[marko, [lop, [peter, josh]]]


or with repeatable closures (as with paths):

gremlin> g.v(1).out('created').in('created').tree{it.name}
==>[marko, [lop, [peter, josh]]]


Repeatable closures should be used consistently. For instance:

gremlin> t = new Table()                                                                  
gremlin> g.v(1).out('knows').as('x').out('created').as('y').table(t){it.name}        
The number of columns is 2 and the number of closures is 1

Thoughts?
Pierre

2011/7/4 Marko Rodriguez <okram...@gmail.com>

Marko Rodriguez

unread,
Jul 4, 2011, 1:25:56 PM7/4/11
to gremli...@googlegroups.com
I concur with what you are saying. For some reason, trees hurt my head. Need to think.

And yes---repeatable (round-robin) closures are a good thing. I will do that now for Table.

Thanks for the ideas,
Marko.

Marko Rodriguez

unread,
Jul 4, 2011, 2:37:37 PM7/4/11
to gremli...@googlegroups.com
Hi,

Repeatable closures should be used consistently. For instance:

gremlin> t = new Table()                                                                  
gremlin> g.v(1).out('knows').as('x').out('created').as('y').table(t){it.name}        
The number of columns is 2 and the number of closures is 1

Done.

gremlin> g = TinkerGraphFactory.createTinkerGraph()
==>tinkergraph[vertices:6 edges:6]
gremlin> g.v(1).out('knows').as('x').out('created').as('y').table(new Table()){it.name}.cap.next()
==>[x:josh, y:ripple]
==>[x:josh, y:lop]


Committed. Thanks.

Marko.

Daniel Quest

unread,
Jul 4, 2011, 3:44:20 PM7/4/11
to gremli...@googlegroups.com
I agree that this is the tree I would expect

gremlin> g.v(1).out('created').in('created').tree
==>[v[1],[v[3],[v[6],v[4]]]]

However, this should also be fine right?
==>[v[1],[v[3],[v[4],v[6]]]]

Won't there be a lot of equivalent trees for the same .tree? -- unless the walker is very deterministic in the path it chooses; is it?

Not sure of the use case at this moment was it to simplify a graph?  Will these trees need to be compared?  Silly question... but is it possible to just dump the edges as the traverser crawls the graph, pipe the edges into some toolkit(does JUNG work?), and then extract a tree/spanning tree?

Wish I could be of more help
-Daniel

James Thornton

unread,
Jul 4, 2011, 5:20:51 PM7/4/11
to gremli...@googlegroups.com
Hi Marko -

I imagine trees can get crazy at the Pipe level so a big thanks for digging down into this.

The example Pierre provided is inline with how I had envisioned the query results:

gremlin> g.v(1).out('created').in('created').tree
==>[v[1],[v[3],[v[6],v[4]]]]

And the ability to sort on element properties will help the query behave more predictably. 

- James

James Thornton

unread,
Jul 4, 2011, 5:41:15 PM7/4/11
to gremli...@googlegroups.com
Daniel -

A common use case for a tree query would be returning a thread of nested comments without having to make multiple queries (akin to Oracle's connect by clause http://philip.greenspun.com/sql/trees.html). 

This came up when I was working on a tutorial on how to build Web applications with graph databases, and one of the tutorials goes through the steps for building a blog. 

The query to return nested comments was a little unwieldy so Marko graciously offered to look into how to simplify it at the Pipe level by creating a tree-based Pipe (https://groups.google.com/d/topic/gremlin-users/XfwfrOQxlyw/discussion). 

Gremlin by default does a depth-first traversal (https://github.com/tinkerpop/gremlin/wiki/Depth-First-vs.-Breadth-First), and applying sort() to elements could help make the results more predicable. 

- James

Daniel Quest

unread,
Jul 4, 2011, 7:06:25 PM7/4/11
to gremli...@googlegroups.com
ah yah, I remember the discussion....  I see the use case you guys are talking about. 

I suppose you could also solve it by changing (augmenting) your data model?  I am only guessing I don't really have a solution at the moment.
-Daniel

James Thornton

unread,
Jul 4, 2011, 8:18:10 PM7/4/11
to gremli...@googlegroups.com
In graphs you don't have to involve properties to model a tree, like you would with a relational database. Trees are constructed with just nodes and edges:

Comment1 -- comment_on --> BlogEntry0 

Comment2 -- comment_on --> Comment1

Comment3 -- comment_on --> Comment2
Comment4 -- comment_on --> Comment2

So to get all BlogEntry0's comment thread, you would do something like this...

gremlin> g.v(0).in('comment_on').loop(1){it.loops < 1000}
==>[comment1, [comment2, [comment3, comment4]]]

The query results will contain a list of all BlogEntry0's comments:

- comment1 is at the top of the tree, and it has one child (comment2). 
- comment2 has two children -- comment3 and comment4.
- comment3 and comment4 have no children -- they are leaf nodes (http://en.wikipedia.org/wiki/Leaf_node)

James



Marko Rodriguez

unread,
Jul 4, 2011, 8:21:04 PM7/4/11
to gremli...@googlegroups.com
Hey,

I think I have a solution. Basically, it about embedded sets where no sets in the structure can contain the same objects. Sound right?

I'm trying to work out a Tree object (like Table). Still a little awkward right now...will see how it goes.

Marko.

James Thornton

unread,
Jul 4, 2011, 10:02:29 PM7/4/11
to gremli...@googlegroups.com
Embedded sets should work with the comment-thread case where duplicates don't exist, but it's possible to have trees that have duplicates, such as in an organizational chart where a person reports to two different people.  

Daniel Quest

unread,
Jul 4, 2011, 11:08:53 PM7/4/11
to gremli...@googlegroups.com
Hey James,

Yah, I follow.  I was not thinking relational, I was thinking of adding nodes/properties of other types so when gremlin runs over your structure depth first, the path that results is a tree of the type you want.  The advantage is that you could consume trees larger than memory.  The disadvantage of course is that you have to maintain a bunch of nodes and links that are artificial.  
Dan

Sent from my iPod
Reply all
Reply to author
Forward
0 new messages