(Tinkerpop3) How to eliminate lambda step from query

213 views
Skip to first unread message

Krzysztof Genge

unread,
Jul 12, 2016, 4:28:14 AM7/12/16
to Gremlin-users
Hi 

In tinkerpop3 documentation (http://tinkerpop.apache.org/docs/3.0.1-incubating/) it is advised to not use lambda expressions if possible.
My use case is category graph that contains tree structured category hierarchy as vertices. Edges linking categories are labeled with 'HasParent' value and contain 'position' property which is an integer used to order subcategories (The sort is performed in application logic as Tree object extends HashMap that is not available to store order of entries)

To retrieve a category and linking edges with all its subcategories recursively I run query:

tenantGraph.traversal()
   
.V(START_CATEGORY_VERTEX_ID)
   
.emit()
   
.repeat(
      __
.inE("hasParent")
     
.map(traverser -> traverser.get().outVertex())
   
)
   
.until(traverser -> stopTraverser(traverser, treeDepth))
   
.tree().next();  




This query returns org.apache.tinkerpop.gremlin.process.traversal.step.util.Tree which contains vertices and edges that link them.

If I get rid of the lambda map step:


tenantGraph.traversal()
 
.V(parentCategoryVertices.toArray())
 
.emit()
 
.repeat(
     __
.inE("hasParent")
     
.outV()
 
)
 
.until(traverser -> stopTraverser(traverser, treeDepth))
 
.tree().next();

Only the 'Category' vertices without edges are returned in tree structure.

Do anyone have an idea how can I transform the first query to not use the map step and retrieve the same result?

Bonus question: Is there a better way to get ordered tree structured results than using tree operation and perform the sort on client side?

thanks in advance,
Krzysztof




Daniel Kuppitz

unread,
Jul 12, 2016, 5:25:22 AM7/12/16
to gremli...@googlegroups.com
Getting rid of the lambda is easy:

gremlin> g = TinkerGraph.open().traversal()
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
gremlin> g.addV().property(id, "a").as("a").
gremlin>   addV().property(id, "aa").as("aa").
gremlin>   addV().property(id, "ab").as("ab").
gremlin>   addV().property(id, "ac").as("ac").
gremlin>   addV().property(id, "aaa").as("aaa").
gremlin>   addV().property(id, "aab").as("aab").
gremlin>   addE("hasParent").from("aaa").to("aa").property("position", 1).
gremlin>   addE("hasParent").from("aab").to("aa").property("position", 2).
gremlin>   addE("hasParent").from("aa").to("a").property("position", 1).
gremlin>   addE("hasParent").from("ab").to("a").property("position", 2).
gremlin>   addE("hasParent").from("ac").to("a").property("position", 3).iterate()
gremlin>
gremlin> maxDepth = 3
==>3

gremlin> g.V("a").emit().repeat(inE("hasParent").as("e").outV()).times(maxDepth).tree().by(id).by("position").next()
==>a={1={aa={1={aaa={}}, 2={aab={}}}}, 2={ab={}}, 3={ac={}}}

However, the order of entries in a tree structure is a problem. Unfortunately the order, in which elements are emitted, is not preserved. I think you have 2 options:
  1. order in your application code
  2. don't use tree structures

To give an example for option #2:

gremlin> g.V("a").emit().repeat(inE("hasParent").order().by("position", incr).outV()).times(2).path().by(id).by("position")
==>[a]
==>[a, 1, aa]
==>[a, 1, aa, 1, aaa]
==>[a, 1, aa, 2, aab]
==>[a, 2, ab]
==>[a, 3, ac]
gremlin> g.V("a").emit().repeat(inE("hasParent").order().by("position", decr).outV()).times(2).path().by(id).by("position")
==>[a]
==>[a, 3, ac]
==>[a, 2, ab]
==>[a, 1, aa]
==>[a, 1, aa, 2, aab]
==>[a, 1, aa, 1, aaa]

Cheers,
Daniel


--
You received this message because you are subscribed to the Google Groups "Gremlin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gremlin-users/abac3f2e-8cbf-413a-a0ac-26e462305033%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Krzysztof Genge

unread,
Jul 12, 2016, 7:46:55 AM7/12/16
to Gremlin-users
Hi Daniel,

Thank you very much for your answer, it works. I had no idea that adding .as() step on its own (without select() or match()) could change the results of tree operation. 

Regarding your second answer. It seems that sorting on the titan side will result in better performance, but if the category tree structure is deep (eg. 10 levels) the data duplication in the list of paths will be significant.

Could you advice if using paths (which will have to be transformed to tree structure on client side) is better than sorting the tree structure returned by tree() operation ?

regards,
Krzysztof

Daniel Kuppitz

unread,
Jul 12, 2016, 10:03:56 AM7/12/16
to gremli...@googlegroups.com
Could you advice if using paths (which will have to be transformed to tree structure on client side) is better than sorting the tree structure returned by tree() operation ?

Probably not. As you've already realized, paths would have a much larger response. Also your application will have to do some extra work anyways - either sort or build a tree. Hence I'd probably use tree() and sort within the application.

A 3rd option just came to my mind. You could also implement your own SortedTree class:

public class SortedTree extends org.apache.tinkerpop.gremlin.process.traversal.step.util.Tree {
    // TODO: override (almost?) all methods
}

The implementation could use lists of tuples/pairs instead of maps. That alone should make trees sortable (entries in the result will appear in the same order in which they were emitted)..

...and then use it like this:

g.withSideEffect("t", new SortedTree()).V("a").emit().repeat(inE("hasParent").as("e").outV()).times(maxDepth).tree("t").by(id).by("position").cap("t").next()

Cheers,
Daniel


Krzysztof Genge

unread,
Jul 12, 2016, 10:33:37 AM7/12/16
to Gremlin-users
thanks for the 3rd option if I will encounter performance problems with client side sorting, will definitely give it a try

regards,
Krzysztof

Daniel Kuppitz

unread,
Jul 12, 2016, 11:21:27 AM7/12/16
to gremli...@googlegroups.com
I overwrote TinkerPop's Tree implementation, just to see if we can make a order preserving Tree. We can:

gremlin> g.V("a").emit().repeat(inE("hasParent").order().by("position", incr).outV()).times(maxDepth).tree().by(id).by("position").next()
==>a=[1=[aa=[1=[aaa=[]], 2=[aab=[]]]], 2=[ab=[]], 3=[ac=[]]]
gremlin> g.V("a").emit().repeat(inE("hasParent").order().by("position", decr).outV()).times(maxDepth).tree().by(id).by("position").next()
==>a=[3=[ac=[]], 2=[ab=[]], 1=[aa=[2=[aab=[]], 1=[aaa=[]]]]]

However, it's now a LinkedHashSet backed by a Map, which means that it now requires a lot more memory. Here's my quick & dirty implementation:


Let's see if we can get Marko's opinion on it.

Cheers,
Daniel


Marko Rodriguez

unread,
Jul 13, 2016, 12:01:24 AM7/13/16
to gremli...@googlegroups.com
Hi,

Tree should be an interface and we should have:

OrderedTree
SimpleTree
BlahTree
CheckMeOutIKnowHowToWriteEmailsTree
It is bad that Tree is just a class. I haven’t played with Tree since early TinkerPop 1.0 days … that code was just copied over. I bet there are some cool things we could do with Tree and tree() if we took some time to think about it… 

Marko.

Stephen Mallette

unread,
Jul 13, 2016, 11:55:34 AM7/13/16
to Gremlin-users
I was pretty sure that we once had an issue for making Tree an interface, but I can't find it now - I created this:


I do have this sneaking suspicion that there was a reason why we didn't move forward on that issue - but now i don't remember what it was.

Reply all
Reply to author
Forward
0 new messages