[TinkerPop3] Gremlin OLAP RFC -- I'm covered in Gremlins!

247 views
Skip to first unread message

Marko Rodriguez

unread,
Dec 21, 2013, 2:55:23 PM12/21/13
to gremli...@googlegroups.com
Hi,

This is the first non-Blueprints3 RFC and it begins the thread of discussion around Gremlin OLAP.

Gremlin OLAP is a global graph processor that uses the new Blueprints3 GraphComputer framework. The algorithm relies on the binomial coefficient calculation proposed by Blasé Pascal many tides ago, but extended for arbitrary dimensions. The algorithm is described in more detail at this location: http://thinkaurelius.com/2013/06/12/loopy-lattices-redux/ . This model of graph traversing is breadth-first and much more efficient when doing global processing than the the depth-first pipeline Gremlin model of common knowledge. In short, Gremlin OLAP is good for global graph analytics and Gremlin OLTP is good for local neighborhood queries. The beauty of Gremlin OLAP in TinkerPop3 is that it simply uses the same Gremlin OLTP pipelines to function and execute its behavior. As such, the GremlinVertexProgram is simple.


The implementation, as it currently stands, does NOT support path-based calculations (e.g. path(), back(), asMap-access, etc.) but will once the basic counter mechanism is flushed out and fully-tested. I have a simply test case that lets me explore the behavior of Gremlin OLAP over the classic TinkerGraphFactory-graph.


As you can see, you simply submit a GremlinVertexProgram with a Gremlin pipeline to the Graph.compute() system and get back the ComputeResult.


The results of the computation submitted in the test case are below -- "Marko knows Josh with a  weight of 1.0".

---VERTICES---
1 > .
name:marko > .
age:29 > .
2 > .
name:vadas > .
age:27 > .
3 > .
name:lop > .
lang:java > .
4 > .
name:josh > 1
age:32 > .
5 > .
name:ripple > .
lang:java > .
6 > .
name:peter > .
age:35 > .
---EDGES---
11 > .
weight:0.4 > .
12 > .
weight:0.2 > .
7 > .
weight:0.5 > .
8 > .
weight:1.0 > .
9 > .
weight:0.4 > .
10 > .
weight:1.0 > .


Gremlin OLAP will subsume and extend the functionality currently provided by Faunus (http://faunus.thinkaurelius.com). With Gremlin OLAP, we will be able to run global graph processing jobs over any graph system, not just Titan (like Faunus currently does), but also TinkerGraph, Neo4j, OrientDB, Giraph, HAMA, DEX, RDF graphs, etc.

Your thoughts are more than welcome as we start to flush out this body of code,
Marko.

Daniel Quest

unread,
Dec 21, 2013, 9:25:36 PM12/21/13
to gremli...@googlegroups.com
This sounds awesome!

Sent from my iPad
--
You received this message because you are subscribed to the Google Groups "Gremlin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-user...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Marko Rodriguez

unread,
Jan 1, 2014, 1:19:50 PM1/1/14
to gremli...@googlegroups.com
Hello,

Gremlin OLAP is coming along nicely. I made some good headway this morning. In particular, counter propagation is fully functional for propagating gremlins through to vertices, edges, and properties. You can see the code here:


Its actually quite simple:

1. A counter Map<Object,Long> maintains how many gremlins are at each local object (i.e. vertex, edge, or properties of the vertex).
- The CounterMap is an unfortunate consequence of GraphComputer semantics -- you getXXX() from the previous BSP cycle and you setXXX() to the next BSP cycle.
- As such, you can not use the object property/annotation as a memory system during the life of the vertex program's iteration.
- This may come back to bite us when we move to implement sideEffect steps in Gremlin OLAP.
2. Receive all messages to this particular vertex (where, again, a vertex serves as a message hub for its properties, edges, and edge properties).
3. For each element that has gremlins at it, put it into the current pipe of the GremlinPipeline, iterate the pipe, and the output (end of the Pipe) has messages with counters propagated to it.
4. Update the properties and annotations of respective objects.

Again, what is nice (so far :) is that this is only 150 lines of code. The behavior is identical to Gremlin/Faunus save that we are using the Pipes from Gremlin OLTP and thus, no double-creation of the Gremlin language. 

What needs to happen next:

1. We need to be able to support non-graph object objects. E.g. out('knows').value('name') (the String value). 
- Right now this throws an exception as we can't "count" things that don't have properties or annotations.
2. We need to be able to support paths. E.g. out('knows').path.
- This is related to 1 but also necessary for teleportation-steps and will be "easy" as a GremlinMessage will be able to hold either a Path or a Long counter. (again, just like Faunus).
- There needs to be more work on Gremlin OLTP around Derrick Wiebe's (Pacer) contributions on turning on and off paths automagically via Pipeline introspection.
- Currently in Gremlin3 OLTP paths are always computed.

Once we have 1 & 2 above, we need to start thinking about how loop() works. With the new Gremlin architecture using Holder<> where the Pipe no longer has state, I believe this should fall out naturally.

Finally, you can see how this is all trigger via my "System.out.println()"-based TestCase.
Once we get 1 & 2 completed, we will move to having real test cases.

Enjoy!,
Marko.

Marko Rodriguez

unread,
Jan 4, 2014, 2:20:38 PM1/4/14
to gremli...@googlegroups.com
Hello,

I made some major progress on Gremlin OLAP over the last two days. The latest additions include:

path-step is functional
back-step is functional
select-step is functional
as-step is functional

The above were tricky problems because they all rely on "paths calculations" and if a path was, for example:

[v[1], v[2], vadas]

…then when you go to serialize v[1] and v[2], you could potentially serialize the whole object graph. Or, when communicating paths via messages, you could have v[1] & v[2] as "big objects" you are pushing around (big binary blobs or properties, edges, etc.). This not a big deal for single machine, in-memory object graph systems like TinkerGraph, but for a Hadoop-based implementation of GraphComputer, this would not be good. To rectify this situation, we now have MicroVertex, MicroEdge, and MicroProperty.

-- IGNORE THE "." AT THE END OF THE toString() METHOD -- THATS JUST SO I KNOW IN MY PLAYING WHETHER THE RIGHT OBJECT IS BEING SERIALIZED

When a Pipe has completed its processing of an object, it updates the path of the object using this handy-method which converts Vertex objects to their MicroVertex representation (respectively for Edges and Properties).

// cool side note -- Path is now a class in Gremlin and it has a cool BiConsumer forEach() all Java8-stylie.

Now, when you do:

Gremlin.of().as("a").outE("created").inV().property("name").back("a").path()

The result in Gremlin OLAP is as expected:

==>[v[1]., e[9][1-created->3]., v[3]., p[name->lop]., v[1].]
==>[v[4]., e[11][4-created->3]., v[3]., p[name->lop]., v[4].]
==>[v[4]., e[10][4-created->5]., v[5]., p[name->ripple]., v[4].]
==>[v[6]., e[12][6-created->3]., v[3]., p[name->lop]., v[6].]

TADA! I thought this was going to be such a big pain in the butt, but it wasn't all that bad. Stoked! Next up on the list of things to do:

1. Loop-step and the ability to propagate Holder<T>s (initial thoughts make me believe this will be easy).
2. Barrier-steps like count(), groupCount(), groupBy(), etc. (I have no idea how this is going to work :|).

Finally, if we think that MicroXXXs would be generally useful classes for people, we could move the implementations out of gremlin-core to blueprints-core.

Thank you for reading,
Marko.

Marko Rodriguez

unread,
Jan 8, 2014, 2:00:58 PM1/8/14
to gremli...@googlegroups.com
Dear constituents of Da 'Pop,

I am proud to announce that Gremlin OLAP now supports loop()-step and arbitrary object mutations (generating objects beyond the graph boundary). In this way, Gremlin OLAP is now more expressive than Faunus' Gremlin implementation (http://faunus.thinkaurelius.com). Moreover, what is fascinating to me is that it requires no logic to analyze the pipeline and individual pipes -- the code (once written) is dead simple.


Prior to this moment (and how Gremlin/Faunus works) the number of iterations to execute GremlinVertexProgram was the number of pipes in the pipeline. However, with loop(), the number of iterations to complete the traversal may not equal (and usually doesn't) the number of pipes in the pipeline. As such, borrowing from the Pregel model, I implemented a VOTE_TO_HALT mechanism, where if a Vertex doesn't send a message during its iteration, it VOTES_TO_HALT. This is implemented using newly added GraphMemory.and()/GraphMemory.or().
 

Now, when VertexProgram.terminate() is called, it is simply a matter of && || on this associative, commutative computation of this boolean value:

Next, when within "the realm of the graph" (that is, when traversing over vertices, edges, and properties), traversal occurs via message passing. Per the discussion with Viktoras, I pushed the messaging logic into the GremlinMessage and now the GremlinMessage is like an "agent" with its own execute method:


So, when you do:


You get back:

==>[v[1]., v[4]., v[3]., lop]
==>[v[1]., v[4]., v[5]., ripple]

Remember, the '.' after the toString() is because we are propagating MicroXXX-versions of the elements and for my sanity (right now), I append the '.' so I know things are working correctly.

The reason why this Gremlin is easier to code is because the Pipes are now state-free … all of the logic around path calculations, object/pipe-binding, etc. is in this crazy little object called Holder<T>:
The above is "the trick" --- at the expense of object creation! ……………. each object in a Pipe is "bubbled" in this Holder<T> object that maintain metadata about the object (the path it took to get to where it is, the number of iterations it has gone through in a looping construct (will need to make it more complex for nested loops!). This makes Pipes/Gremlin so much simpler...

Enjoy!,
Marko.

Reply all
Reply to author
Forward
0 new messages