Hello,
So TINKERPOP-1278 (aka “Gremlin-Python”) has introduced the notion of Traversal ByteCode.
In essence, ByteCode is the construction history of a traversal and is of the form:
[string, object*]* // a list of (operator, arguments[])
The traversal
g.V(1).outE('created').inV().
  repeat(out('created').in('created')).times(5).
  valueMap('name','age')
has a ByteCode representation as below:
[
  [V, 1]
  [outE, 'created']
  [inV]
  [repeat, [
    [out, 'created']
    [in, 'created']
  ]]
  [times, 5]
  [valueMap, 'name', 'age']
]
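As a rough illustration of "construction history", here is a minimal Python sketch that records each chained call as an [operator, arguments...] entry. BytecodeRecorder is a hypothetical name for this example and not a class in the TINKERPOP-1278 branch; the trailing underscore on in_ is an assumption made because "in" is a reserved word in Python.

```python
class BytecodeRecorder:
    """Records each chained call as an [operator, arg, ...] instruction."""

    def __init__(self, instructions=None):
        self.instructions = list(instructions or [])

    def __getattr__(self, name):
        # Only reached for attributes not found normally, i.e. step names
        # such as V, outE or repeat.
        if name.startswith("__"):
            raise AttributeError(name)
        operator = name.rstrip("_")  # in_ -> in

        def step(*args):
            # A nested (anonymous) traversal is stored as its own instruction list.
            encoded = [a.instructions if isinstance(a, BytecodeRecorder) else a
                       for a in args]
            return BytecodeRecorder(self.instructions + [[operator, *encoded]])

        return step


g = BytecodeRecorder()   # stands in for the traversal source
__ = BytecodeRecorder()  # stands in for the anonymous traversal

bytecode = (g.V(1).outE("created").inV()
             .repeat(__.out("created").in_("created")).times(5)
             .valueMap("name", "age")).instructions
```

Here bytecode ends up as a nested list with the same shape as the representation above, which is all a downstream consumer needs.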
Again, Gremlin is a simple language based on function concatenation and nesting. That's all there is to it. Thus, it forms a tree and trees are easy to encode, distribute, serialize, decode, prune/optimize, search, etc. Moreover, every programming language supports function composition and nesting and thus, Gremlin is able to be hosted in any programming language. [http://tinkerpop.apache.org/gremlin.html]
The benefit of ByteCode as it applies to TINKERPOP-1278 is that a Translator is able to access the ByteCode of the traversal and then use that linear-nested structure (wide-tree) to generate a traversal representation in another language — e.g. Gremlin-Python, Gremlin-Ruby, Gremlin-JavaScript, etc.
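To make the translation idea concrete, here is a minimal sketch in Python that walks the nested structure and emits a Gremlin-Groovy-style string. It assumes ByteCode is held as nested lists of [operator, arguments...]; the function names translate and render are hypothetical and this is not the translator from the branch.

```python
def render(arg):
    """Render one argument as Groovy-style source text."""
    if isinstance(arg, list):
        # A nested instruction list is an anonymous traversal; it is
        # spawned from __ so that steps such as in(...) stay valid.
        return translate(arg, prefix="__")
    if isinstance(arg, bool):
        return "true" if arg else "false"
    if isinstance(arg, str):
        return "'" + arg.replace("\\", "\\\\").replace("'", "\\'") + "'"
    return str(arg)


def translate(bytecode, prefix="g"):
    """Turn [operator, arg, ...] instructions into a dotted call chain."""
    parts = [prefix]
    for operator, *args in bytecode:
        parts.append(f"{operator}({', '.join(render(a) for a in args)})")
    return ".".join(parts)


bytecode = [
    ["V", 1],
    ["outE", "created"],
    ["inV"],
    ["repeat", [["out", "created"], ["in", "created"]]],
    ["times", 5],
    ["valueMap", "name", "age"],
]
script = translate(bytecode)
```

Since the input is just a tree, targeting another host language should mostly be a matter of swapping render and the call syntax.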
Here is the Gremlin-Python translator that will turn ByteCode into Gremlin-Python:
Here is the Gremlin-Groovy translator that will turn ByteCode into Gremlin-Groovy:
Pretty simple, eh? So, why would you want Gremlin-Java to translate to Gremlin-Groovy? Well, so you can code in Gremlin-Java and then have it execute on GremlinServer via the GremlinGroovy JSR223 ScriptEngine. However, one can imagine a Gremlin-Java->Gremlin-Java translator! What is that?! Well, it would use reflection (or some more efficient mechanism) to reconstruct the Gremlin-Java traversal from ByteCode generated from Gremlin-Java. Thus, the entire cluster/server infrastructure is simply migrating ByteCode around as opposed to worrying about language-specific representations (e.g. it has nothing to do with the JVM!). Also, assume a Python-based graph database exists that implements the GremlinVM: Gremlin-Java can easily talk to it via ByteCode.
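The reflection idea can be sketched in Python, where getattr plays the role of Java reflection. ToyTraversal is a made-up, in-memory stand-in with three steps (it is not a TinkerPop class), and nested traversals are left out to keep the sketch short.

```python
class ToyTraversal:
    """Hypothetical traversal over an adjacency dict: {vertex: [neighbour, ...]}."""

    def __init__(self, edges, current=()):
        self.edges = edges
        self.current = list(current)

    def V(self, *ids):
        # With no ids given, start from every vertex.
        return ToyTraversal(self.edges, ids or sorted(self.edges))

    def out(self):
        return ToyTraversal(
            self.edges,
            [n for v in self.current for n in self.edges.get(v, [])])

    def dedup(self):
        # dict.fromkeys keeps first-seen order while dropping duplicates.
        return ToyTraversal(self.edges, dict.fromkeys(self.current))

    def toList(self):
        return list(self.current)


def replay(bytecode, source):
    """Rebuild a traversal by looking up each operator by name and calling it."""
    traversal = source
    for operator, *args in bytecode:
        traversal = getattr(traversal, operator)(*args)  # the "reflection" step
    return traversal


edges = {1: [2, 3], 2: [3], 3: [4], 4: []}
result = replay([["V", 1], ["out"], ["out"], ["dedup"]],
                ToyTraversal(edges)).toList()
```

The receiving side only needs an object whose method names match the operators in the ByteCode; where the ByteCode was originally written does not matter.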
To ensure ByteCode generation is not costly, here are the runtimes for construction and compilation of a “fairly complex” traversal in both master/ and TINKERPOP-1278/:
master/
gremlin> clock(5000){g.V().repeat(out()).times(2).as("a").union(both(),out()).dedup().as("b").select("a","b").identity() }
gremlin> clock(5000){g.V().repeat(out()).times(2).as("a").union(both(),out()).dedup().as("b").select("a","b").identity().applyStrategies() }
TINKERPOP-1278/
gremlin> clock(5000){g.V().repeat(out()).times(2).as("a").union(both(),out()).dedup().as("b").select("a","b").identity() }
gremlin> clock(5000){g.V().repeat(out()).times(2).as("a").union(both(),out()).dedup().as("b").select("a","b").identity().applyStrategies() }
Finally, there are various entailments that come from this ByteCode representation that have started to surface in my mind:
1. We can now optimize at the ByteCode level and not at the step level — ByteCodeStrategies could be a new TraversalStrategy class that comes before DecorationStrategies.
2. Gremlin-Java can support bindings so that its Gremlin-Groovy (e.g.) compiled form uses bindings. How? By simply rewriting the ByteCode prior to compilation and replacing values with variables!
3. GraphSON can now easily support Traversal serialization — ByteCode in JSON is natural. Gremlin-GraphSON anyone? (get it?)
[g, V : [1], outE : [created], inV : [], repeat : [out : [created], in : [created]], times : [5], valueMap : [name, age]]
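Points 2 and 3 can be sketched together in Python: rewrite the ByteCode so that literal values become named variables, then serialize the result as JSON. The variable naming scheme (x0, x1, ...) and the {"var": name} marker are assumptions made for this example only, as is treating every list argument as a nested traversal.

```python
import json


def bind_values(bytecode, bindings=None):
    """Replace literal arguments with variables; return (rewritten, bindings)."""
    bindings = {} if bindings is None else bindings
    rewritten = []
    for operator, *args in bytecode:
        new_args = []
        for arg in args:
            if isinstance(arg, list):
                # Nested traversal: recurse, sharing the same bindings map.
                new_args.append(bind_values(arg, bindings)[0])
            else:
                name = f"x{len(bindings)}"      # hypothetical naming scheme
                bindings[name] = arg
                new_args.append({"var": name})  # marks a variable, not a literal
        rewritten.append([operator, *new_args])
    return rewritten, bindings


bytecode = [["V", 1], ["repeat", [["out", "created"]]], ["times", 5]]
rewritten, bindings = bind_values(bytecode)

# Nested lists, dicts, strings and numbers map directly onto JSON.
wire = json.dumps({"bytecode": rewritten, "bindings": bindings})
decoded = json.loads(wire)
```

Because the rewritten ByteCode no longer contains the literal values, the same compiled form could be reused with different bindings, and the JSON form round-trips without any custom serializer.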
This is what makes Gremlin so powerful: its syntax is crazy simple as it's just functions and thus, it can naturally exist in any language, even XML! … ByteCode is what is going to free Gremlin from the JVM…as we are already now on the CPython VM with relative ease:
PythonGraphTraversal
JythonTranslator (Gremlin-Python to Gremlin-Jython)
GroovyTranslator (Gremlin-Python to Gremlin-Groovy)
*** NOTE: I have yet to introduce the ByteCode concepts to the gremlin-python/ package, so use your imagination when seeing how PythonGraphTraversal will construct ByteCode and how the respective translators will translate it. Finally, realize that people can now compile Python ByteCode into Python-based steps and thus, Gremlin can live and execute on the Python VM against Python-based graph systems. It's really that easy (though it is lots of work to implement the 30-some standard steps in Python). What this means though is that, in the future, Gremlin can just move between languages/VMs … between Python, JVM, Ruby, JavaScript, C, etc.-based graph processing systems. One language, tailored to its host, and agnostic to the underlying virtual machine.
Enjoy,
Marko.