Hi,
Yesterday I was HipChatting with Alex Popescu (cc'd) about the idea that "there is no need for a standard query language," just as there is no need for a "standard programming language." He said something to the effect of: "That is a strong argument; however, there will then be discussions of virtual machine execution vs. native execution."
Last night I was thinking -- "hmmm, that would be a bad argument to make." Why?
Gremlin shouldn't be touted as a "virtual machine" but as a "traversal machine" (an execution engine). When Gremlin talks to an underlying graph system, it's talking to TinkerPop ("Blueprints") and then to the native API of the graph system. For systems that have TinkerPop as their native API (Titan/Bitsy/etc.), Gremlin is not a "virtual machine" at all. For systems that don't (OrientDB/Neo4j/etc.), the cost of the indirection from the TinkerPop API to the graph system's native API is trivial, as it's typically just object wrapping on the short-lived object heap (we will amortize this cost later -- watch).

Next, all graph systems maintain an "execution engine" for their respective query language. That is, OrientSQL, Cypher, and SPARQL ultimately talk to their graph system's API: the OrientDB Java API, the Neo4j Java API, and Sesame or Jena, respectively. Gremlin does the same thing; it just talks to TinkerPop ("Blueprints") first, which then talks to those APIs. What makes Gremlin neat is that the execution engine and the language are not strongly coupled, as it's very easy for any graph language to compile to the Gremlin machine. So there is no relative cost in the language->machine translation; the cost (though minor -- wait for it) is in the machine->API translation.

However, given the conceptual simplicity (and engineering) of the Gremlin machine, those costs are quickly subsumed. With MatchStep's runtime optimizer, traverser bulking, LazyBarriers, and (most importantly) provider-specific compiler strategies (see Titan's beautiful use of these), Gremlin can be faster than the provider's "native query" language. In fact, some internal benchmarking I've done has shown that Gremlin is equal to or faster than the native language of the graph system, where sometimes those speed differences range from 5x to the life of the universe. Thus, the cost of TinkerPopAPI->NativeAPI is so trivial at that point that it's not worth even discussing the "cost of virtualization." I suspect (though this is complete speculation at this point) that X-Language->GremlinMachine->Y-System could be faster than X-Language->Y-System, given Gremlin's current (and future) compiler/engine design and evolution.
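To make the machine/language decoupling concrete, here is a minimal sketch against the in-memory reference implementation (TinkerGraph), assuming the Apache TinkerPop 3 Java package names (the class name TraversalMachineSketch is mine); any provider that implements the TinkerPop structure API could sit in TinkerGraph's place:

    import java.util.List;
    import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
    import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerFactory;
    import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph;

    public class TraversalMachineSketch {
        public static void main(String[] args) {
            // TinkerGraph stands in for any graph system wired up through TinkerPop.
            TinkerGraph graph = TinkerFactory.createModern();
            GraphTraversalSource g = graph.traversal();

            // This step chain is what any host language (Gremlin-Groovy, or a
            // SQL/Cypher/SPARQL-to-Gremlin compiler) would ultimately emit.
            // explain() shows the traversal before and after each registered
            // TraversalStrategy (including provider-specific ones) rewrites it.
            System.out.println(
                g.V().has("name", "marko").out("knows").values("name").explain());

            // Executing the compiled traversal against the underlying provider.
            List<Object> names =
                g.V().has("name", "marko").out("knows").values("name").toList();
            System.out.println(names);
        }
    }

The point of the sketch: the traversal is just a chain of steps plus the strategies that rewrite it; which language produced that chain is irrelevant to the machine.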
Thus, Gremlin shouldn't be seen as a "virtual machine," but as a "traversal machine" that anyone can connect to their graph system. It supports any graph language that compiles to it. It is an efficient/simple OLTP/OLAP execution engine pre-written for you.
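And for the "anyone can connect their graph system" point, here is a hypothetical skeleton of a provider-specific strategy (the class name and the rewrite it would perform are invented for illustration; the interfaces are the TinkerPop 3 strategy API as I understand it):

    import org.apache.tinkerpop.gremlin.process.traversal.Traversal;
    import org.apache.tinkerpop.gremlin.process.traversal.TraversalStrategy;
    import org.apache.tinkerpop.gremlin.process.traversal.strategy.AbstractTraversalStrategy;

    public final class MyProviderOptimizationStrategy
            extends AbstractTraversalStrategy<TraversalStrategy.ProviderOptimizationStrategy>
            implements TraversalStrategy.ProviderOptimizationStrategy {

        @Override
        public void apply(final Traversal.Admin<?, ?> traversal) {
            // A real provider (Titan being the showcase) would walk
            // traversal.getSteps() here and swap generic steps for steps backed
            // by its own indexes and iterators -- the "compiler strategy" hook
            // that lets the Gremlin machine compete with a native query language.
            System.out.println("steps before provider rewrite: " + traversal.getSteps());
        }
    }

Once a provider registers a strategy like this for its graph class, every traversal -- whatever language it was compiled from -- picks up the rewrite for free, which is why the language->machine translation costs nothing relative to the machine->API translation.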
Thanks,
Marko.