Hi,
Yesterday I was HipChatting with Alex Popescu (cc'd) about the idea that "there is no need for a standard query language," just as there is no need for a "standard programming language." He said something to the effect of: "That is a strong argument; however, there will then be discussions of virtual machine execution vs. native execution."
Last night I was thinking -- "hmmm, that would be a bad argument to make." Why?
Gremlin shouldn't be touted as a "virtual machine" but as a "traversal machine" (an execution engine). When Gremlin talks to an underlying graph system, it's talking to TinkerPop ("Blueprints") and then to the native API of the graph system. For systems that have TinkerPop as their native API (Titan/Bitsy/etc.), Gremlin is not a "virtual machine" at all. For systems that don't (OrientDB/Neo4j/etc.), the cost of the indirection from the TinkerPop API to the graph system's native API is trivial, as it's typically just object wrapping on the short-lived object heap (we will amortize this cost later -- watch).

Next, all graph systems maintain an "execution engine" for their respective query language. That is, OrientSQL, Cypher, and SPARQL ultimately talk to their graph system's API: the OrientDB Java API, the Neo4j Java API, and Sesame or Jena, respectively. Gremlin does the same thing; it just talks to TinkerPop ("Blueprints") first, which then talks to those APIs. What makes Gremlin neat is that the execution engine and the language are not strongly coupled, as it's very easy for any graph language to compile to the Gremlin machine. So there is no relative cost in the language->machine translation; the cost (though minor -- wait for it) is in the machine->API translation.

However, given the conceptual simplicity (and engineering) of the Gremlin machine, those costs are quickly subsumed. With MatchStep's runtime optimizer, traverser bulking, LazyBarriers, and (most importantly) provider-specific compiler strategies (see Titan's beautiful use of these), Gremlin can be faster than the provider's "native query" language. In fact, some internal benchmarking I've done has shown that Gremlin is equal to or faster than the native language of the graph system, where sometimes those speed differences range from 5x to the life of the universe. Thus, the cost of TinkerPopAPI->NativeAPI is so trivial at that point that it's not worth even discussing the "cost of virtualization." I suspect (though this is complete speculation at this point) that X-Language->GremlinMachine->Y-System could be faster than X-Language->Y-System, given Gremlin's current (and future) compiler/engine design and evolution.
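To make the machine/language decoupling concrete, here is a minimal sketch against the in-memory reference implementation (TinkerGraph), assuming the Apache TinkerPop 3 Java package names (the class name TraversalMachineSketch is mine); any provider that implements the TinkerPop structure API could sit in TinkerGraph's place:

    import java.util.List;
    import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
    import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerFactory;
    import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph;

    public class TraversalMachineSketch {
        public static void main(String[] args) {
            // TinkerGraph stands in for any graph system wired up through TinkerPop.
            TinkerGraph graph = TinkerFactory.createModern();
            GraphTraversalSource g = graph.traversal();

            // This step chain is what any host language (Gremlin-Groovy, or a
            // SQL/Cypher/SPARQL-to-Gremlin compiler) would ultimately emit.
            // explain() shows the traversal before and after each registered
            // TraversalStrategy (including provider-specific ones) rewrites it.
            System.out.println(
                g.V().has("name", "marko").out("knows").values("name").explain());

            // Executing the compiled traversal against the underlying provider.
            List<Object> names =
                g.V().has("name", "marko").out("knows").values("name").toList();
            System.out.println(names);
        }
    }

The point of the sketch: the traversal is just a chain of steps plus the strategies that rewrite it; which language produced that chain is irrelevant to the machine.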
Thus, Gremlin shouldn't be seen as a "virtual machine," but as a "traversal machine" that anyone can connect to their graph system. It supports any graph language that compiles to it. It is an efficient/simple OLTP/OLAP execution engine pre-written for you.
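And for the "anyone can connect their graph system" point, here is a hypothetical skeleton of a provider-specific strategy (the class name and the rewrite it would perform are invented for illustration; the interfaces are the TinkerPop 3 strategy API as I understand it):

    import org.apache.tinkerpop.gremlin.process.traversal.Traversal;
    import org.apache.tinkerpop.gremlin.process.traversal.TraversalStrategy;
    import org.apache.tinkerpop.gremlin.process.traversal.strategy.AbstractTraversalStrategy;

    public final class MyProviderOptimizationStrategy
            extends AbstractTraversalStrategy<TraversalStrategy.ProviderOptimizationStrategy>
            implements TraversalStrategy.ProviderOptimizationStrategy {

        @Override
        public void apply(final Traversal.Admin<?, ?> traversal) {
            // A real provider (Titan being the showcase) would walk
            // traversal.getSteps() here and swap generic steps for steps backed
            // by its own indexes and iterators -- the "compiler strategy" hook
            // that lets the Gremlin machine compete with a native query language.
            System.out.println("steps before provider rewrite: " + traversal.getSteps());
        }
    }

Once a provider registers a strategy like this for its graph class, every traversal -- whatever language it was compiled from -- picks up the rewrite for free, which is why the language->machine translation costs nothing relative to the machine->API translation.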
Thanks,
Marko.