A random list of thoughts on TinkerPop4

Marko Rodriguez

Aug 29, 2017, 10:32:57 AM

to gremli...@googlegroups.com

Hi,

Lately, I’ve been thinking more and more about TinkerPop4. Here are my thoughts in no particular order:

* Everything is about language agnosticity. Java is just a Gremlin language variant.
* The Gremlin traversal machine is packaged more concisely, with only a few “inputs”/“outputs”.
* GremlinServer is simply the network I/O to the GremlinVM.
* The step interfaces should be OneToOne, OneToMany, ManyToOne, and ManyToMany.
* Steps have a tunable “cost” associated with them that each vendor can specify.
* Gremlin compiler uses a theorem prover to find the cheapest equivalent step compilation.
* The traverser species to use for a traversal is determined by a FinalizationStrategy. Can change depending on execution engine too!
* gremlin-core/ goes away in favor of gremlin-java/.
* Everything is a Gremlin Language Variant, just some variants don’t have VM implementations.
* Provide two Gremlin VMs — Java and ?Python? — to demonstrate that VM implementations are like language implementations.
* There is no more Gryo, only GraphSON and we make a more concise representation. (and using immutable parsers)
* We make the traversal execution engine more pluggable — via an ExecutionStrategy interface.
* We provide spark-gremlin/ for GraphComputer-based execution.
* We provide akka-gremlin/ for GraphActor-based execution.
* We make STANDARD (OLTP) something like GraphStreams so it’s easy to discuss and fits better in a taxonomy.
* We drop giraph-gremlin/ to reduce the overhead of what we have to maintain at Apache.
* We make TinkerGraph a distributed in-memory graph system using Apache Ignite.
* TinkerGraph does not have a TinkerGraphComputer, it simply enables integration with spark-gremlin/ and akka-gremlin/.
* We introduce the concept of graph partitions to have better control/understanding of the physical location of vertices/edges.
* GremlinConsole supports switching between Gremlin Language Variants, e.g. > :language-variant gremlin-python
* There is no direct access to Graph, everything is withRemote() with “local interaction” being a simple LocalConnection object.
* There is no direct access with vendor-specific Vertex/Edge/etc. implementations. Everything operates on ReferenceXXX.
* The Gremlin bytecode specification is published and static.
* The Gremlin step library is re-org’d away from map/flatMap/filter.. to one-to-one/, one-to-many/, etc.
* We provide Gremlin-SPARQL and ?Gremlin-SQL? as distinct languages that compile to the GremlinVM.
* We drop hadoop-gremlin/ and roll the HDFS helper stuff into spark-gremlin/ via a Hadoop subpackage.
* We begin TinkerPop4 with a solid benchmarking package that allows us to see performance differences with each ‘git push.’
* We maintain two toy datasets: one that is small (< 10 vertices/edges) and one that is “big” (> 1000 vertices).
* The toy datasets must include loops, multi-properties, complex properties, etc.
* The concept of Traverser bulk must be pluggable so that distinction between sack() and bulk() is made less confusing.
* The Gremlin language step library must be thought out in full detail, removing steps that are rarely used and adding steps for motifs that are common.
* Traversal side-effects are not accessible via Traversal.getSideEffects() but only via the traversal execution (e.g. cap()).
* Gremlin DSLs should always compile to Gremlin bytecode as there will be no other bytecode specification. Right now, it’s possible to extend bytecode, which is bad.
* We kill Gremlin-Sugar plugin.
* We support serialized lambdas.
* We re-think the transaction model and its representation in the language.
* We make sure the test suite is language agnostic: not just ScriptEngine agnostic, but also, e.g., C#/.NET agnostic.
* We never test the Graph API directly, all testing is via traversals.
* The only thing that can be submitted to the Gremlin VM is bytecode, and only ints, longs, strings, vertices, edges, etc. are ever returned.
* We constrain the types of Gremlin: long, int, double, Map, List, Vertex, Edge, Property, String. All GLVs must support the standard core.
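
To make the points about step shapes and tunable costs concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the names Shape, Step, compilation_cost, cheapest, the step names and the cost numbers are made up for illustration and are not part of any TinkerPop API. It also swaps the theorem prover for a much simpler stand-in: the equivalent compilations are listed by hand and the cheapest one is picked by summing per-step costs.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Sequence, Tuple


class Shape(Enum):
    """The four step shapes named in the list above."""
    ONE_TO_ONE = "one-to-one"
    ONE_TO_MANY = "one-to-many"
    MANY_TO_ONE = "many-to-one"
    MANY_TO_MANY = "many-to-many"


@dataclass(frozen=True)
class Step:
    """A hypothetical step: just a name and a shape."""
    name: str
    shape: Shape


# Two hand-written compilations assumed to be equivalent ("count my neighbors").
EXPAND_PLAN: Tuple[Step, ...] = (
    Step("expand_neighbors", Shape.ONE_TO_MANY),
    Step("count", Shape.MANY_TO_ONE),
)
DEGREE_PLAN: Tuple[Step, ...] = (
    Step("read_degree", Shape.ONE_TO_ONE),
)
PLANS = (EXPAND_PLAN, DEGREE_PLAN)

# Made-up default costs; a vendor would supply its own table.
DEFAULT_COSTS: Dict[str, float] = {
    "expand_neighbors": 10.0,
    "count": 1.0,
    "read_degree": 2.0,
}
# A vendor that has no stored degree makes that step expensive.
VENDOR_COSTS: Dict[str, float] = {**DEFAULT_COSTS, "read_degree": 50.0}


def compilation_cost(steps: Sequence[Step], costs: Dict[str, float]) -> float:
    """Sum the per-step costs of one compilation."""
    return sum(costs[step.name] for step in steps)


def cheapest(plans: Sequence[Tuple[Step, ...]], costs: Dict[str, float]) -> Tuple[Step, ...]:
    """Pick the lowest-cost plan among compilations assumed to be equivalent."""
    return min(plans, key=lambda plan: compilation_cost(plan, costs))
```

With the default table, cheapest(PLANS, DEFAULT_COSTS) picks DEGREE_PLAN; with VENDOR_COSTS it picks EXPAND_PLAN. Proving that two compilations are actually equivalent is the hard part and is not attempted here.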

Anywho, thought I would just blurt out a bunch of thoughts while I’m waiting for code to compile…

Take care,
Marko.

http://markorodriguez.com

Ranger Tsao

Sep 2, 2017, 11:40:14 AM

to Gremlin-users

So many useful and unimaginable features

On Tuesday, August 29, 2017 at 10:32:57 PM UTC+8, Marko A. Rodriguez wrote:

song

Sep 2, 2017, 1:03:34 PM

to Gremlin-users

Wow... you have a lot going on in your head...

1. We have an implementation over Ignite. We can probably spin it off and merge it into TinkerPop. The Ignite people would be excited about this too.

2. Is there a roadmap for GraphActor? Have you thought about GraphVerticle? http://vertx.io

3. Yup, drop giraph-gremlin. Lose some weight...

4. Is anyone working on a new Gremlin VM? A Python implementation would mean better integration with the data science community.

5. Probably make a graph generator for fun and benchmarking.

6. A new transaction model is a must for robust OLTP/OLAP convergence.
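
For point 5, a graph generator could start out as small as the sketch below. It is only an illustration under assumptions: generate_toy_graph, the "link" label and the "name"/"tag" keys are all made up, the graph is plain Python dicts and tuples rather than any TinkerPop structure, and "loops" is read here as self-loops. It is seeded so benchmark runs are repeatable, and it stores property values as lists to imitate multi-properties.

```python
import random
from typing import Dict, List, Tuple

Vertices = Dict[int, Dict[str, List[str]]]
Edges = List[Tuple[int, str, int]]


def generate_toy_graph(num_vertices: int, num_edges: int, seed: int = 0) -> Tuple[Vertices, Edges]:
    """Build a small random directed graph as plain dicts and tuples."""
    if num_vertices < 1:
        raise ValueError("num_vertices must be at least 1")
    rng = random.Random(seed)  # private RNG so the same seed gives the same graph

    vertices: Vertices = {}
    for vid in range(num_vertices):
        # Each property key maps to a list of values (a multi-property).
        tag_count = rng.randint(1, 3)
        vertices[vid] = {
            "name": [f"v{vid}"],
            "tag": [f"t{rng.randrange(3)}" for _ in range(tag_count)],
        }

    edges: Edges = []
    if num_edges > 0:
        # Guarantee one self-loop whenever at least one edge is requested.
        edges.append((0, "link", 0))
    while len(edges) < num_edges:
        # Random endpoints; parallel edges and further self-loops are allowed.
        edges.append((rng.randrange(num_vertices), "link", rng.randrange(num_vertices)))
    return vertices, edges
```

Calling generate_toy_graph(8, 9, seed=42) gives a graph below the "small" size mentioned earlier in the thread, and raising the counts gives a "big" one from the same code.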

Vladyslav Kosulin

Sep 7, 2017, 10:40:23 AM

to Gremlin-users

Why drop Gryo? It is much more efficient with big graphs. It might be useless with distributed backends, but IMHO it is the best option for backup/restore, etc. for Berkeley graphs.

And what about GraphML? It might be far from perfect, but it is the only 'portable' standard.

I completely agree on the transaction model overhaul.