Hi,
Lately, I’ve been thinking more and more about TinkerPop4. Here are my thoughts in no particular order:
* Everything is about language agnosticism. Java is just a Gremlin language variant.
* The Gremlin traversal machine is packaged more concisely w/ only a few “inputs”/“outputs”.
* GremlinServer is simply the network I/O to the GremlinVM.
* The step interfaces should be OneToOne, OneToMany, ManyToOne, and ManyToMany.
* Steps have a tunable “cost” associated with them that each vendor can specify.
* Gremlin compiler uses a theorem prover to find the cheapest equivalent step compilation.
* The traverser species to use for a traversal is determined by a FinalizationStrategy. Can change depending on execution engine too!
* gremlin-core/ goes away in favor of gremlin-java/.
* Everything is a Gremlin Language Variant, just some variants don’t have VM implementations.
* Provide two Gremlin VMs — Java and ?Python? — to demonstrate that VM implementations are like language implementations.
* There is no more Gryo, only GraphSON, and we make its representation more concise (using immutable parsers).
* We make the traversal execution engine more pluggable — via an ExecutionStrategy interface.
* We provide spark-gremlin/ for GraphComputer-based execution.
* We provide akka-gremlin/ for GraphActor-based execution.
* We make STANDARD (OLTP) something like GraphStreams so it’s easy to discuss and fits better in a taxonomy.
* We drop giraph-gremlin/ to reduce the overhead of what we have to maintain at Apache.
* We make TinkerGraph a distributed in-memory graph system using Apache Ignite.
* TinkerGraph does not have a TinkerGraphComputer, it simply enables integration with spark-gremlin/ and akka-gremlin/.
* We introduce the concept of graph partitions to have better control/understanding of the physical location of vertices/edges.
* GremlinConsole supports switching between Gremlin Language Variants, e.g. > :language-variant gremlin-python
* There is no direct access to Graph, everything is withRemote() with “local interaction” being a simple LocalConnection object.
* There is no direct access to vendor-specific Vertex/Edge/etc. implementations. Everything operates on ReferenceXXX.
* The Gremlin bytecode specification is published and static.
* The Gremlin step library is re-org’d away from map/flatMap/filter.. to one-to-one/, one-to-many/, etc.
* We provide Gremlin-SPARQL and ?Gremlin-SQL? as distinct languages that compile to the GremlinVM.
* We drop hadoop-gremlin/ and roll the HDFS helper stuff into spark-gremlin/ via a Hadoop subpackage.
* We begin TinkerPop4 with a solid benchmarking package that allows us to see performance differences with each ‘git push.’
* We maintain two toy datasets: one that is small (< 10 vertices/edges) and one that is “big” (> 1000 vertices).
* The toy datasets must include loops, multi-properties, complex properties, etc.
* The concept of Traverser bulk must be pluggable so that the distinction between sack() and bulk() is made less confusing.
* The Gremlin language step library must be thought out in full detail. We remove steps that are rarely used and add steps for motifs that are common.
* Traversal side-effects are not accessible via Traversal.getSideEffects() but only via the traversal execution (e.g. cap()).
* Gremlin DSLs should always compile to Gremlin bytecode as there will be no other bytecode specification. Right now, it’s possible to extend bytecode …bad.
* We kill the Gremlin-Sugar plugin.
* We support serialized lambdas.
* We re-think the transaction model and its representation in the language.
* We make sure the test suite is language agnostic: not just ScriptEngine agnostic, but also, e.g., C# .NET agnostic.
* We never test the Graph API directly, all testing is via traversals.
* The only thing that can be submitted to the Gremlin VM is bytecode, and only ints, longs, strings, vertices, edges, etc. are ever returned.
* We constrain the types of Gremlin: long, int, double, Map, List, Vertex, Edge, Property, String. All GLVs must support the standard core.
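
To make the step-arity and step-cost points above a bit more concrete, here is a rough Python sketch. Nothing in it is an existing TinkerPop API: the Arity enum, the Step class, the step names (out, count, outDegree, sum) and the cost numbers are all made up for illustration. It also does not use a theorem prover; it assumes the equivalent compilations have already been found somehow and simply picks the one with the lowest summed cost.

```python
from dataclasses import dataclass
from enum import Enum


class Arity(Enum):
    # The four step kinds proposed in the list above (names are illustrative).
    ONE_TO_ONE = "one-to-one"
    ONE_TO_MANY = "one-to-many"
    MANY_TO_ONE = "many-to-one"
    MANY_TO_MANY = "many-to-many"


@dataclass(frozen=True)
class Step:
    name: str     # hypothetical step name
    arity: Arity  # which of the four kinds the step belongs to
    cost: float   # hypothetical vendor-tunable cost


def total_cost(compilation):
    """Sum the per-step costs of one candidate compilation."""
    return sum(step.cost for step in compilation)


def cheapest(equivalent_compilations):
    """Pick the lowest-cost compilation among candidates that are
    assumed (not proven here) to be semantically equivalent."""
    return min(equivalent_compilations, key=total_cost)


# Two ways to count outgoing neighbors, assumed equivalent:
# walk every out-edge and count, or read each out-degree and sum.
walk_and_count = [
    Step("out", Arity.ONE_TO_MANY, 10.0),
    Step("count", Arity.MANY_TO_ONE, 5.0),
]
degree_and_sum = [
    Step("outDegree", Arity.ONE_TO_ONE, 2.0),
    Step("sum", Arity.MANY_TO_ONE, 5.0),
]

best = cheapest([walk_and_count, degree_and_sum])
print([step.name for step in best], total_cost(best))
```

With these made-up costs a vendor that maintains out-degrees cheaply would get the degree-based plan; a vendor with different costs could get the other one without any change to the traversal itself.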
Anywho, thought I would just blurt out a bunch of thoughts while I’m waiting for code to compile…
Take care,
Marko.
http://markorodriguez.com