On the concept of BytecodeStrategies


Marko Rodriguez

Oct 13, 2016, 8:37:23 AM
to d...@tinkerpop.apache.org, gremli...@googlegroups.com
Hello,

There are two types of “programs” in Gremlin: Bytecode and Traversals.

Bytecode => Virtual machine instructions (like Java bytecode)
Traversals => Machine instructions (like Intel machine code)

The core of Gremlin’s compiler is its TraversalStrategies. A traversal strategy works on a traversal-by-traversal level walking the traversal tree rewriting sections of the traversal into (typically) more optimal forms.

void TraversalStrategy.apply(Traversal<S,E> traversal)

Working at the Traversal object level is important because the Gremlin language steps (has(), out(), in(), etc.) don’t always map one-to-one with the machine instructions (HasStep, VertexStep, VertexStep). It’s better to work at the machine level because there are more knick-knack mutations one can do at that level. However, as you can see, traversal strategies are “machine dependent.” That is, they are tied to the Gremlin traversal machine implementation.

While there is currently only one Gremlin virtual machine (the Gremlin-Java machine), there are many Gremlin language variants: Gremlin-Java, -Groovy, -Python, SQL-Gremlin, SPARQL-Gremlin, etc. When these languages communicate with a/the Gremlin traversal machine, they communicate via Gremlin bytecode. Now, it is possible to optimize bytecode. In principle, we can do “client side” optimizations on the bytecode prior to sending it to the Gremlin traversal machine for execution. Why would we want to do this?

1. We can reduce the amount of work (clock cycles) required of “the server” which would ultimately do the TraversalStrategy optimization.
2. We can have optimizations that are machine independent and thus, can be useful against any Gremlin traversal machine implementation.
3. While the server is “streaming in” the Bytecode, it can also optimize the bytecode prior to applying TraversalStrategy optimizations.

[Gremlin-Java Traversal Machine] <== network connection ==> [Gremlin-XXX Language Variant]
  * pre-process bytecode                                      * pre-process bytecode 
    before translating to traversal                             before sending over network      
  * apply traversal strategies
  * execute traversal

What would Bytecode strategies look like? Here is an idea:

void TraversalStrategy.apply(Bytecode bytecode)

Let’s look at a simple strategy. IdentityRemovalStrategy will turn traversals of the form g.V().identity().as(“a”).identity() into g.V().as(“a”). Here is this strategy written in both Java and Python:


Given that there (currently) is no Gremlin-Python traversal machine implementation, __apply_traversal(traversal) does nothing. However, given that there is a Gremlin-Python language variant, __apply_bytecode(traversal) does something. Moreover, note that we already have IdentityRemovalStrategy in Gremlin-Python, but, as you can see, it does nothing as (currently) strategies only operate on traversals.
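
As a minimal sketch of the bytecode-level rewrite described above (not the code from the original post), assume bytecode is modeled as a plain Python list of `(operator, *args)` tuples, which is only loosely similar to how Gremlin-Python stores its step instructions. The name `remove_identity_steps` and the tuple layout are hypothetical and chosen only for illustration.

```python
from typing import Any, List, Tuple

# Assumption: an instruction is a tuple whose first element is the step
# name and whose remaining elements are its arguments. A nested (anonymous)
# traversal is assumed to appear as a list of instruction tuples.
Instruction = Tuple[Any, ...]


def _is_nested_bytecode(arg: Any) -> bool:
    """True if the argument looks like a nested list of instructions."""
    return isinstance(arg, list) and all(isinstance(i, tuple) for i in arg)


def remove_identity_steps(bytecode: List[Instruction]) -> List[Instruction]:
    """Return a copy of the bytecode with every identity() instruction dropped."""
    rewritten = []
    for operator, *args in bytecode:
        if operator == "identity" and not args:
            continue  # identity() is a no-op, so it can simply be skipped
        # Recurse into nested traversals so child bytecode is rewritten too.
        new_args = [
            remove_identity_steps(arg) if _is_nested_bytecode(arg) else arg
            for arg in args
        ]
        rewritten.append((operator, *new_args))
    return rewritten


# g.V().identity().as("a").identity()  is rewritten to  g.V().as("a")
original = [("V",), ("identity",), ("as", "a"), ("identity",)]
optimized = remove_identity_steps(original)
```

Because the function only reads and writes plain instruction data, the same rewrite could in principle run in any language variant before the bytecode is sent over the network.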


AS AN ASIDE: The reason strategies exist in Gremlin-Python is so that users can do stuff like:

Anywho, so there you have it. I’ve made a ticket:

Your thoughts on the idea are more than appreciated.

Take care,
Marko.

Tim Tan

Dec 4, 2017, 9:39:57 AM
to Gremlin-users
Is this proposal, or something similar, going to be under consideration?

In my case, I was hoping to make use of custom traversal strategies that are bytecode-only modifications (e.g. add a system property such as userId into addVertex() steps) in a java application and send them through a remote connection which only accepts bytecode. I'd like to leverage the (decoration) traversal strategy as it seems to be the identified way to inject application specific code per the reference documentation. 

Marko Rodriguez

Dec 4, 2017, 1:03:05 PM
to gremli...@googlegroups.com
Hello,

Is this proposal, or something similar, going to be under consideration?

This was proposed, but has not been developed.

In my case, I was hoping to make use of custom traversal strategies that are bytecode-only modifications (e.g. add a system property such as userId into addVertex() steps) in a java application and send them through a remote connection which only accepts bytecode. I'd like to leverage the (decoration) traversal strategy as it seems to be the identified way to inject application specific code per the reference documentation.

I don’t see why you can’t do this via a DecorationStrategy. I don’t know why you think it has to be a BytecodeStrategy. Have you tried making a custom strategy as you proposed?

Marko.



Tim Tan

Dec 5, 2017, 11:22:20 AM
to Gremlin-users
The Decoration strategy was straightforward to implement, but I ran into issues with serialization/deserialization of the strategy when just sending it through to the backing graph database, at least with the DSE Java driver. Perhaps it's more of a question for DSE.

Kevin Gallardo

Dec 5, 2017, 12:16:55 PM
to Gremlin-users
I think the current issue with TraversalStrategies in the context of remote traversal execution is that if a custom strategy is used, it has to be known on both the server and the client. With a TraversalStrategy, the traversal would be decorated, but the strategy is also sent as part of the Bytecode, and the strategy's class would then need to be deserialized on the server in order to be applied server-side.

Currently with the tp driver (same situation for the dse driver) I do the following:
    Cluster tpCluster = Cluster
        .build("127.0.1.1")
        .serializer("GRAPHSON_V3D0")
        .create();

    GraphTraversalSource g = EmptyGraph.instance().traversal()
        .withStrategies(new MyCustomStrategy())
        .withRemote(DriverRemoteConnection.using(tpCluster, "demo.g"));

    g.addV("label").next();

In this situation, the serializer is first of all unable to serialize the custom strategy, which is included in the Bytecode automatically: the traversal strategy class is custom, and with GraphSON, for example, the serializer cannot find the ID for this class. But assuming this was somehow achieved, the server receiving the request would try to deserialize the strategy part of the Bytecode, except that this strategy class is not known server-side. However, if I instead use a known strategy like SubgraphStrategy, there is no problem (for both drivers). Additionally, the server tries to deserialize the traversal strategy automatically even though it's not required, since the only thing we wanted to do was alter the Traversal client-side prior to sending it.

A Bytecode strategy would allow users to implement custom decoration strategies while remaining transparent to the server.
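
To make the idea concrete, here is a rough sketch of a client-side decoration for the userId use case mentioned earlier in the thread. It is not an existing TinkerPop API: it assumes bytecode is modeled as a plain list of `(operator, *args)` tuples, and the function name `add_user_id` and the property key `userId` are hypothetical.

```python
from typing import Any, List, Tuple

# Assumption: an instruction is a tuple of (step name, *arguments).
Instruction = Tuple[Any, ...]


def add_user_id(bytecode: List[Instruction], user_id: str) -> List[Instruction]:
    """Append property("userId", user_id) after every addV instruction."""
    decorated = []
    for instruction in bytecode:
        decorated.append(instruction)
        if instruction and instruction[0] == "addV":
            # Only standard instructions are emitted, so the receiving side
            # never sees a custom strategy class that it would have to load.
            decorated.append(("property", "userId", user_id))
    return decorated


# g.addV("label")  is rewritten to  g.addV("label").property("userId", "u-42")
decorated = add_user_id([("addV", "label")], "u-42")
```

The point of the sketch is that the rewrite happens entirely before serialization, so what goes over the wire contains nothing beyond ordinary steps.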

Marko Rodriguez

Dec 5, 2017, 12:42:00 PM
to gremli...@googlegroups.com
Hello,

The problem with BytecodeStrategies, in this context, is that they would be language specific. You would have BytecodeStrategy implemented in Python (for instance) and it would mutate your bytecode before being sent to the server. If the same organization wanted to connect via Java, they would have to write a Java version of the bytecode strategy.

The second problem with BytecodeStrategies is that they would only work for rewriting bytecode using the standard Gremlin bytecode opcodes. What is nice about current TraversalStrategies is that they work at the step-level allowing providers to insert arbitrary/custom steps (e.g. MyGraphStep). This is a must. However, for TinkerPop4, I believe that it should ALL be BytecodeStrategies and that providers should be able to register opcodes such that when Gremlin goes to compile bytecode it says: “Huh — myV(1), what is that? Let me look at the registered opcodes of the underlying graph.” I think manipulating bytecode will be much less error-prone than manipulating traversal steps.
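
The opcode-registration idea could be sketched roughly as follows. This is purely illustrative and does not reflect any existing or planned TinkerPop API: the names `STANDARD_OPCODES`, `compile_bytecode`, and `provider_opcodes` are hypothetical, and compiled steps are represented as simple `(step name, args)` tuples rather than real step objects.

```python
from typing import Any, Callable, Dict, List, Tuple

# Assumption: an instruction is a tuple of (opcode, *arguments), and a
# "compiled step" is just a (step class name, arguments) tuple.
Instruction = Tuple[Any, ...]
Step = Tuple[str, Tuple[Any, ...]]
StepFactory = Callable[..., Step]

# Opcodes every (hypothetical) compiler knows out of the box.
STANDARD_OPCODES: Dict[str, StepFactory] = {
    "V": lambda *ids: ("GraphStep", ids),
    "out": lambda *labels: ("VertexStep", labels),
}


def compile_bytecode(
    bytecode: List[Instruction],
    provider_opcodes: Dict[str, StepFactory],
) -> List[Step]:
    """Translate instructions to steps, consulting provider opcodes as a fallback."""
    steps = []
    for operator, *args in bytecode:
        factory = STANDARD_OPCODES.get(operator)
        if factory is None:
            # "Huh, what is that?" Look at the opcodes the graph registered.
            factory = provider_opcodes.get(operator)
        if factory is None:
            raise ValueError(f"unknown opcode: {operator}")
        steps.append(factory(*args))
    return steps


# A provider registers myV so that myV(1) compiles to its own step.
provider = {"myV": lambda *ids: ("MyGraphStep", ids)}
compiled = compile_bytecode([("myV", 1), ("out", "knows")], provider)
```

In this arrangement a bytecode strategy could emit `myV(1)` without knowing anything about the provider's step classes; only the compiler needs the registry.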

To conclude, you can register strategies with GraphSON. All strategies just take a key/value pair Map to construct. The problem then is, as Kevin points out, the server needs to know about that class via a .jar file. This is the purpose of Gremlin Server plugins, but it sounds like DSEGraph doesn’t support that?

Marko.

Kevin Gallardo

Dec 6, 2017, 12:06:44 PM
to Gremlin-users
 What is nice about current TraversalStrategies is that they work at the step-level allowing providers to insert arbitrary/custom steps (e.g. MyGraphStep).

Although currently TraversalStrategies don't allow adding custom opcodes either; they just re-use the standard Gremlin opcodes with different/additional step implementations.

My take on this is that you would still have to adapt each GLV for a new strategy, even if it's only to write a placeholder/proxy strategy. Additionally, at first sight, having to write something about GraphSON when adding a new traversal strategy seems unrelated when you're not familiar with the whole TinkerPop stack.
 
 the server needs to know about that class via a .jar file

Indeed, DSE Graph doesn't support that, as we would not recommend that users or client applications load custom JARs on a production server, because of general issues regarding deployment, maintenance, and security.