Importance of bindings in queries


dap...@gmail.com

Jan 5, 2018, 10:21:36 AM
to Gremlin-users
I see that AWS Neptune will not support bindings, but I can't get my head around the implications of this sort of constraint. Are there any performance benefits to sending queries with bound variables? Is a lack of support for bindings a deal breaker, or is it just a convenience feature? Any insights into this would be greatly appreciated!

Kelvin Lawrence

Jan 5, 2018, 10:38:09 AM
to Gremlin-users
Bindings can offer a significant performance boost, as Gremlin Server (say) can compile and cache a query, knowing that if it comes in again it can just reuse it and apply the bound parameters. I am not familiar enough with Neptune to know whether it offers alternative ways to achieve the same thing, but in projects I have worked on we have tended to recommend that people use bindings for the reason I mentioned above.

Robert Dale

Jan 5, 2018, 10:50:42 AM
to gremli...@googlegroups.com
Depends on what sort of 'bindings' we're talking about: Groovy script or bytecode? Only Groovy scripts are compiled and cached, so using script parameters offers a potential performance improvement. Bytecode bindings are futile.

Robert Dale

--
You received this message because you are subscribed to the Google Groups "Gremlin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-users+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gremlin-users/80e41029-5402-4d8f-8224-801f0fcb6704%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Don Omondi

Jan 5, 2018, 11:10:48 PM
to Gremlin-users
Interesting @Robert, I wasn't even aware that there were two types of bindings. 

When I started out with JanusGraph and tried to bulk load a few million entries, I just sent the addV() and addE() commands in a loop, but got an OutOfMemory exception at about 300,000. I thought it could be because the script cache was getting full, so I changed to using bindings, and although the script somehow got slower, it never ran out of memory, and all 8M-odd elements were loaded.

Could you tell me what kind of bytecode is used below? This is what I went through.

gremlin> graph = TinkerGraph.open()
==>tinkergraph[vertices:0 edges:0]

// First method that threw OutOfMemory Exception

gremlin> g=graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
gremlin> g.addV('person').property('name','Don').getBytecode()
==>[[], [addV(person), property(name, Don)]]
gremlin> g.addV('person').property('name','Don').getBytecode().getBindings()
gremlin>

// Second slower method but never ran out of memory

gremlin> b = new Bindings()
==>bindings[main]
gremlin> g.addV('person').property('name', b.of('name', 'Don')).getBytecode()
==>[[], [addV(person), property(name, binding[name=Don])]]
gremlin> g.addV('person').property('name', b.of('name', 'Don')).getBytecode().getBindings()
==>name=Don
gremlin>

Regards,


Stephen Mallette

Jan 6, 2018, 6:51:58 AM
to Gremlin-users
So, just to be clear, when you started with JanusGraph, you weren't sending scripts to it. You were creating "g", your GraphTraversalSource, like this:

gremlin> graph = EmptyGraph.instance()
==>emptygraph[empty]
gremlin> g = graph.traversal().withRemote('conf/remote-graph.properties')
==>graphtraversalsource[emptygraph[empty], standard]

In other words, you were connecting via the withRemote() option to send "remote traversals" to the server:


Is that right? If so, you say that using your first method above ended up running out of memory? I have not known remote traversals to hit out-of-memory problems on the server before. That is really strange, and if this is what you were experiencing, I'd like to know more about what you were doing.





Don Omondi

Jan 6, 2018, 4:25:37 PM
to Gremlin-users
Thanks for the added insights, @Stephen. To answer your questions: I configure Gremlin Server by copying the contents of the stock conf/gremlin-server/gremlin-server.yaml to conf/gremlin-server/socket-gremlin-server.yaml and adding authentication and SSL:

authentication: {
  className: org.apache.tinkerpop.gremlin.server.auth.SimpleAuthenticator,
  config: {
    credentialsDb: conf/tinkergraph-credentials.properties}}
ssl: {
  enabled: true}


Then start it using 
sudo /bin/gremlin-server.sh ./conf/gremlin-server/gremlin-server.yaml

(I read a comment where you said using sudo isn't required, but since I used it the first time round, I'm stuck with it.)

I start it in a similar fashion on Windows, where I develop (and can reproduce the OutOfMemoryException), with gremlin-server.bat.

socket-gremlin-server.yaml uses janusgraph-cassandra-es-server.properties, which has the setting gremlin.graph=org.janusgraph.core.JanusGraphFactory, so I'm not sure whether it uses withRemote().

I use the PHP library (https://github.com/PommeVerte/gremlin-php) to connect via sockets and send queries. The log in log/gremlin-server.log shows the following snippet:

[RequestMessage{, requestId=26c0312e-1e27-43bd-a3b4-7b38bc224b7e, op='eval', processor='', args={gremlin=g.V().hasLabel('persons').has('user_id', 1) }}].

Or something like this when I use the aforementioned bindings:

[RequestMessage{, requestId=26c0312e-1e27-43bd-a3b4-7b38bc224b7e, op='eval', processor='', args={gremlin=b= new Binding(); g.V().hasLabel('persons').has('user_id', b.of('user_id', 1)) }}].

Stephen Mallette

Jan 6, 2018, 4:50:14 PM
to Gremlin-users
You are mixing a lot of things together. If you were using the PHP library, then you were submitting scripts. A "script" is just a string of text sent to the server, typically embedded in your code as a string literal. If you were submitting scripts, you weren't submitting bytecode. Bytecode is only sent if you use the technique I presented in my last reply (or something similar in the other language variants).

If you were using scripts, then this submitted script:

g.V().hasLabel('persons').has('user_id', 1)

is bad, especially if you find yourself repeating that traversal over and over again with different values for "user_id" (as in searching for user_id 2 or 3 or 4), because Gremlin Server is forced to recompile that script each time: every variation is a different string, so the script cache won't be engaged. You fix that with bindings (also referred to as parameters), but not this way:

b= new Binding(); g.V().hasLabel('persons').has('user_id', b.of('user_id', 1))

That still runs into the same cache miss and recompilation issues as before. You instead want to submit a string of:

g.V().hasLabel('persons').has('user_id', x)

where "x" becomes a binding/parameter that you pass through your driver. I don't know what that looks like in PHP, but in Java:

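// client is an org.apache.tinkerpop.gremlin.driver.Client, e.g. obtained from cluster.connect()
Map<String,Object> params = new HashMap<>();   // java.util.Map / java.util.HashMap
params.put("x", 1);                              // value bound to the script variable x
client.submit("g.V().hasLabel('persons').has('user_id', x)", params);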

Taking that approach will engage the cache and eliminate expensive script compilation times. It should also greatly reduce memory usage and GC activity. If Neptune accepts "scripts" in this fashion and they process them in the same manner that we do, I would hope that they support parameterization.
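To make the cache-miss point concrete, here is a toy model in Java. It is an illustration under stated assumptions, not Gremlin Server's actual code: compiled "scripts" are cached under their exact text, and a pretend compiler counts every cache miss. The class and method names (ScriptCacheDemo, eval, compilations) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Toy model only, NOT Gremlin Server's implementation: compiled "scripts" are
// cached under their exact text, and a counter records every cache miss.
class ScriptCacheDemo {
    private final Map<String, Function<Map<String, Object>, String>> cache = new HashMap<>();
    private int compilations = 0;

    // Stand-in for expensive compilation: turns script text into a function of its bindings.
    private Function<Map<String, Object>, String> compile(String script) {
        compilations++;
        return bindings -> script + " with " + bindings;
    }

    // Compiles only when this exact text has not been seen before, then applies the bindings.
    String eval(String script, Map<String, Object> bindings) {
        return cache.computeIfAbsent(script, this::compile).apply(bindings);
    }

    int compilations() { return compilations; }

    int cacheSize() { return cache.size(); }

    public static void main(String[] args) {
        ScriptCacheDemo literals = new ScriptCacheDemo();
        ScriptCacheDemo parameterized = new ScriptCacheDemo();
        for (int id = 1; id <= 1000; id++) {
            // Literal embedded in the text: every id produces a new, distinct script string.
            literals.eval("g.V().hasLabel('persons').has('user_id', " + id + ")", Map.of());
            // Parameterized: the text never changes; only the bindings map does.
            parameterized.eval("g.V().hasLabel('persons').has('user_id', x)", Map.of("x", id));
        }
        System.out.println("literal: " + literals.compilations() + " compilations, "
                + literals.cacheSize() + " cache entries");        // 1000 and 1000
        System.out.println("parameterized: " + parameterized.compilations() + " compilations, "
                + parameterized.cacheSize() + " cache entries");   // 1 and 1
    }
}
```

With the user_id embedded as a literal, every request is a new cache key, so the toy compiles 1000 times and keeps 1000 entries; the parameterized text compiles once. The unbounded growth of that toy map is one plausible picture of why literal-laden scripts can drive up memory, though how a real server bounds or evicts its cache may differ.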

Bindings, as you were using them, are meant for GLVs remoting traversals via bytecode to Gremlin Server. The Bindings are established on the client and serialized with the bytecode so that they can be evaluated separately from the traversal. Depending on your graph provider, using Bindings may or may not add much value. Theoretically, by using Bindings the traversal could be cached for future use on the server... but Gremlin Server isn't really doing that. They can also be helpful with lambdas (whose usage we don't encourage), as lambdas must be submitted as scripts, and for the same reasons that you parameterize (as shown above) you would want to make sure that the lambda script can be cached in the server so that it does not need to be recompiled over and over again. Bindings would help with that.
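As a rough sketch of the idea that Bindings travel inside the bytecode and can be pulled out separately from the traversal, here is a toy model in Java. ToyBytecode, Binding, step() and getBindings() are hypothetical stand-ins, not TinkerPop's Bytecode and Bindings classes; the printed form only imitates the console output in Don's session above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Toy model only: these are NOT TinkerPop's Bytecode or Bindings classes.
class ToyBytecode {
    // A named argument value; the name is what lets a server treat the value as a parameter.
    static final class Binding {
        final String name;
        final Object value;
        Binding(String name, Object value) { this.name = name; this.value = value; }
        @Override public String toString() { return "binding[" + name + "=" + value + "]"; }
    }

    // One step such as addV(person) or property(name, binding[name=Don]).
    static final class Instruction {
        final String operator;
        final List<Object> args;
        Instruction(String operator, Object[] args) {
            this.operator = operator;
            this.args = Arrays.asList(args);
        }
        @Override public String toString() {
            return operator + args.stream().map(String::valueOf)
                    .collect(Collectors.joining(", ", "(", ")"));
        }
    }

    private final List<Instruction> steps = new ArrayList<>();

    // Appends a step; returns this so calls can be chained like a traversal.
    ToyBytecode step(String operator, Object... args) {
        steps.add(new Instruction(operator, args));
        return this;
    }

    // Collects every Binding argument, separately from the instruction list itself.
    Map<String, Object> getBindings() {
        Map<String, Object> found = new LinkedHashMap<>();
        for (Instruction step : steps) {
            for (Object arg : step.args) {
                if (arg instanceof Binding) {
                    Binding b = (Binding) arg;
                    found.put(b.name, b.value);
                }
            }
        }
        return found;
    }

    @Override public String toString() { return steps.toString(); }

    public static void main(String[] args) {
        ToyBytecode plain = new ToyBytecode().step("addV", "person").step("property", "name", "Don");
        ToyBytecode bound = new ToyBytecode().step("addV", "person")
                .step("property", "name", new Binding("name", "Don"));
        System.out.println(plain + " " + plain.getBindings());
        // [addV(person), property(name, Don)] {}
        System.out.println(bound + " " + bound.getBindings());
        // [addV(person), property(name, binding[name=Don])] {name=Don}
    }
}
```

The bound value is still carried in the instruction list; it is merely tagged with a name so that a server could treat it as a parameter. Whether a server actually does anything with that tag is provider-specific, as noted above.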

Hope that clears things up for you at least from the TinkerPop end - not sure I've got you a clear direction from the Neptune side of things because I still don't know how they internally optimize for such things.






Don Omondi

Jan 6, 2018, 5:25:18 PM
to Gremlin-users
Thanks a bunch, @Stephen; things are much clearer now. By the way, I'm using JanusGraph, not Neptune. I was just very interested in this topic of 'the importance of bindings', and now I don't regret joining in.

So from what I gather, the first method was indeed running out of memory because of the script cache and garbage collection, so the assumption to use bindings was right, but with the wrong bindings implementation. Still, I'm a little lost as to why it didn't run out of memory, and why it still worked but was much slower. You do mention that
Depending on your graph provider, using Bindings may or may not add much value.
So maybe that's it.

I'm indebted for this eye opener, and will see how to properly use bindings in PHP.

Regards,

dap...@gmail.com

Jan 7, 2018, 11:50:40 PM
to Gremlin-users
Thanks for this detailed response, @Stephen. From what I understand, the Neptune docs specifically say


I'm not sure whether this is just for now, while they are still in the preview phase, but if the performance limitations apply, I'm concerned about using Neptune for any important projects.

Stephen Mallette

Jan 8, 2018, 6:43:03 AM
to Gremlin-users
I've read that statement too and, as I said, I'm not clear on what it means. Have you simply tried passing parameters on a REST/websockets call to see if it works? Whether it works or not, you might not want to make assumptions about how that may or may not affect performance. This new crop of TinkerPop-enabled graph systems, like DSE Graph, CosmosDB, Neptune (maybe), etc., that directly implement TinkerPop's Gremlin Server protocols may have different rules about what affects performance and what does not. If no one from Neptune can chime in on this thread, I suggest you seek support from them directly on this issue. If you do get an answer, please let us know what you find - thanks.
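One way to run the experiment suggested above is a plain HTTP POST. Gremlin Server's HTTP endpoint accepts a JSON body with a "gremlin" script and a "bindings" map; whether Neptune's endpoint honors "bindings" is exactly the open question in this thread. Below is a minimal sketch using java.net.http (Java 11+). The class name is hypothetical, the hand-rolled JSON encoder handles only string and number values, and the endpoint URI passed to post() is an assumption you must supply (main() only builds and prints the body).

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.LinkedHashMap;
import java.util.Map;

class ParameterizedHttpRequest {
    // Minimal JSON string escaping; enough for this sketch, not a general JSON encoder.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                default: sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Builds {"gremlin":"...","bindings":{...}}; numbers are written bare, anything else as a string.
    static String body(String script, Map<String, Object> bindings) {
        StringBuilder sb = new StringBuilder("{\"gremlin\":").append(quote(script)).append(",\"bindings\":{");
        boolean first = true;
        for (Map.Entry<String, Object> e : bindings.entrySet()) {
            if (!first) sb.append(',');
            first = false;
            Object v = e.getValue();
            sb.append(quote(e.getKey())).append(':')
              .append(v instanceof Number ? v.toString() : quote(String.valueOf(v)));
        }
        return sb.append("}}").toString();
    }

    // Sends the body and returns the raw response text; the endpoint is caller-supplied.
    static String post(URI endpoint, String body) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(endpoint)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        return HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString())
                .body();
    }

    public static void main(String[] args) {
        Map<String, Object> params = new LinkedHashMap<>();
        params.put("x", 1);
        System.out.println(body("g.V().hasLabel('persons').has('user_id', x)", params));
        // {"gremlin":"g.V().hasLabel('persons').has('user_id', x)","bindings":{"x":1}}
    }
}
```

If the server rejects the request or ignores "bindings", that by itself answers the "does it work" half of the question; it says nothing about performance, for the reasons given above.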


Divij Vaidya

Jan 8, 2018, 7:02:54 PM
to Gremlin-users
Hi all,

Amazon Neptune allows the usage of bindings when the traversal is sent via bytecode, i.e. using the GLVs. For example, you can use bindings with the Java GLV in the following manner:
 
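Cluster cluster = builder.create(); // builder: a Cluster.Builder configured for your Neptune endpoint
GraphTraversalSource g = EmptyGraph.instance().traversal().withRemote(DriverRemoteConnection.using(cluster));
Bindings bindings = new Bindings(); // org.apache.tinkerpop.gremlin.process.traversal.Bindings
int paramId = 1;
// plain ASCII quotes; the bound value is serialized with the bytecode as binding[user_id=1]
GraphTraversal<Vertex, Vertex> t = g.V().hasLabel("persons").has("user_id", bindings.of("user_id", paramId));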
 
However, Neptune does not support bindings for any Groovy script execution.

dap...@gmail.com

Jan 8, 2018, 10:23:25 PM
to Gremlin-users
I'm trying to use Neptune from Node.js, and have only been able to use the JavaScript language driver to connect via WebSockets. As @Stephen mentioned above, I'm still not very clear on the performance implications of sending traversals to Neptune via Groovy script vs. bytecode. Do you, by any chance, have some knowledge of the internal implementation and can you shed some light on this? Thank you!

Divij Vaidya

Jan 9, 2018, 12:15:51 PM
to Gremlin-users
Using GLVs (i.e. sending via bytecode) is the recommended way to connect to Neptune, since it lets customers write Gremlin in the language of their choice. From what I understand, Gremlin-JavaScript (correct me if I am wrong, Stephen) is right around the corner, and you will be able to use the JS GLV very soon.
As far as the performance implications of using different clients are concerned, Neptune will have the same characteristics as other TinkerPop-enabled implementations and will behave the same way.

Stephen Mallette

Jan 9, 2018, 12:39:24 PM
to Gremlin-users
yes - gremlin-javascript is in the final rounds of review right now:





dap...@gmail.com

Jan 9, 2018, 9:42:17 PM
to Gremlin-users

Ah ok that pretty much clears up everything for me. Good to hear a GLV will be available soon. Thank you everyone!

