[TinkerPop3] Gremlin Server and Sessions RFC

Stephen Mallette

Jan 9, 2014, 9:13:32 AM
to gremli...@googlegroups.com
Rexster allowed in-session and sessionless requests, where the concept of "in-session" meant that every Gremlin script processed in that session occurred within the context of its own set of variable bindings for a ScriptEngine and executed within the same thread across requests.  This mode of operation was important for use cases like Rexster Console where there was a need to incrementally execute Gremlin across multiple requests.  In other words, I might issue each of the following lines in three separate requests:

v = g.addVertex()
v.setProperty('name','stephen')
g.commit()

As there is a notion of a "session", the variable "v" initialized on the first line is kept on the server and placed into the bindings for the next request on the second line.  Finally, you have to manage your own transaction and commit the change.

This mode of operation is in contrast to sessionless communication, where the entire script would be issued as a single request, likely without the g.commit(), as the server would auto-commit or roll back for you at the end of the request.
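
To make the difference concrete, here is a minimal sketch in Python rather than Gremlin. It is a toy model, not Rexster or Gremlin Server code: ToyServer, submit and session_id are names assumed purely for illustration, and plain dicts stand in for vertices and for the ScriptEngine bindings. In-session requests share one set of bindings across requests; sessionless requests each start with fresh ones.

```python
# Toy model only: ToyServer, submit and session_id are hypothetical names,
# and Python's exec stands in for a Gremlin ScriptEngine.

class ToyServer:
    def __init__(self):
        # session id -> variable bindings kept on the server between requests
        self.sessions = {}

    def submit(self, script, session_id=None):
        if session_id is None:
            # Sessionless: fresh bindings, thrown away after the request.
            bindings = {}
        else:
            # In-session: reuse (or create) the bindings for this session.
            bindings = self.sessions.setdefault(session_id, {})
        exec(script, {}, bindings)
        # By convention in this sketch, a script reports back via "result".
        return bindings.get("result")


server = ToyServer()
# Three separate requests that build on each other, as in the example above.
server.submit("v = {}", session_id="s1")
server.submit("v['name'] = 'stephen'", session_id="s1")
name = server.submit("result = v['name']", session_id="s1")
```

Without a session id, the second request would fail with a NameError, because "v" was never kept on the server; the whole script would have to travel in a single request instead.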

The question is whether Gremlin Server should follow this pattern and support both in-session and sessionless requests.  My personal feeling is that Gremlin Server should be focused and simple, thus only having sessionless requests.  I haven't found any production use cases for in-session requests outside of the Rexster Console one, and most implementations seem to focus on sessionless requests. Removing in-session as a requirement would yield a few benefits:

1. Less choice for those using Gremlin Server - one way to execute remote Gremlin and it's the "best" way.
2. A bit easier to write clients to Gremlin Server in other languages, as the protocol gets simplified.
3. Less code and code complexity in Gremlin Server... I suppose this could translate to fewer resources consumed on the server, but it mostly means easier maintenance.

I think at this point I'm trying to weigh the value of having the in-session feature versus the benefits we would get without it.  If anyone currently uses the "session" feature built into RexPro (outside of Rexster Console) and has some really good production use cases for it, I'd love to hear about them.  Further, if anyone just generally thinks that dropping sessions is just a bad idea, I'd like to hear about that as well.

Best regards,

Stephen

Matthias Broecheler

Jan 9, 2014, 2:23:29 PM
to gremli...@googlegroups.com
Hi Stephen,

I am also leaning toward a sessionless-only Gremlin Server. Another "danger" of maintaining sessions on the database server is that you can have potentially very long-running transactions (for sessions that aren't closed or just linger on) that consume system resources or locks.
So, allowing session connections to production systems that have a non-trivial workload should be highly discouraged due to this possible interference.

However, what would be really nice is to have a Python-notebook-based exploratory shell for data science or exploratory access to the graph. For instance, I like GraphLab's cloud notebook. When you are getting started, this is really useful. I wonder if this can be simulated over a sessionless connection by simply including the necessary state with each request and only holding on to the state locally.
So, the user would write a series of short Gremlin scripts (mostly one-liners) that get executed on the server, and the state is held onto locally. Kind of like how Mathematica works.

For example:

v = g.V.has('name','Marko');
-------
v.out('knows')

The first query would get executed and the transaction committed. The client would hold onto the id of v and name it v_id. Then the second query would be rewritten to:
g.v(v_id).out('knows')
where v_id is passed in as a parameter.
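
A rough sketch of that client-side rewriting, in Python. Everything here is assumed for illustration: NotebookClient is a hypothetical name, submit is a stand-in for whatever sends a script plus parameters to the server, and the sketch assumes the first query comes back as a single vertex id. The textual rewrite is deliberately naive (it would also touch a variable name that appears inside a string literal).

```python
import re

# Matches lines of the form "name = <script>;" (trailing semicolon optional).
ASSIGNMENT = re.compile(r"\s*(\w+)\s*=\s*(.+?);?\s*$")


class NotebookClient:
    """Hypothetical client that holds variable state locally, not on the server."""

    def __init__(self, submit):
        self.submit = submit  # assumed callable: (script, params) -> result
        self.ids = {}         # variable name -> vertex id remembered on the client

    def run(self, line):
        match = ASSIGNMENT.match(line)
        if match:
            name, script = match.groups()
        else:
            name, script = None, line.strip().rstrip(";")
        params = {}
        for var, vertex_id in self.ids.items():
            # Replace a bare use of the variable (not "g.v") with a lookup by id.
            pattern = r"(?<![\w.])" + re.escape(var) + r"\b"
            if re.search(pattern, script):
                script = re.sub(pattern, "g.v(" + var + "_id)", script)
                params[var + "_id"] = vertex_id
        result = self.submit(script, params)
        if name is not None:
            self.ids[name] = result  # assumes the result is a single vertex id
        return script, params


sent = []

def fake_submit(script, params):
    # Stand-in for a sessionless request; pretends the vertex id is 1.
    sent.append((script, params))
    return 1

client = NotebookClient(fake_submit)
client.run("v = g.V.has('name','Marko');")
rewritten, parameters = client.run("v.out('knows')")
```

Here the second call sends the script g.v(v_id).out('knows') with v_id bound to the id remembered from the first call, so the server never has to keep "v" around.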

This would be more limited than Rexster Console in that you don't have the concept of longer transactions (everything is committed immediately), but this will be used mostly for exploration anyway.

WDYT?
Cheers,
Matthias


--
You received this message because you are subscribed to the Google Groups "Gremlin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-user...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.



--
Matthias Broecheler
http://www.matthiasb.com

Pieter Martin

Jan 9, 2014, 2:27:09 PM
to gremli...@googlegroups.com
Hi,

Sessions also make HA and clustering and all that production pain much
more painful.

Regards
Pieter

Stephen Mallette

Jan 9, 2014, 3:00:15 PM
to gremli...@googlegroups.com
Matthias, the more I think on this, the more I think the "graph notebook" concept is a different "server" designed more specifically for that purpose.  I do see how what you propose might work.  The main problem that you run into (though your example would be ok) is in not always having something that cleanly serializes back to the client from the server that can then easily be sent back on the second request.  

Thanks,

Stephen

Borys Pierov

Apr 24, 2014, 5:51:55 AM
to gremli...@googlegroups.com
Hi Stephen,

In my opinion, sessions should be removed (unless session == single connection mode).
Although I can imagine a situation where sessions can be useful, I think they are something that rather shouldn't be used. In our code we explicitly wrap our queries in closures/anonymous code blocks to ensure that there will not be any tampering.

Also I'd like to ask whether we should assume that the main connectivity approach in Gremlin Server will be a 'session-less single connection'?

Best Regards,
Borys Pierov.

Stephen Mallette

Apr 24, 2014, 8:21:16 AM
to gremli...@googlegroups.com
Thanks for your feedback.  I guess the connectivity approach will depend on the client that you choose to use.  In other words, I suppose it would be possible for a third-party to write a client that allowed a single open connection to Gremlin Server and issued all requests over that single connection.  I'd further imagine that the third-party could write a client that did not support the streaming mechanism in Gremlin Server thus providing an environment much like how Rexster worked.

If you were to use the Java client I'm writing for the reference implementation, then you'd have much more advanced functionality.  A client will maintain a pool of sessionless connections, issuing requests to Gremlin Server over them.  The client will be responsible for assimilating the streamed results back into order.  Generally speaking, the client will provide you a CompletableFuture (http://docs.oracle.com/javase/8/docs/api/java/util/concurrent/CompletableFuture.html) to work with (though there will be blocking methods as well) for dealing with results.
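
As a rough illustration of that idea (not the actual reference client), here is a small Python sketch in which concurrent.futures.Future plays the role that CompletableFuture plays in Java. ResultCollector, register and on_fragment are hypothetical names, and the notion of tagging each fragment with a request id and a "last" flag is an assumption, not a description of the Gremlin Server protocol.

```python
from concurrent.futures import Future


class ResultCollector:
    """Hypothetical helper: gathers streamed fragments per request into one result."""

    def __init__(self):
        # request id -> (future handed to the caller, items received so far)
        self.pending = {}

    def register(self, request_id):
        future = Future()
        self.pending[request_id] = (future, [])
        return future

    def on_fragment(self, request_id, items, last=False):
        # Fragments for different requests may arrive interleaved on the wire.
        future, buffer = self.pending[request_id]
        buffer.extend(items)
        if last:
            del self.pending[request_id]
            future.set_result(buffer)  # unblocks anyone waiting on the future


collector = ResultCollector()
first = collector.register("req-1")
second = collector.register("req-2")

# Simulated arrival order: fragments of the two requests are interleaved.
collector.on_fragment("req-1", [1, 2])
collector.on_fragment("req-2", ["a"])
collector.on_fragment("req-1", [3], last=True)
collector.on_fragment("req-2", ["b"], last=True)
```

A caller can either block on first.result() or attach a callback with add_done_callback, which mirrors the "future plus blocking methods" shape described above.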



Borys Pierov

Apr 24, 2014, 8:30:14 AM
to gremli...@googlegroups.com
Stephen,

thanks for your reply!

Although your test client seems to be a great thing and it would be great to try it, I'm afraid that I'm exactly that third party you're talking about. Since our project is written in Python and we need a flexible and efficient client for Rexster/Gremlin Server, it could be that we will need to write it ourselves (I think that we will be able to make use of RexPro Python at the moment, but who knows what state it will be in by the time TinkerPop 3 is released). That's why I'm highly interested in technical details about the planned Gremlin Server connectivity options, so that I can take them into account while designing our code and working on a Gremlin Server client.

Borys

Stephen Mallette

Apr 24, 2014, 8:43:26 AM
to gremli...@googlegroups.com
I haven't said much about the client or the protocol just yet, because it keeps changing so drastically.  My goal is to have a reasonable reference implementation by the end of May, which would be stable enough for those hoping to write language bindings.  

James Thornton (of Bulbs) has been suggesting I find existing protocol formats to help make writing Gremlin Server clients easier (i.e. one that has existing support in different languages, like JSON-RPC), but thus far none have matched up to the requirements I have... so while I'm still looking, we may just be left with a custom Gremlin Server format over WebSockets.  Can't keep changing things at this point... need to settle on an approach and go with it.


Borys Pierov

Apr 24, 2014, 8:49:43 AM
to gremli...@googlegroups.com
Stephen,

I can understand the situation you are currently in, and IMHO your current decision to go for WebSockets with a custom Gremlin Server format seems to be a good one. Personally, I'm not sure how efficient streaming responses will be, but I will not know for sure until I try.

Looking forward to new details. ;)

Stephen Mallette

Apr 24, 2014, 9:02:54 AM
to gremli...@googlegroups.com
The reason I wanted to stream back results in specified fragments was that if you sent a few requests to Rexster that returned a large result set, you would tie up Rexster threads and resources in the serialization of that fat result set as it iterated the entire thing into memory and then flushed it down the wire.  Rexster would start blocking requests and appear to stop processing.  If your requests do not return such results then it might be best to maintain that standard request/response pattern as opposed to paging back fragments of results (haven't tested that, of course...just guessing).  Gremlin Server will let you control that behavior (both on the server and then overridden per request).
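
A minimal sketch of the fragmenting idea, in Python. It is not Gremlin Server code: stream_fragments, DEFAULT_BATCH_SIZE and the value 64 are assumptions made up for the example. The point is only that results are pulled from the iterator a batch at a time, so the whole result set never has to sit in memory before the first bytes go down the wire, and that a per-request size can override the server default.

```python
from itertools import islice

DEFAULT_BATCH_SIZE = 64  # hypothetical server-side default


def stream_fragments(results, batch_size=None):
    """Yield the results in lists of at most batch_size items."""
    size = batch_size or DEFAULT_BATCH_SIZE  # per-request override, else default
    iterator = iter(results)
    while True:
        # Only this fragment is materialized; the rest stays in the iterator.
        fragment = list(islice(iterator, size))
        if not fragment:
            return
        yield fragment


small = list(stream_fragments(range(5), batch_size=2))
default_sized = [len(f) for f in stream_fragments(range(130))]
```

Setting the batch size at or above the size of the result set gives back the plain request/response pattern: a single fragment carrying everything.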

Borys Pierov

Apr 24, 2014, 9:24:03 AM
to gremli...@googlegroups.com
So you think that serialization and sending the response (for big responses) will be a bottleneck... Well, it's quite possible. What about the worker count in Gremlin Server? Will there be such an option in the config?
Also will it be possible to use several instances of Gremlin Server deployed on different machines?

Stephen Mallette

Apr 24, 2014, 10:04:58 AM
to gremli...@googlegroups.com
Similar threading options will be available in Gremlin Server, though I think the guidance for tuning them will be easier to figure out.  You will basically have:

+ boss pool will always be 1 (that is configurable now, but I don't think that's something to mess with, so I will likely remove that option before release)
+ worker pool should be equivalent to the number of cores or less
+ gremlin pool which handles blocking operations (e.g. script evaluation) ... this setting is the one you'll have to play with per your requirements

You will deploy Gremlin Server in the same manner as you do Rexster.  So in the case of distributed graphs like Titan you would place one Gremlin Server on each node of the cluster.  Gremlin Server will be "cluster-aware" for graphs that support that feature and hence advanced clients such as the reference implementation for Java will be able to do some intelligent things given information provided from the client.  I don't have much more information on that at the moment and I'm not sure if such features will be ready for review/comment by end of May.   My main focus is to finalize enough of Gremlin Server and the reference implementation to allow creation of language bindings that simulate at least as much as RexPro had.  

Borys Pierov

Apr 24, 2014, 10:09:14 AM
to gremli...@googlegroups.com
Stephen, thank you very much for your replies!

It clarified a lot of things regarding future release of Gremlin Server for me.
Sorry that I was not of much use in this RFC thread. :)

Dmill

Apr 25, 2014, 10:05:39 AM
to gremli...@googlegroups.com
Hey Stephen.

I'm guessing this means we lose Transactions over multiple queries? This would be an issue for us, and probably a few others. 
From a non Java perspective, it isn't uncommon for us to break functionality (queries) down into various methods and then string these methods together in a Transaction. For instance when deleting a vertex you might have some form of bubbling where other elements get deleted.
As a simple example we could imagine a graph with user vertices and action vertices. A user performs multiple actions. When a user is removed, so are all of his actions. 
From an object perspective you will most likely have a User object and an Action object that both implement a delete() method. In the case of Action.delete(), it will simply remove the action vertex and its edges; for User.delete() it would be something along the lines of:

- retrieve all user actions. (query 1)
- run Action.delete() on all of them (for the sake of simplicity, let's say this runs a single query on each call: queries 2 to, say, 5).
- Finally delete user. (query 6)

All this would be run within a Transaction, with or without some more queries on top of it.
For testing and maintainability purposes you will want to have independent delete() methods instead of having everything in one Gremlin script. It also makes adding new bubbling simple. This way of handling your queries and database is quite common coming over from PHP + MySQL, and losing it would definitely make Gremlin Server a no-go for us. Stringing all this into a single query adds complexity and makes for difficult code maintenance (adding more bubbling would have to be done on every level, which could mean a fair amount of code redundancy).
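
A small Python sketch of the pattern described above. All names are hypothetical: RecordingSession stands in for a session-capable client and only records what would be sent, the Gremlin strings are illustrative rather than tested against a server, and the user's actions are passed in directly instead of being fetched by a first query.

```python
class RecordingSession:
    """Hypothetical stand-in for a session-capable client; it only records scripts."""

    def __init__(self):
        self.log = []

    def execute(self, script, params=None):
        # A real client would send one request per call within the same session.
        self.log.append((script, params or {}))

    def commit(self):
        self.log.append(("g.commit()", {}))


class Action:
    def __init__(self, vertex_id):
        self.id = vertex_id

    def delete(self, session):
        # Independent, separately testable delete: one query per action.
        session.execute("g.v(id).remove()", {"id": self.id})


class User:
    def __init__(self, vertex_id, actions):
        self.id = vertex_id
        self.actions = actions  # in practice these would come from a first query

    def delete(self, session):
        # "Bubbling": delete the user's actions first, then the user itself.
        for action in self.actions:
            action.delete(session)
        session.execute("g.v(id).remove()", {"id": self.id})


session = RecordingSession()
User(1, [Action(10), Action(11)]).delete(session)
session.commit()  # a single commit covers every query issued above
```

Each delete() stays small and reusable, and the caller decides where the transaction boundary sits; this is exactly what requires a transaction that outlives a single request.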

Maybe I'm misunderstanding. Anyways that's my 2 cents.

Stephen Mallette

Apr 25, 2014, 10:53:10 AM
to gremli...@googlegroups.com
Gremlin Server would encapsulate a transaction into a single request, so you are understanding properly.  As I mentioned in another post somewhere in the last day or so, just because Gremlin Server won't support the notion of a session natively doesn't mean it won't be possible to build a session-based processor to plug in to it.  Perhaps if there is enough interest I might consider adding "sessions" as an extension to what is configured as standard out of the box.  I can say that such a feature won't be on my radar for the end of May with respect to getting the reference implementation built.

Thanks for the feedback,

Stephen



Dylan Millikin

Apr 25, 2014, 11:04:57 AM
to gremli...@googlegroups.com
I see. 
Thanks for the heads up. It's only natural that it wouldn't be on your radar for the time being. And as much as I believe it's crucial functionality to have, I'm certainly not one to want everything right away ;) . Hopefully it'll make its way onto someone's to-do list. This aside, I've been rather excited about all the changes for the stack. You guys have been doing wonderful work.

Cheers.



Stephen Mallette

Jul 24, 2014, 7:40:09 AM
to gremli...@googlegroups.com
After much thought, I've returned to the idea of allowing for sessions in Gremlin Server.  Interestingly, the Gremlin Server design allowed for their inclusion without really impacting the subprotocol, architecture, etc.  The reason I've added them back relates to the fact that Gremlin Server represents the means by which non-JVM languages can connect to the TinkerPop3 world.  Without a session there is no opportunity for those languages to easily support a transaction over multiple requests (which might be a useful feature in that case).  The limitation, of course, is that the session lives within a single machine, as many of the bindings constructed in a Gremlin session tend not to be serializable and therefore aren't easily shared outside of a single JVM.

I think that this leaves a clear use-case for "sessions" in Gremlin Server, in that the only reason to do it is if you have to create a transaction that spans request/response boundaries.  If you don't have to do that, then stay sessionless.

Stephen



Borys Pierov

Jul 24, 2014, 7:44:23 AM
to gremli...@googlegroups.com
Stephen,

great to hear it! :) It really simplifies a lot of things for users from non-JVM world.

Best Regards,
Borys Pierov.

Dylan Millikin

Jul 24, 2014, 8:42:59 AM
to gremli...@googlegroups.com
I second what was said; this is great news for us.
Keep up the amazing work.