Re: [TinkerPop] Help for python client with transactional properties


Peter Neubauer

Nov 5, 2012, 9:50:38 AM
to gremlin-users
Benjamin,
you might want to look into the Batch API, see http://docs.neo4j.org/chunked/snapshot/rest-api-batch-ops.html which will execute several REST calls in one transaction. Since you are using Cypher, that should take care of most of your use cases?
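
A minimal sketch of what such a batch request body could look like, built in plain Python. The job fields (method, to, body, id) and the "{0}" back-references follow the batch operations page linked above; the example names and the function build_batch are illustrative assumptions, and nothing is actually sent here.

```python
import json


def build_batch(person_name, friend_name):
    """Build a job list for Neo4j's REST batch endpoint.

    Each job carries a method, a target path, an optional body and an id.
    A later job can refer to the result of an earlier one as "{id}".
    """
    return [
        {"method": "POST", "to": "/node", "body": {"name": person_name}, "id": 0},
        {"method": "POST", "to": "/node", "body": {"name": friend_name}, "id": 1},
        {
            "method": "POST",
            "to": "{0}/relationships",  # relationships of the node created by job 0
            "body": {"to": "{1}", "type": "knows"},
            "id": 2,
        },
    ]


# The whole list is sent in one request, so the server can apply it in one
# transaction. ASSUMPTION: the endpoint is <server>/db/data/batch and the
# request uses Content-Type: application/json.
payload = json.dumps(build_batch("James", "Julie"))
```

The payload can then be POSTed with whatever HTTP client you already use; if any job fails, the server is expected to roll back the whole batch.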

/peter


Cheers,

/peter neubauer

G:  neubauer.peter
S:  peter.neubauer
P:  +46 704 106975
L:   http://www.linkedin.com/in/neubauer
T:   @peterneubauer

Neo4j 1.8 GA - http://www.dzone.com/links/neo4j_18_release_fluent_graph_literacy.html


On Mon, Nov 5, 2012 at 6:38 AM, Benjamin Garrigues <benjamin....@gmail.com> wrote:
Hi there, I'm currently investigating the various graph DB technologies for building a Python/Flask web service on top of a graph persistence layer. The purpose is to completely remove the traditional SQL layer and use only a graph DB, with blob stores for larger objects.
I was happily using py2neo with Cypher on top of Neo4j running in server mode until I realized that I couldn't create a transaction in this configuration, simply because Neo4j doesn't offer transactions through its REST API, and REST is the only interface available to Python (there is no native Python client).

I've since looked at the Bulbflow project, but couldn't find anything related to transactional behavior. The Blueprints API does specify TransactionalGraph as part of its specification, but I never found anything related on the client side.

Does anyone know what client library I could use? I'm ready to switch to a DB other than Neo4j if that's required (I chose it only because of the Heroku integration).

Thanks, 
Benjamin


Marko Rodriguez

Nov 5, 2012, 9:52:31 AM
to gremli...@googlegroups.com
Hi Ben,

I will let James talk about Bulbflow as he is the author/expert.

However, note that Gremlin provides you full access to the underlying Blueprints graph API. One of the beautiful things about Gremlin is that it doesn't change the user's perspective depending on whether you are writing to an embedded graph, a Rexster server extension, a query/traversal, etc. It's all the same -- Java/Groovy/Gremlin.

Next, with the Rexster server, you can do the REST API style (e.g. get/put vertices/elements) or (and the way I use it) you can simply push Gremlin code in and get results back. Given that Gremlin code is an extension of Groovy which is an extension of Java, you have the full Java API at your fingertips. Thus, you can read/mutate the graph and have full control over transactions of the underlying graph.
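
A rough sketch of the "push Gremlin code in" approach from Python, assuming the Rexster Gremlin extension accepts a JSON body with "script" and "params" keys. The Groovy script, the parameter names and build_request are illustrative assumptions; the commit()/rollback() calls are the names used in newer Blueprints releases, while older releases expose stopTransaction(...) instead, so check which one your version provides.

```python
import json

# Groovy/Gremlin source that runs entirely on the server, so the commit or
# rollback happens there, inside a single HTTP request.
# ASSUMPTION: g is the graph bound by the server and supports
# commit()/rollback(); older Blueprints versions use stopTransaction(...).
SCRIPT = """
try {
    a = g.addVertex(null, [name: name_a])
    b = g.addVertex(null, [name: name_b])
    g.addEdge(null, a, b, 'knows')
    g.commit()
    return [a, b]
} catch (e) {
    g.rollback()
    throw e
}
""".strip()


def build_request(name_a, name_b):
    """Return the JSON body for one transactional script execution."""
    return json.dumps({
        "script": SCRIPT,
        "params": {"name_a": name_a, "name_b": name_b},
    })
```

The body would be POSTed to the Gremlin extension of your graph (something like http://localhost:8182/graphs/<graph>/tp/gremlin on a default install, which is an assumption about your setup).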

I hope that is clear.

Good luck with your project,
Marko.

http://markorodriguez.com

James Thornton

Nov 5, 2012, 12:12:38 PM
to gremli...@googlegroups.com


On Monday, November 5, 2012 8:38:05 AM UTC-6, Benjamin Garrigues wrote:
Hi there, i'm currently investigating the various graphdb technologies for building a python/flask web service on top of a graph persistence layer. The purpose is to completely remove the traditional SQL layer and only use graphDB with blobstores for larger objects.
I was happily using py2neo with cypher on top of neo4j running in server mode, until i realized that I couldn't create a transaction in this configuration, simply because neo4j didn't offer transaction for its REST Api, and that it was the only client available for python (no native python client).

I've since looked at the bulbflow project, but couldn't find anything related to transactional behavior. The blueprint API do specify TransactionalGraph as part of its specification, but i never found anything related on the client side.



Hi Benjamin -

Bulbs uses Gremlin for transactions for both Neo4j Server and Rexster because Gremlin scripts are usually the cleanest way to create a transaction. You can use the Neo4j Server or Rexster batch APIs, but they're not as flexible as Gremlin.

This file contains some of Bulbs' built-in scripts for Neo4j Server. The create_indexed_vertex() and create_indexed_edge() scripts create and index a vertex/edge in a single HTTP request -- all wrapped in a transaction:


And here's how they're used -- these three Python create() methods execute the respective create_indexed_vertex() and create_indexed_edge() scripts above.

>>> from bulbs.neo4jserver import Graph
>>> g = Graph()
>>> james = g.vertices.create(name="James")
>>> julie = g.vertices.create(name="Julie")
>>> knows = g.edges.create(james, "knows", julie)

Here's the underlying code:


By default, a Gremlin object is built into the Graph object as g.gremlin. You can create custom Gremlin scripts in a text file, which lets you get full syntax highlighting in an editor:

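// gremlin.groovy

// calculate basic collaborative filtering for user_id
def rank_items(user_id) {
    m = [:]
    // iterate() drains the lazy pipeline so groupCount() actually fills m
    g.v(user_id).out('likes').in('likes').out('likes').groupCount(m).iterate()
    // sort() returns a new sorted map rather than reordering m in place
    sorted = m.sort{a,b -> a.value <=> b.value}
    return sorted.values()
}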

The scripts file can contain more than one function per file.

You add the scripts file to the Scripts object using the g.scripts.update(file_path) method. Then you can get the individual scripts by their function names:

>>> from bulbs.neo4jserver import Graph
>>> g = Graph()                              # create Neo4j Graph object
>>> g.scripts.update('gremlin.groovy')       # add file to scripts index
>>> script = g.scripts.get('rank_items')     # get a function by its name
>>> params = dict(user_id=3)                 # put function params in dict
>>> items = g.gremlin.query(script, params)  # execute the script in DB

Here's a full example of a Bulbs model that uses a custom, transactional Gremlin script to save a blog entry:


- James



James Thornton

Jan 24, 2013, 3:49:50 PM
to gremli...@googlegroups.com
Hi Arjumand -

Sorry for the delayed response -- I was on vacation when you posted this.

I see that you're using a modified Gremlin script from Lightbulb. Since then, Blueprints has changed somewhat to make it work with Titan too, and so transactions have changed.

For example, g.setMaxBufferSize(0); is no longer used, so that needs to be taken out.

- James

On Wednesday, December 5, 2012 12:40:52 AM UTC-6, Arjumand Bonhomme wrote:
Hi James,

I'm attempting to evaluate and use Bulbs for a new project. I'm new to both Bulbs and Gremlin. I tried to make an example of my own (using Neo4j, hopefully later with Titan), based on the Lightbulb example in your last post, but I'm stuck. I'm getting a NullPointerException from my version of your modified Gremlin script. My two example files can be found here: https://gist.github.com/4212993


The NPE appears to be thrown by the "g.idx(index_name).get(index_key, index_value).toList()" line of the "create_or_update_vertex()" closure. I'm not sure what the best way to debug Gremlin scripts is. But even before I got the NPE, I had to modify my script to add a parameter to the outermost method/closure to access the Graph variable, which was out of scope in my example script.

Any tips you would have would be greatly appreciated.

Thanks,
Jumand

Blake Eggleston

Jan 24, 2013, 7:06:09 PM
to gremli...@googlegroups.com
Hi Ben and Arjumand,

I would recommend you check out our thunderdome project, which we built specifically for Titan


We built it for a product running on python/flask after running into problems using bulbs in production.

Some advantages it has over bulbs are:
Vertices/Edges are first class objects
Native support for returning complex structures like tables
You don't have to register your vertices and edges with the library
Advanced Gremlin support: each model can have its own Gremlin file, and you can assign Gremlin functions directly to methods on your model and call them like native Python methods.
Support for Titan's static typing

It is still under development, but we are about to roll it into production for 3 sites.

Let me know if you have any questions.

Thanks,

Blake

James Thornton

Jan 25, 2013, 1:41:24 AM
to gremli...@googlegroups.com
Hi Blake -

What issues did you run into with Bulbs/Titan?

- James

Allan Johns

Jan 25, 2013, 2:15:14 AM
to gremli...@googlegroups.com
Hi James and Blake,

I just wanted to chip in and talk about my experience with using Bulbs so far. Initially I was using a fair part of it, but over time I've found myself having to implement more and more gremlin-related functionality outside of Bulbs, to the point now where I'm really only using Bulbs to send and execute scripts. I've completely replaced its result conversion code, for example.

Why? Because in the system I'm currently writing, I'm dealing with dynamically-generated gremlin queries all the time, and Bulbs doesn't help me construct them. In a nutshell, I need to be able to construct gremlin queries directly in python.

So for example, I can do this:

> import python_gremlin as pg
> q = pg.V('element_type','container').out('contains').filter("it.name.matches('foo.*')").out('obj').dedup()
> verts = q.execute(some_bulbs_client)

Furthermore, I need to be able to add variables to a dynamically-created query object, then retrieve their values after the query has executed, and I can do this with my own code too.

Another issue with Bulbs is thread safety. Correct me if I'm wrong (I haven't looked far into this yet), but Bulbs uses urllib to communicate with the graph DB, which isn't thread-safe. This is affecting me a fair bit now, as I'm writing PySide-based Qt GUIs, and there are lots of cases where I want threaded access to my data (especially to keep the GUI responsive, for example), but I can't do it.
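
Whether or not the HTTP layer is really the culprit, one common workaround is to give each thread its own client instead of sharing one. A minimal sketch using only the standard library follows; get_client and demo are hypothetical helpers, and the factory would be something like `lambda: Graph()` for Bulbs (an assumption about how you construct your client).

```python
import threading

_local = threading.local()  # attributes set on this object are per-thread


def get_client(factory):
    """Return the calling thread's own client, creating it on first use."""
    client = getattr(_local, "client", None)
    if client is None:
        client = factory()
        _local.client = client
    return client


def demo():
    """Show that two threads get distinct clients, each reused per thread."""
    seen = {}

    def worker(name):
        first = get_client(object)   # object() stands in for a real client
        second = get_client(object)
        seen[name] = (first, first is second)

    threads = [threading.Thread(target=worker, args=(n,)) for n in ("gui", "loader")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen
```

This does not make a non-thread-safe library safe; it only avoids sharing one connection object between a GUI thread and worker threads.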

Do you think Bulbs will allow me to do this kind of thing in future, or is this outside of its mandate? I'm not opposed to contributing myself, I'd rather add to an existing project than start a new one.

Thanks!
Allan

Andras Gefferth

Jan 25, 2013, 10:20:41 AM
to gremli...@googlegroups.com
Hi All,

On Friday, January 25, 2013 8:15:14 AM UTC+1, Allan Johns wrote:
Hi James and Blake,

over time I've found myself having to implement more and more gremlin-related functionality outside of Bulbs, ...  I'm really only using Bulbs to send and execute scripts. 
 
I just had the same feeling. Although I have only started to use Bulbs recently, I'm afraid that if I need transactions and my only choice for that is to use the Groovy scripts, then I will end up coding Groovy and not Python, just as you did.
Bulbs is a great library, but some kind of transaction support could really extend its usability. (I know you can't just implement it in bulbs if the DB you are using has no support for it.)


Why? Because in the system I'm currently writing, I'm dealing with dynamically-generated gremlin queries all the time, and Bulbs doesn't help me construct them. In a nutshell, I need to be able to construct gremlin queries directly in python.

So for example, I can do this:

> import python_gremlin as pg
> q = pg.V('element_type','container').out('contains').filter("it.name.matches('foo.*')").out('obj').dedup()
> verts = q.execute(some_bulbs_client)


I see your point here: you need to generate the Gremlin query programmatically, so you created a library for that.
But what's wrong with programmatically updating a query string?

Like:
s = "g.V('element_type','container')"
if cond1:
    value1 = somefunction()
    s += ".out('%s')" % value1


 Andras

Marko A. Rodriguez

Jan 25, 2013, 11:21:00 AM
to gremli...@googlegroups.com
Hi,

> import python_gremlin as pg
> q = pg.V('element_type','container').out('contains').filter("it.name.matches('foo.*')").out('obj').dedup()
> verts = q.execute(some_bulbs_client)

This looks interesting. One of our tickets in Gremlin is:


We are trying to get as many JVM implementations as possible. Thus far there is:

Pacer (Gremlin Ruby): https://github.com/pangloss/pacer
… if people know of others, please ping me so I can put them on the Gremlin wiki.

Anywho, I have lots of Python buddies that would be stoked to see a Gremlin Jython. Can you say more about this package python_gremlin?

Thanks,
Marko.
 

Jonathan Haddad

Jan 25, 2013, 11:39:50 AM
to gremli...@googlegroups.com
Hi all,

One of the primary reasons why we started building thunderdome was a desire to have better Gremlin integration in our Python application. We found ourselves (poorly) extending the built-in Gremlin methods of Bulbs, and always writing one-line wrappers around our Gremlin.

We decided to start from scratch and write something that would work in a very Pythonic way, but still allow us to use the full power of Gremlin. We considered an approach similar to Allan's, but decided against it as it required us to rewrite many of Gremlin's methods, with very little benefit. Multi-line statements would still be a mess, so we opted instead to harness Gremlin (since it's an incredible language - thank you TinkerPop!)

What we do (that's been amazing so far) is allow you to attach your gremlin methods (written in groovy) directly to python objects.  I've just updated the wiki with some more examples. 
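
To make the idea concrete, here is a small sketch of how a Groovy function could be exposed as a Python method using a descriptor. This is not thunderdome's actual API: GremlinFunction, Person, friends_named, eid and execute are all hypothetical names, and the executor is injected so the example runs without a server.

```python
class GremlinFunction(object):
    """Descriptor that exposes a named Groovy function as a Python method."""

    def __init__(self, function_name, arg_names):
        self.function_name = function_name
        self.arg_names = tuple(arg_names)

    def __get__(self, instance, owner):
        if instance is None:
            return self  # accessed on the class, not on an instance

        def call(*args):
            if len(args) != len(self.arg_names):
                raise TypeError("%s expects %d arguments"
                                % (self.function_name, len(self.arg_names)))
            # Values travel as bound parameters, never inside the script text.
            params = dict(zip(self.arg_names, args))
            params["eid"] = instance.eid
            names = ["eid"] + list(self.arg_names)
            script = "%s(%s)" % (self.function_name, ", ".join(names))
            return instance.execute(script, params)

        return call


class Person(object):
    # HYPOTHETICAL: assumes a Groovy function friends_named(eid, name)
    # has already been loaded on the server.
    friends_named = GremlinFunction("friends_named", ["name"])

    def __init__(self, eid, execute):
        self.eid = eid
        self.execute = execute  # callable(script, params) that talks to the DB
```

Calling `Person(7, send).friends_named("Julie")` would then hand `send` the script text `friends_named(eid, name)` together with the parameters for that call.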


Another advantage is we don't require a globally available graph object to be passed around.  I never found it to be convenient to have to register my vertices and edges with a central, globally available object.  Thunderdome manages that for you automatically whenever you create a Vertex or Edge.

We're very focused on Titan and high performance, specifically taking advantage of its static typing and fast edge traversals via vertex-centric queries. We allow you to leverage autotype=none to ensure you don't accidentally create vertices and edges. This is important because once you've created an edge of a particular label, you can't change its primary key. We approach this by using a spec file, which is essentially a schema definition in a JSON format. We're working to make this easier to use, but here's what we've got so far: https://github.com/StartTheShift/thunderdome/wiki/Spec-Files

We're still fleshing out the docs, so there are probably a few areas that need improvement.

It needs to be said that we're focused on making the best Titan library available - but as a result we're not a general purpose toolkit, as of now.  So you won't be able to use thunderdome with Neo4j - at least not without a little bit of tinkering.

Jon

James Thornton

Jan 25, 2013, 3:57:33 PM
to gremli...@googlegroups.com
Hi Jon -


On Friday, January 25, 2013 10:39:50 AM UTC-6, Jonathan Haddad wrote:

What we do (that's been amazing so far) is allow you to attach your gremlin methods (written in groovy) directly to python objects.  I've just updated the wiki with some more examples. 



I like what you've done with automatically naming Python methods for the Gremlin methods in the models.
 
Another advantage is we don't require a globally available graph object to be passed around.  I never found it to be convenient to have to register my vertices and edges with a central, globally available object.  Thunderdome manages that for you automatically whenever you create a Vertex or Edge.

Registering elements is done to handle the case when a Gremlin query returns more than one type of vertex or edge -- the element registry enables Bulbs to automatically return the proper Python objects, regardless of what model executed the query.
 
- James

Jonathan Haddad

Jan 25, 2013, 4:18:52 PM
to gremli...@googlegroups.com
We handle the registration of the Vertex or Edge within thunderdome's Element metaclass. When you create the Vertex or Edge, we keep track of that internally. It's a lot more convenient, as the models become immediately available and don't require the global graph object, or to be explicitly registered (which we constantly forgot to do - it's not very intuitive).
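
A minimal sketch of metaclass-based registration, to show the general technique rather than thunderdome's real implementation. ElementMeta, Element, element_type and the two model classes are assumed names for illustration.

```python
class ElementMeta(type):
    """Metaclass that records every model class as soon as it is defined."""

    registry = {}  # maps an element type name to its Python class

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        element_type = namespace.get("element_type")
        if element_type is not None:
            ElementMeta.registry[element_type] = cls


class Element(metaclass=ElementMeta):
    element_type = None  # the abstract base class is not registered


class Person(Element):
    element_type = "person"


class Knows(Element):
    element_type = "knows"
```

Because registration happens when the class statement executes, importing the module that defines the models is enough; there is no separate register call to forget.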

We also recursively create objects - something Bulbs doesn't currently do. If you return a table object, Bulbs won't give you anything useful back. We needed to extend the built-in Bulbs methods for executing Gremlin and it ended up being very awkward. With thunderdome, you can pass back a map, with a Vertex and a list of edges, and everything will retain the same overall structure but give you Python objects, which is very convenient for tables, trees, and other ad-hoc data.
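
The recursive conversion can be sketched roughly like this. It assumes the server marks graph elements in its JSON with a "_type" key of "vertex" or "edge" (as Rexster-style responses do); Vertex, Edge and hydrate are illustrative names, not thunderdome's own.

```python
class Vertex(object):
    def __init__(self, data):
        self.data = data


class Edge(object):
    def __init__(self, data):
        self.data = data


def hydrate(value):
    """Walk a decoded JSON result and wrap graph elements in Python objects.

    Maps and lists keep their shape; only the element dicts are replaced.
    """
    if isinstance(value, dict):
        kind = value.get("_type")
        if kind == "vertex":
            return Vertex(value)
        if kind == "edge":
            return Edge(value)
        return {key: hydrate(item) for key, item in value.items()}
    if isinstance(value, list):
        return [hydrate(item) for item in value]
    return value  # numbers, strings, booleans and None pass through
```

A map holding one vertex and a list of edges comes back as a dict with a Vertex and a list of Edge objects, with any plain values left untouched.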

Jon

Andras Gefferth

Jan 27, 2013, 12:39:28 PM
to gremli...@googlegroups.com
Hi Jon,

If you don't have the graph object, can you connect to several different DBs, or different graphs, with your client?
How do you know which DB to connect to?
From the example, I see that you need to create the connection before defining your vertex and edge classes. Is this correct?

So if my models are in a module, I would use something like
>>> import thunderdome
>>> from thunderdome.connection import setup
>>> setup(['localhost'], 'thunderdome')
>>> import my_models
?

Jonathan Haddad

Jan 27, 2013, 1:17:02 PM
to gremli...@googlegroups.com
We made the concession that there's only one graph, and it's configured for the whole application.  

You wouldn't need to call setup() before importing the models.





--
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade

James Thornton

Jan 27, 2013, 6:27:40 PM
to gremli...@googlegroups.com


On Friday, January 25, 2013 9:20:41 AM UTC-6, Andras Gefferth wrote:
I see your point here: you need to generate the Gremlin query programmatically, so you created a library for that.
But what's wrong with programmatically updating a query string?

Like:
s = "g.V('element_type','container')"
if cond1:
    value1 = somefunction()
    s+= ".out('%s')" % value1 


In fact, you should ALWAYS use query params rather than hard-coding values: if you don't, the Gremlin script engine treats each variant as a new script and has to recompile it each time, and you'll blow through its script cache, which will kill your performance.
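
A small sketch of the difference, using two hypothetical helpers (hard_coded and parameterized). The first bakes the value into the script text, so every distinct value yields a distinct script; the second keeps the text constant and passes the value separately.

```python
def hard_coded(label):
    """Value is interpolated into the script: new script text per value."""
    script = "g.V('element_type','container').out('%s')" % label
    return script, {}


def parameterized(label):
    """Value is passed as a bound variable: script text never changes."""
    script = "g.V('element_type','container').out(label)"
    return script, {"label": label}


script_a, params_a = parameterized("contains")
script_b, params_b = parameterized("obj")
# script_a and script_b are identical, so a script cache sees one entry.
```

With Bulbs the pair would be passed as `g.gremlin.query(script, params)`, as in the earlier example in this thread. Keeping values out of the script text also avoids quoting problems when a value contains a quote character.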

- James
 

Allan Johns

Jan 27, 2013, 8:34:31 PM
to gremli...@googlegroups.com
On Mon, Jan 28, 2013 at 10:27 AM, James Thornton <james.t...@gmail.com> wrote:


On Friday, January 25, 2013 9:20:41 AM UTC-6, Andras Gefferth wrote:
I see your point here: you need to generate the Gremlin query programmatically, so you created a library for that.
But what's wrong with programmatically updating a query string?

Like:
s = "g.V('element_type','container')"
if cond1:
    value1 = somefunction()
    s+= ".out('%s')" % value1 


I started this way, and it became extremely tedious. The majority of my code is actually constructing dynamic queries... switching to my approach greatly reduced the amount of code I have to write to do the same thing.
 

In fact, you should ALWAYS use query params rather than hard coding values because if you don't, the Gremlin Script Engine thinks it's a new script and will have to recompile the script each time, and you'll be blowing through its script cache which will kill your performance.


Yeah I thought about this, and it's something I plan to do, but I'm leaving it as a later optimisation. I still need to dynamically generate all queries though, so even if I construct parameterised equivalents, I don't know if I'm going to get enough cache hits for it to be a worthwhile exercise. But, time permitting, I'm going to try it anyway.

 
- James
 


Allan Johns

Jan 27, 2013, 8:37:20 PM
to gremli...@googlegroups.com
On Sat, Jan 26, 2013 at 8:18 AM, Jonathan Haddad <jonatha...@gmail.com> wrote:
We handle the registration of the Vertex or Edge within thunderdome's Element metaclass.  When you create the Vertex or Edge, we keep track of that internally.  It's a lot more convenient, as the models become immediately available and don't require the global graph object, or to be explicitly registered (which we constantly forgot to do - it's not very intuitive).

We also recursively create objects - something bulbs doesn't currently do.  If you return a table object, bulbs won't give you anything useful back.  We needed to extend the built in bulbs methods for executing gremlin and it ended up being very awkward.    With thunderdome, you can pass back a map, with a Vertex and a list of edges, and everything will retain the same overall structure but give you python objects, which is very convenient for tables, trees, and other ad-hoc data.

I do something similar: in my case I needed to convert Tree objects back to their Python equivalent, and because of this I had to stop using Bulbs' execution methods.
 

Allan Johns

Jan 27, 2013, 8:54:52 PM
to gremli...@googlegroups.com
Hi Marko,

Unfortunately Jython isn't really an option for me. My industry for good or bad is based heavily around CPython, and there doesn't appear to be a way to nicely integrate the two. So, python_gremlin simply mimics Gremlin's API, then constructs the appropriate groovy at execution, sends it to Rexster/etc server, and deals with converting the result back to some Python equivalent. It's not really a language binding per se.

Wrt access, python_gremlin is a private project owned by my employer. There may be scope to open source it in future, but we're much too busy right now to put any resources onto that.

Cheers
A

Siddhartha Kasivajhula

Jan 27, 2013, 9:39:17 PM
to gremli...@googlegroups.com
Relevant for the original question and discussion around transactions, there's a python 'transaction' package that's really handy, you can probably use it for implementing transactions in the python client (in addition to bulbs or whatever existing means of interacting with the graph you use):


Using it for this purpose would entail writing a "Data Manager" for your graph backend, as described in that link. The package supports multiple different storage backends (e.g. mysql, mongodb, a graphdb) as part of the same transaction, which would be an added benefit.
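
A rough sketch of what such a data manager might look like. The method names (abort, tpc_begin, commit, tpc_vote, tpc_finish, tpc_abort, sortKey) follow the data manager interface of the transaction package as I understand it; BufferedGraphManager, add and the injected send callable are assumptions for illustration, and the class is written without importing the package so it can be read and run on its own.

```python
class BufferedGraphManager(object):
    """Buffers graph writes and sends them only when the transaction commits."""

    def __init__(self, send):
        self._send = send    # callable taking a list of (script, params) pairs
        self._pending = []

    def add(self, script, params):
        """Queue one write; nothing reaches the graph server yet."""
        self._pending.append((script, params))

    # --- hooks called by the transaction machinery ---

    def abort(self, txn):
        self._pending = []   # transaction aborted before two-phase commit

    def tpc_begin(self, txn):
        pass

    def commit(self, txn):
        pass

    def tpc_vote(self, txn):
        pass                 # a real manager would raise here to veto the commit

    def tpc_finish(self, txn):
        pending, self._pending = self._pending, []
        if pending:
            self._send(pending)  # ideally one transactional request to the server

    def tpc_abort(self, txn):
        self._pending = []   # another participant vetoed; drop our writes

    def sortKey(self):
        return "graph:%d" % id(self)
```

With the package installed, the manager would be joined to the current transaction (roughly `transaction.get().join(manager)`) and flushed by `transaction.commit()`. Note that sending in tpc_finish means a server-side failure at that point cannot veto the other participants, so this is a simplification rather than a full two-phase commit.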

-Sid



Andras Gefferth

Jan 28, 2013, 4:41:34 AM
to gremli...@googlegroups.com

In fact, you should ALWAYS use query params rather than hard coding values because if you don't, the Gremlin Script Engine thinks it's a new script and will have to recompile the script each time, and you'll be blowing through its script cache which will kill your performance.


Yeah I thought about this, and it's something I plan to do, but I'm leaving it as a later optimisation. I still need to dynamically generate all queries though, so even if I construct parameterised equivalents, I don't know if I'm going to get enough cache hits for it to be a worthwhile exercise. But, time permitting, I'm going to try it anyway.


Actually, if you have dynamic queries with little chance of executing the same query twice, is it possible, and would it make sense, to disable the query script cache?

In case of using Rexster, is this cache kept by Rexster, or the DB behind it?


Stephen Mallette

Jan 28, 2013, 6:55:29 AM
to gremli...@googlegroups.com
The ongoing issue with dynamic Gremlin is the cache held by the Groovy
ScriptEngine. If you use non-parameterized Gremlin, then the:

<script-engine-reset-threshold>

setting should be set to a reasonable number (default 500) so as to force
the cache to be reset. That reset comes with some cost. Without the
reset, OutOfMemoryErrors will ensue. If you use parameterized
Gremlin, then you should be in good shape and don't need a reset (set
it to -1). Thus far, there has been little we could do about this
problem, as we've found it inherent to the Groovy ScriptEngine itself.
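
To illustrate the trade-off, here is a toy model of a script cache with a reset threshold. It is not Rexster's actual bookkeeping: ToyScriptCache and its counters are invented for this sketch, and the reset rule (clear the cache once it holds threshold entries) is an assumption made to keep the model simple.

```python
class ToyScriptCache(object):
    """Toy model: compile each new script text once, reset when too large."""

    def __init__(self, reset_threshold):
        self.reset_threshold = reset_threshold  # -1 disables resets
        self.compiled = set()
        self.compilations = 0
        self.resets = 0

    def run(self, script):
        if script in self.compiled:
            return  # cache hit: no compilation needed
        if 0 <= self.reset_threshold <= len(self.compiled):
            self.compiled.clear()  # the costly reset
            self.resets += 1
        self.compiled.add(script)
        self.compilations += 1


# Non-parameterized: every request carries different script text.
dynamic = ToyScriptCache(reset_threshold=500)
for i in range(1200):
    dynamic.run("g.v(%d)" % i)

# Parameterized: the text is constant, the id travels as a parameter.
fixed = ToyScriptCache(reset_threshold=-1)
for i in range(1200):
    fixed.run("g.v(vertex_id)")
```

In this model the non-parameterized run compiles on every request and needs periodic resets to stay bounded, while the parameterized run compiles once and never needs a reset.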

We've been spending more time thinking about this issue recently as
more and more people are relying on RexPro, "stored procedures", etc.
that all rely on the Gremlin Groovy ScriptEngine. I believe that we
will have more to say on this issue soon.

Stephen