Performance of python bindings

tcb

Apr 26, 2012, 7:55:48 AM
to ne...@googlegroups.com
Hi,

I am running into some performance issues with the python bindings for neo. The python bindings themselves are very nice and work just fine, but there is considerable overhead in the JPype layer. This has been investigated a few times already:

https://github.com/neo4j/python-embedded/issues/15
There was another project to implement python bindings with jcc ( https://github.com/OneSaidWho/neo4py ), but it only works for neo 1.3 and takes a different approach to a number of things compared with the 'official' python bindings.

Python can be slow enough already without losing significant performance in the binding layer itself. Is there something that can be done to speed it up? For example, is there some way to improve the JPype bindings? I don't see anything they are obviously doing wrong; perhaps someone with more expertise could advise. Is a jcc binding likely to be faster than JPype? Is it worth putting in the effort to make an up-to-date jcc binding? Is it possible to make a jcc binding compatible with the JPype bindings, or is there some reason why they were done differently?

There is another project ( http://py4j.sourceforge.net/about.html ) which allows calling java from python, and while it looks very good, it is only likely to be slower than JPype ("In terms of performance, Py4J has a bigger overhead than both of the previous solutions [JPype, jython] because it relies on sockets, but if performance is critical to your application, accessing Java objects from Python programs might not be the best idea").

Any other ideas appreciated...


James Thornton

Apr 26, 2012, 8:04:56 AM
to ne...@googlegroups.com
You can use Bulbs (http://bulbflow.com) to connect to Neo4j Server from Python.

- James

tcb

Apr 26, 2012, 8:18:27 AM
to ne...@googlegroups.com
On Thu, Apr 26, 2012 at 1:04 PM, James Thornton <james.t...@gmail.com> wrote:
> You can use Bulbs (http://bulbflow.com) to connect to Neo4j Server from Python.


Thanks, looks very interesting alright. It's sending a request to the Neo4j server, right? Do you have any performance comparisons against using the embedded database directly from Python? I would be somewhat surprised if it was quicker than the JPype backend, which uses JNI...?

Peter Neubauer

Apr 26, 2012, 8:24:34 AM
to ne...@googlegroups.com
Also, if you are concerned about server performance, we are experimenting
with streaming HTTP, which is much faster and has less memory overhead, see
http://blog.neo4j.org/2012/04/streaming-rest-api-interview-with.html

This is in the current 1.8-SNAPSHOT, but the format and headers might
still change a bit before settling down. Could you please try it and
get back with performance impressions?
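For anyone wanting to try the streaming API from Python, here is a minimal standard-library sketch (Python 3). It assumes a server at `http://localhost:7474`, the Cypher REST endpoint `/db/data/cypher` taking a JSON body with `query` and `params`, and an `X-Stream: true` request header to switch streaming on. Since the format and headers were still settling in 1.8-SNAPSHOT, treat all three as assumptions. `build_cypher_request` and `run_cypher` are hypothetical helpers, not part of any Neo4j client library.

```python
import json
import urllib.request

# Assumption: default local server and the Cypher REST endpoint path.
CYPHER_URL = "http://localhost:7474/db/data/cypher"


def build_cypher_request(query, params=None, stream=True, url=CYPHER_URL):
    """Build (but do not send) a POST request for the Cypher REST endpoint.

    Hypothetical helper: the X-Stream header name is an assumption based on
    the 1.8 streaming work and may change.
    """
    body = json.dumps({"query": query, "params": params or {}}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json",
    }
    if stream:
        headers["X-Stream"] = "true"  # ask the server to stream its response
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def run_cypher(query, params=None):
    """Send the request and decode the JSON result (needs a running server)."""
    request = build_cypher_request(query, params)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Note that `json.load` still waits for the whole body before parsing. Handling rows as they arrive would need an incremental JSON parser, which the standard library does not include.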

Cheers,

/peter neubauer

G:  neubauer.peter
S:  peter.neubauer
P:  +46 704 106975
L:   http://www.linkedin.com/in/neubauer
T:   @peterneubauer

If you can write, you can code - @coderdojomalmo
If you can sketch, you can use a graph database - @neo4j

James Thornton

Apr 26, 2012, 8:29:27 AM
to ne...@googlegroups.com
What type of queries are you doing? 

The request overhead is negligible when executing Gremlin (https://github.com/tinkerpop/gremlin/wiki) or Cypher queries on the server.

Example:

>>> from bulbs.neo4jserver import Graph
>>> g = Graph()
>>> script = "g.v(id).out('like').in('like').groupCount.cap"
>>> params = dict(id=1)
>>> vertices = g.gremlin.query(script, params)
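To make the Gremlin one-liner concrete, here is a rough pure-Python equivalent over an in-memory edge list. It only illustrates what `out('like').in('like').groupCount` counts. It is not how Bulbs or the server evaluates the script. When the script runs on the server, this whole nested loop costs a single HTTP request. The `(src, label, dst)` edge format and the name `co_like_counts` are hypothetical.

```python
from collections import Counter, defaultdict


def co_like_counts(edges, start):
    """Count vertices reached by start -out('like')-> x -in('like')-> y.

    edges: iterable of (src, label, dst) tuples (hypothetical format).
    Every path contributes one count, and start itself is counted too,
    as in g.v(start).out('like').in('like').groupCount.cap.
    """
    out_index = defaultdict(list)  # src -> [dst] over 'like' edges
    in_index = defaultdict(list)   # dst -> [src] over 'like' edges
    for src, label, dst in edges:
        if label == "like":
            out_index[src].append(dst)
            in_index[dst].append(src)

    counts = Counter()
    for liked in out_index[start]:       # out('like')
        for liker in in_index[liked]:    # in('like')
            counts[liker] += 1           # groupCount
    return dict(counts)


edges = [(1, "like", 10), (1, "like", 11), (2, "like", 10),
         (3, "like", 10), (3, "like", 11), (2, "knows", 3)]
print(co_like_counts(edges, 1))  # {1: 2, 2: 1, 3: 2}
```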

- James

Nigel Small

Apr 26, 2012, 8:46:52 AM
to ne...@googlegroups.com
If it helps, I have also introduced callbacks into py2neo (http://py2neo.org/) to work alongside the new streaming Cypher capabilities. The example below shows how rows returned from a Cypher query can be processed as they are delivered instead of waiting for the whole query to return.

---
#!/usr/bin/env python

"""
Simple example showing node and relationship creation plus
execution of Cypher queries
"""

from __future__ import print_function

# Import Neo4j modules
from py2neo import neo4j, cypher

# Attach to the graph db instance
graph_db = neo4j.GraphDatabaseService("http://localhost:7474/db/data/")

# Create two nodes
node_a, node_b = graph_db.create_nodes(
    {"name": "Alice"},
    {"name": "Bob"}
)

# Join the nodes with a relationship
rel_ab = node_a.create_relationship_to(node_b, "KNOWS")

# Build a Cypher query
query = "START a=node({}) MATCH a-[:KNOWS]->b RETURN a,b".format(node_a.id)

# Define a row handler...
def print_row(row):
    a, b = row
    print(a["name"] + " knows " + b["name"])

# ...and execute the query
cypher.execute(query, graph_db, row_handler=print_row)
---
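The callback idea does not depend on py2neo. Here is a minimal, library-free sketch of the difference between buffering a whole result and handling each row as it arrives. `stream_rows` stands in for a streaming HTTP response. Like the other names here, it is hypothetical and not part of py2neo's API.

```python
def stream_rows(rows, log):
    """Simulate rows arriving one at a time from a server (hypothetical stand-in)."""
    for row in rows:
        log.append(("delivered", row))
        yield row


def execute_buffered(row_iter):
    """Wait for the complete result, then hand it back as a list."""
    return list(row_iter)


def execute_with_handler(row_iter, row_handler):
    """Invoke row_handler on each row as soon as it is delivered."""
    for row in row_iter:
        row_handler(row)


log = []
execute_with_handler(stream_rows([("Alice", "Bob"), ("Bob", "Carol")], log),
                     lambda row: log.append(("handled", row)))
for event in log:
    print(event)
```

With the handler, delivery and handling interleave, so the first row is processed before the second has arrived. With `execute_buffered`, every row is delivered before any processing can start.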

Matt Luongo

Apr 26, 2012, 11:44:53 AM
to ne...@googlegroups.com
I'd be interested to see how you're using the graph. While the various REST bindings will definitely be slower, maybe a move to Gremlin or Cypher for some of your work would either make them palatable, or avoid some of the JPype overhead?

tcb

Apr 26, 2012, 12:40:17 PM
to ne...@googlegroups.com
Hi,

Thanks for the feedback.

I boiled it down to a very simple test on a graph with 1e5 edges: just looping over each Relationship and getting some property. I tested it with the python bindings (JPype), with java, and with the py2neo REST interface using callbacks. In python the basic loop is this:

    for rel in self.db.relationships:
        print 'date:', rel['date']


The results are:

    java                        2 sec
    python (JPype bindings)    25 sec
    py2neo (REST, callbacks)  131 sec
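Dividing by the 1e5 relationships, these figures work out to roughly 20 µs per relationship for java, 250 µs for JPype and 1.3 ms for py2neo. The loop above also prints every value, so console output is part of those timings. Here is a minimal harness for repeating the measurement from Python. It accumulates nothing and prints nothing inside the loop, so only iteration and property access are timed. `relationships` can be any iterable of mapping-like objects, standing in for `self.db.relationships`. The function name is hypothetical.

```python
import time


def time_property_scan(relationships, key="date"):
    """Touch one property on every relationship and time the loop.

    Returns (count, seconds, microseconds_per_item).
    """
    start = time.perf_counter()
    count = 0
    for rel in relationships:
        _ = rel[key]  # the property access being measured
        count += 1
    elapsed = time.perf_counter() - start
    per_item_us = (elapsed / count * 1e6) if count else 0.0
    return count, elapsed, per_item_us


if __name__ == "__main__":
    # Plain dicts as a baseline; swap in the real relationship iterable.
    fake = [{"date": "2012-04-26"} for _ in range(100000)]
    n, secs, us = time_property_scan(fake)
    print("%d relationships in %.3f s (%.2f us each)" % (n, secs, us))
```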

tcb

Apr 26, 2012, 12:55:39 PM
to ne...@googlegroups.com
sorry- I hadn't finished that...

It seems that straight java is pretty quick, with python an order of magnitude behind and the REST client far behind. Of course, this only applies to this specific test. I didn't look at testing the new HTTP streaming stuff. How would I do this from python?
 
I am trying to use neo4j as a graph representation and integrate it with other code I have for various kinds of network analysis, so looping over nodes and edges and doing something with their properties is a fairly common pattern. Of course, as Matt points out, it may be better to turn some of these operations into cypher/gremlin queries and let the server do all the work, but that means redoing a lot of my existing (and working) python code. Also, that approach works best when a query returns a single result or a small list of nodes/edges. If you need to iterate over all nodes and edges, you still run into the problem of slow bindings.
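One middle ground, sketched here under assumptions, is to pull the needed fields across the boundary once, in one bulk query or one pass over the bindings, and build a plain Python structure. Later analysis loops then never touch JPype or HTTP again. The row format `(start_id, end_id, props)` and the name `build_adjacency` are hypothetical. The sketch keeps one property dict per node pair, so parallel edges between the same pair would overwrite each other.

```python
from collections import defaultdict


def build_adjacency(rows):
    """Build an undirected adjacency map plus an edge-property table.

    rows: iterable of (start_id, end_id, props) tuples (hypothetical format),
    e.g. the result of one bulk export of all relationships.
    """
    adjacency = defaultdict(set)
    edge_props = {}
    for start_id, end_id, props in rows:
        adjacency[start_id].add(end_id)
        adjacency[end_id].add(start_id)
        edge_props[(start_id, end_id)] = dict(props)  # last one wins on duplicates
    return adjacency, edge_props


rows = [(1, 2, {"date": "2012-04-01"}),
        (2, 3, {"date": "2012-04-02"}),
        (1, 3, {"date": "2012-04-03"})]
adjacency, edge_props = build_adjacency(rows)
degrees = {node: len(nbrs) for node, nbrs in adjacency.items()}
print(degrees)  # {1: 2, 2: 2, 3: 2}
```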

From the profile of the python code, the time is spent setting up the JPype connection for each step of the iteration. Clearly the actual operations performed by the jvm are pretty quick. If the JPype stuff could be improved even a little, then it would be a big win for using neo from python.

  ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   141503    8.980    0.000   10.953    0.000 /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/jpype/_jcollection.py:215(_iterNext)
    20756    1.966    0.000    2.361    0.000 /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/neo4j/_backend.py:153(decorator)
1199455/1199443    1.417    0.000    2.156    0.000 /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/jpype/_jclass.py:81(_javaGetAttr)
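A profile like the one above can be produced with the standard-library `cProfile` and `pstats` modules. The sketch below profiles a stand-in function, `scan` (hypothetical), and prints the top entries sorted by `tottime`, the column in which `_iterNext` dominates above. To profile the real loop, pass the function that iterates over `self.db.relationships` instead.

```python
import cProfile
import io
import pstats


def scan(items):
    """Stand-in for the relationship loop being profiled (hypothetical)."""
    total = 0
    for item in items:
        total += len(item["date"])
    return total


def profile_top(func, *args, limit=3):
    """Run func(*args) under cProfile; return (result, report of top `limit` rows)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("tottime").print_stats(limit)
    return result, out.getvalue()


if __name__ == "__main__":
    data = [{"date": "2012-04-26"} for _ in range(100000)]
    result, report = profile_top(scan, data)
    print(report)
```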

Peter Neubauer

Apr 26, 2012, 1:03:10 PM
to ne...@googlegroups.com
Hi there,
the source for the Python bindings is at
https://github.com/neo4j/python-embedded, feel free to fork and
contribute!

Cheers,

/peter neubauer


Rohit

Apr 26, 2012, 4:13:26 PM
to Neo4j
I explored the python speed issue with neo4j-embedded at a very basic level here:

http://triloki.github.com/blog/2012/04/13/python-and-neo4j-performance/

Hope it helps.

Cheers


Matt Luongo

Apr 26, 2012, 4:29:50 PM
to ne...@googlegroups.com
I really would try Gremlin for this with python-embedded or py2neo (or neo4j-rest-client or whatever). Intuitively, consider doing this with a SQL database. If you had to make a round trip to the server for each row, it would be incredibly slow, so instead you'd ask the server for all the rows at once. I imagine JPype imposes something roughly equivalent to a "round trip cost" with all the Python/Java object conversions going on.
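The round-trip analogy can be written as a back-of-the-envelope cost model. With a fixed per-call overhead c and per-row work w, fetching n rows one call at a time costs about n(c + w), while one bulk call costs about c + nw. The helper below just evaluates both. The numbers in the example are illustrative placeholders, not measurements from this thread.

```python
def round_trip_cost(n_rows, per_call_overhead, per_row_work):
    """Return (one_call_per_row, single_bulk_call) total costs."""
    per_row_calls = n_rows * (per_call_overhead + per_row_work)
    bulk_call = per_call_overhead + n_rows * per_row_work
    return per_row_calls, bulk_call


# Placeholder values in seconds, chosen only to show the shape of the trade-off.
chatty, bulk = round_trip_cost(100000, 200e-6, 20e-6)
print("one call per row: %.1f s, one bulk call: %.1f s" % (chatty, bulk))
# one call per row: 22.0 s, one bulk call: 2.0 s
```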

--
Matt Luongo
Software Developer
about.me/luongo