gremlin-python and neo4j


Wolfgang Fahl

Sep 23, 2019, 10:37:30 AM
to Gremlin-users
In the general gremlin-python tutorial discussion (https://groups.google.com/forum/#!topic/gremlin-users/9DoPGfx9Jnk),
getting Neo4j to work is one of the open issues. Three approaches were suggested there:


  1. Configure Neo4j in embedded mode within Gremlin Server. An example of this is here where that file points to a Neo4j configuration file. Note that these files are packaged in the Gremlin Server distribution as samples so you can run them directly quite easily.
  2. Modify the aforementioned Neo4j configuration file to run Neo4j in HA mode effectively turning Gremlin Server into a node in the Neo4j cluster.
  3. Configure a neo4j-gremlin-bolt instance which will use the Neo4j Bolt protocol to connect to the running Neo4j Server. While I'm not completely familiar with this implementation, I can see that you would change the gremlin.graph to com.steelbridgelabs.oss.neo4j.structure.Neo4JGraph and that you would discern Bolt configuration options from this class (which instantiates that graph instance).

When I tried approach #1, the graph modifications did not show up in the browser at http://localhost:7474/
Now Stephen wrote:

If you want Gremlin Server and Neo4j Server both operating on the same graph you can't configure Gremlin Server to use Neo4j embedded.

So my first question is why this is so. In neo4j-empty.properties
there is a setting:

gremlin.neo4j.directory=/tmp/neo4j
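
To confirm which directory the embedded graph is configured to use, the properties file can be read programmatically. This is a minimal sketch that assumes a plain key=value file. `read_properties` is a hypothetical helper, and the sample text is illustrative: it holds only the directory setting quoted above plus a `gremlin.graph` line naming the TinkerPop Neo4jGraph class.

```python
def read_properties(text):
    """Parse simple Java-style key=value lines, skipping blanks and comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


# Illustrative content only; the real neo4j-empty.properties may hold more keys.
SAMPLE = """
gremlin.graph=org.apache.tinkerpop.gremlin.neo4j.structure.Neo4jGraph
gremlin.neo4j.directory=/tmp/neo4j
"""

props = read_properties(SAMPLE)
print(props["gremlin.neo4j.directory"])
```

The printed path can then be compared with the directory that the Docker script mounts as its data volume.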

and I made sure that the runNeo4j script of the tutorial uses the same directory for the Docker container:

grep data scripts/runNeo4j
# prepare data directory (if not there yet)
data=/tmp/neo4j
if [ ! -d $data ]
 mkdir -p $data
 --volume=$data:/data \




Where does the "embedded" mode keep its data, and why is it not accessible from the web browser?

Stephen Mallette

Sep 23, 2019, 10:56:57 AM
to gremli...@googlegroups.com
It is a restriction imposed by Neo4j. It does not allow more than one process to access the same directory. 

--
You received this message because you are subscribed to the Google Groups "Gremlin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gremlin-users/d038a6fc-5eb2-4876-9871-ed32102825c3%40googlegroups.com.

Wolfgang Fahl

Sep 23, 2019, 11:28:07 AM
to Gremlin-users

On Monday, September 23, 2019 at 16:56:57 UTC+2, Stephen Mallette wrote:
It is a restriction imposed by Neo4j. It does not allow more than one process to access the same directory. 

I don't get this. I am starting Neo4j from a Docker image. The Neo4j process in the Docker environment accesses the data directory via a volume, and I can modify things with the browser, e.g. load a graph.

The server I am starting with run -n accesses the directory directly, and I can also modify things there, e.g. load the modern or air-route graph.
Both parts seem to work fine, but they don't share their results although they manipulate the same data structure in the directory. How can this be?

The high availability option described in http://tinkerpop.apache.org/docs/current/reference/#_high_availability_configuration looks frighteningly complex.

The bolt option has the following issue: https://github.com/SteelBridgeLabs/neo4j-gremlin-bolt/issues/83


I feel like I am stuck between a few rocks and hard places again.

Stephen Mallette

Sep 24, 2019, 6:32:55 AM
to gremli...@googlegroups.com
I'm not sure how else to explain it. You can't have two separate JVM processes accessing the same Neo4j data files. Hopefully, the following Gremlin Console session makes what I'm saying more clear - of special note is the Neo4j exception message:

Please see the attached cause exception "Unable to obtain lock on store lock file: /tmp/neo4j/store_lock. Please ensure no other process is using this database, and that the directory is writable (required even for read-only access)"

gremlin> graph = Neo4jGraph.open('/tmp/neo4j')
==>neo4jgraph[community single [/tmp/neo4j]]
gremlin> graph2 = Neo4jGraph.open('/tmp/neo4j')
Error starting org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory, /tmp/neo4j
Type ':help' or ':h' for help.
Display stack trace? [yN]y
java.lang.RuntimeException: Error starting org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory, /tmp/neo4j
at org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory.initFacade(GraphDatabaseFacadeFactory.java:212)
at org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory.newFacade(GraphDatabaseFacadeFactory.java:125)
at org.neo4j.graphdb.factory.GraphDatabaseFactory.newDatabase(GraphDatabaseFactory.java:137)
at org.neo4j.graphdb.factory.GraphDatabaseFactory.newEmbeddedDatabase(GraphDatabaseFactory.java:130)
at org.neo4j.graphdb.factory.GraphDatabaseFactory$1.newDatabase(GraphDatabaseFactory.java:107)
at org.neo4j.graphdb.factory.GraphDatabaseBuilder.newGraphDatabase(GraphDatabaseBuilder.java:199)
at org.neo4j.tinkerpop.api.impl.Neo4jFactoryImpl.newGraphDatabase(Neo4jFactoryImpl.java:46)
at org.neo4j.tinkerpop.api.Neo4jFactory$Builder.open(Neo4jFactory.java:32)
at org.apache.tinkerpop.gremlin.neo4j.structure.Neo4jGraph.<init>(Neo4jGraph.java:95)
at org.apache.tinkerpop.gremlin.neo4j.structure.Neo4jGraph.open(Neo4jGraph.java:109)
at org.apache.tinkerpop.gremlin.neo4j.structure.Neo4jGraph.open(Neo4jGraph.java:118)
at org.apache.tinkerpop.gremlin.neo4j.structure.Neo4jGraph$open.call(Unknown Source)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
...
at org.apache.tinkerpop.gremlin.console.Console.main(Console.groovy:502)
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.kernel.internal.locker.StoreLockerLifecycleAdapter@15586843' was successfully initialized, but failed to start. Please see the attached cause exception "Unable to obtain lock on store lock file: /tmp/neo4j/store_lock. Please ensure no other process is using this database, and that the directory is writable (required even for read-only access)".
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:466)
at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:107)
at org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory.initFacade(GraphDatabaseFacadeFactory.java:208)
... 77 more
Caused by: org.neo4j.kernel.StoreLockException: Unable to obtain lock on store lock file: /tmp/neo4j/store_lock. Please ensure no other process is using this database, and that the directory is writable (required even for read-only access)
at org.neo4j.kernel.internal.locker.StoreLocker.storeLockException(StoreLocker.java:117)
at org.neo4j.kernel.internal.locker.StoreLocker.unableToObtainLockException(StoreLocker.java:110)
at org.neo4j.kernel.internal.locker.GlobalStoreLocker.haveLockAlready(GlobalStoreLocker.java:73)
at org.neo4j.kernel.internal.locker.StoreLocker.checkLock(StoreLocker.java:65)
at org.neo4j.kernel.internal.locker.GlobalStoreLocker.checkLock(GlobalStoreLocker.java:60)
at org.neo4j.kernel.internal.locker.StoreLockerLifecycleAdapter.start(StoreLockerLifecycleAdapter.java:36)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:445)
... 79 more
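
The effect of such a store lock can be illustrated with a small Python analogy. This is only a sketch of an exclusive, non-blocking file lock using POSIX `flock` (Unix only); it is not Neo4j's actual locking code, and `try_lock` and the temporary `store_lock` path are hypothetical names chosen for illustration.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Try to take an exclusive, non-blocking lock on path.

    Returns the open file object on success (keep it open to hold the lock),
    or None if another holder already has the lock.
    """
    handle = open(path, "a+")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


lock_path = os.path.join(tempfile.mkdtemp(), "store_lock")
first = try_lock(lock_path)    # plays the role of the first opener
second = try_lock(lock_path)   # plays the role of a second opener
print("first got lock:", first is not None)
print("second got lock:", second is not None)
first.close()                  # closing the file releases the lock
```

While the first holder keeps its file open, the second attempt is refused. Once the first holder closes the file, a later attempt can succeed, which is why a cleanly stopped process should no longer block another one.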

> The high availability option described in http://tinkerpop.apache.org/docs/current/reference/#_high_availability_configuration looks frighteningly complex.

Maybe, but it's really no different from configuring a cluster of Neo4j Server instances according to their instructions. You're effectively making Gremlin Server a node in that cluster.

> The bolt option has the following issue: https://github.com/SteelBridgeLabs/neo4j-gremlin-bolt/issues/83

I think that's just missing documentation and not an actual problem with using it with Gremlin Server. I've heard from folks over the years who have gotten it working. Gremlin Server can host any TinkerPop-enabled graph that supports our GraphFactory instantiation, and given that they have marked their Graph implementation file with our GraphFactoryClass annotation, I would imagine it should work. You just create a properties file (the same way we do for all graph instance configuration in Gremlin Server) and ensure that you have:

gremlin.graph=com.steelbridgelabs.oss.neo4j.structure.Neo4JGraph

and the rest of the file has configuration options specific to neo4j-gremlin-bolt which seem to be found here:


You would also have to be sure that you do a "bin/gremlin-server.sh install" of the neo4j-gremlin-bolt artifact so that Gremlin Server has the necessary dependencies. Can't think of any other steps to take....pretty sure that's it.


Wolfgang Fahl

Oct 1, 2019, 2:08:38 AM
to Gremlin-users
My question was:


Where does the "embedded" mode keep its data, and why is it not accessible from the web browser?

And I am still not happy with the state of affairs.
I'd appreciate an explanation of how the embedded mode works. In the tutorial at http://wiki.bitplan.com/index.php/Gremlin_python#Connecting_to_Gremlin_enabled_graph_databases the neo4j section
starts a Docker-based Neo4j. Given Stephen Mallette's answer, it feels as if this is unnecessary, since the neo4j plugin will start its own instance of Neo4j. But how do I then change the username and password for that instance?
Currently Python + Neo4j does not run at all any more in my environment. If I do a
./run -n
./run -t

I get
E   gremlin_python.driver.protocol.GremlinServerError: 401: Username and/or password are incorrect

For the Docker version I would understand it, because I am immediately asked to change the default username and password. But even if I change the server.yaml file that the tutorial uses for setting the username/password, this does not change anything.
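
Since the error surfaces on the client as a GremlinServerError, it may help to make explicit which credentials the Python side actually sends. The following is a minimal sketch, not the tutorial's code: `remote_connection_args` and `count_vertices` are hypothetical helpers, the traversal source name "g" and port 8182 are assumed defaults, and the gremlin-python imports are done lazily so the first helper works even without the library installed.

```python
def remote_connection_args(host="localhost", port=8182, username="", password=""):
    """Build the URL and keyword arguments for DriverRemoteConnection."""
    url = "ws://{}:{}/gremlin".format(host, port)
    kwargs = {}
    if username:
        # Only send credentials when a username was actually supplied.
        kwargs["username"] = username
        kwargs["password"] = password
    return url, kwargs


def count_vertices(host="localhost", port=8182, username="", password=""):
    """Connect to Gremlin Server, count vertices, and report server errors."""
    from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
    from gremlin_python.driver.protocol import GremlinServerError
    from gremlin_python.process.anonymous_traversal import traversal

    url, kwargs = remote_connection_args(host, port, username, password)
    connection = DriverRemoteConnection(url, "g", **kwargs)
    try:
        g = traversal().withRemote(connection)
        return g.V().count().next()
    except GremlinServerError as error:
        # Server-side failures, including a 401, surface as GremlinServerError.
        raise RuntimeError("Gremlin Server rejected the request: {}".format(error))
    finally:
        connection.close()


print(remote_connection_args(username="someuser", password="somepassword"))
```

Calling `count_vertices` once without credentials and once with them should show whether the server that is actually running expects authentication at all.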

I hate that there is no API for asking the server for connection metadata (simple things like version, hostname, type of provider), so it is almost impossible to debug things properly.
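
A plain TCP probe does not provide version or provider information, but it at least shows which of the two servers is listening at a given moment. This is a minimal sketch using only the standard library; `port_is_open` is a hypothetical helper, 8182 is assumed to be the Gremlin Server port (its default), and 7474 is the browser port mentioned earlier in this thread.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# 8182: assumed Gremlin Server port; 7474: Neo4j browser port used in this thread.
for name, port in (("Gremlin Server", 8182), ("Neo4j browser", 7474)):
    state = "reachable" if port_is_open("localhost", port) else "not reachable"
    print("{} on port {}: {}".format(name, port, state))
```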

Wolfgang Fahl

Oct 1, 2019, 5:40:13 AM
to Gremlin-users



Where does the "embedded" mode keep its data, and why is it not accessible from the web browser?

I checked my assumption, and indeed the tests run without starting the Docker-based Neo4j.
The username/password problem is platform dependent. The problem shows up on one of my Macs but not the other.
My next experiment was to run a test using the
./run -n
option to start the gremlin-server with embedded neo4j support, load some data and then kill that server.
Then, running the Docker-based Neo4j with browser support, I'd expect to be able to access the data (there should be no locking problem).
No, it's not possible to visualize the actions of the Gremlin-based code this way. I'll now do another try with properly shutting down the server. I'd love to see working examples instead of having to go the trial-and-error route for days ...

Wolfgang Fahl

Oct 1, 2019, 6:18:16 AM
to Gremlin-users


Where does the "embedded" mode keep its data, and why is it not accessible from the web browser?

The next trial was to change
-      $gsd/bin/gremlin-server.sh $conf
+      export GREMLIN_YAML=$conf
+      $gsd/bin/gremlin-server.sh start


so that
./run -n
would start the server in the background.

I can then properly shut down the server with
apache-tinkerpop-gremlin-server-3.4.3/bin/gremlin-server.sh  stop


check the status with
apache-tinkerpop-gremlin-server-3.4.3/bin/gremlin-server.sh  status



and try to debug the behavior with
tail -f apache-tinkerpop-gremlin-server-3.4.3/logs/gremlin.log

My expectation would be that I should be able to either:
- start the neo4j docker script, which uses the directory /tmp/neo4j
- add some graph data there
- stop the neo4j docker script
- start the neo4j gremlin server with e.g. ./run -n
- access the graph data added from the docker session via python

or
- start the neo4j gremlin server with e.g. ./run -n
- add some graph data there, e.g. by loading the modern graph via python
- stop the neo4j gremlin server
- start the neo4j docker script, which uses the directory /tmp/neo4j
- access the graph data via browser

The locking problem mentioned by Stephen Mallette should not prevent this.

My trials were all unsuccessful. Why is that so? In the past I have been using neo4j from Java, and all my results, e.g. from the SimpleGraph project, were persisted.
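
One thing that could be checked here, offered as an assumption rather than a confirmed diagnosis: the embedded graph and the Docker-based server may each keep their store in a different subdirectory of /tmp/neo4j. The lock file in the stack trace above sits directly at /tmp/neo4j/store_lock, while a Neo4j 3.x server usually places its database below a databases/ folder inside its data directory. The sketch below lists every directory that contains a file named `neostore` (the main store file name in Neo4j 3.x, as far as I know). `find_store_dirs` is a hypothetical helper.

```python
import os


def find_store_dirs(root):
    """Return every directory under root that contains a 'neostore' file."""
    found = []
    for current, _dirs, files in os.walk(root):
        if "neostore" in files:
            found.append(current)
    return sorted(found)


# Prints nothing if the directory does not exist or holds no store.
for store in find_store_dirs("/tmp/neo4j"):
    print(store)
```

If this prints two different paths after both scenarios have been run, the two processes are working on separate databases, which would explain why neither sees the other's changes.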

Stephen Mallette

Oct 2, 2019, 10:34:38 AM
to gremli...@googlegroups.com
I can't say that I know Docker terribly well and have never used neo4j with docker so I'm doubly unable to help on this one. Maybe someone with more docker knowledge can help.


Josh Perryman

Oct 3, 2019, 11:19:56 PM
to Gremlin-users
Going back to the "federated" aspect of TinkerPop, I'm not sure that there are many on this list with much familiarity with Neo4j. I believe that Neo4j falls into a "Third Party" category and that that organization, or others more closely associated with it, provide the TinkerPop integration tools. There's a rather extensive list of these providers here: http://tinkerpop.apache.org/providers.html. I think that they all fall into this category.

I have dabbled some with Docker and TinkerPop, in conjunction with building Gremlin training programs. Sometimes I have found that Docker causes as many problems as it solves, especially when it comes to managing the networking or dealing with high-IO images. But you might find this image helpful:  https://github.com/experoinc/gremlin-lang-intro.  I believe that the README and Dockerfile are clear enough, though perhaps too wordy.  

It sounds like you might be using Docker's volume capability to attempt to share state between the container and the OS. I have dabbled with this a little on macOS (a somewhat well-provisioned MacBook Pro). In my experience, the containers worked ok with ActiveMQ and ElasticSearch (because our use of those engines is very light), but when I tried to do the same with PostgreSQL and DSE Graph, attempting to map all data persistence directly to the underlying SSD, the containers could not handle the GBs of data I was trying to run through them. For both PostgreSQL and DSE Graph I had to switch to some sort of binary run directly by the OS. I did find that the generic Docker on Mac doesn't handle high-IO situations well, though there's some command-line Docker approach which might work. Since the binaries work well enough, I've not had the need (or the leisure) to try anything else.

That might have nothing to do with the challenges you're facing in your setup. But as your approach seems unique, it is going to be difficult for any on this list to explain why it doesn't work without any error statement or more detailed description of the failure state.  

For the training I'm planning in January, where I do plan to support the Python GLV, I'm planning to use a TinkerPop Gremlin Server binary and we will build a Python script which will interact with it through a network connection, or the "Python Remote Connection" approach.  It's my expectation that most folks will interact with their TinkerPop-compatible server in this type of client-server approach. 

-Josh