Unable to Create Neo4JGraph in TinkerPop3

J.D.

May 11, 2016, 1:14:33 AM
to Gremlin-users
I am attempting to create a new Neo4jGraph in TinkerPop3 but am unable to do so, for different reasons depending on which version of the neo4j-gremlin plugin I install.

I have titan-1.0.0-hadoop1 installed, which ships with Gremlin 3.0.1-incubating.

I tried to install neo4j-gremlin using the following command in the console,

 :install org.apache.tinkerpop neo4j-gremlin 3.0.1-incubating

==>Loaded: [org.apache.tinkerpop, neo4j-gremlin, 3.0.1-incubating] - restart the console to use [tinkerpop.neo4j]

gremlin> graph = Neo4jGraph.open('/tmp/neo4j');

No such property: Neo4jGraph for class: groovysh_evaluate


I figured that the property Neo4jGraph was part of a newer version of the neo4j-gremlin plugin (e.g., 3.1.0-incubating).  I did not pursue this further.


Next, I tried to install (and activate) the 3.1.0-incubating version of the neo4j-gremlin plugin. It loads just fine, but when I try to create a graph using the command:


graph = Neo4jGraph.open('/tmp/neo4j')


I am presented with the following stack trace (important details only):


...

Caused by: java.lang.VerifyError: class org.neo4j.index.impl.lucene.LuceneDataSource$1 overrides final method tokenStream.(Ljava/lang/String;Ljava/io/Reader;)Lorg/apache/lucene/analysis/TokenStream;
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:760)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at groovy.lang.GroovyClassLoader.loadClass(GroovyClassLoader.java:676)
    at groovy.lang.GroovyClassLoader.loadClass(GroovyClassLoader.java:786)
    at groovy.lang.GroovyClassLoader.loadClass(GroovyClassLoader.java:774)
    at org.neo4j.index.lucene.LuceneKernelExtension.init(LuceneKernelExtension.java:52)
    at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.init(LifeSupport.java:483)


I suspect it's a version issue with Lucene, but I'm not sure which version of the neo4j plugin is compatible with which version of Gremlin.


Thoughts?





Daniel Kuppitz

May 11, 2016, 1:30:44 AM
to gremli...@googlegroups.com
Titan and Neo4j have conflicting versions of Lucene. You'll need to either
  • delete Titan's lucene jars and use Neo4j or
  • delete the Neo4j plugin and use Titan
Better yet, install the latest TinkerPop distribution in a separate directory and install the Neo4j plugin there to avoid these version conflicts.

Cheers,
Daniel
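
One way to check for the Lucene clash described above before deleting anything is to list which lucene-core versions a console installation carries. This is only a sketch: it assumes the jars follow the usual lucene-core-<version>.jar naming and sit somewhere under the install directory (for example in its lib and ext folders), and the function name is made up for illustration.

```shell
#!/bin/sh
# Print the distinct lucene-core versions found under the directory given as $1.
# ASSUMPTION: jars are named lucene-core-<version>.jar; the directory layout
# (lib/, ext/<plugin>/...) is not checked, the whole tree is searched.
lucene_core_versions() {
    find "$1" -type f -name 'lucene-core-*.jar' 2>/dev/null \
        | sed 's|.*/lucene-core-\(.*\)\.jar$|\1|' \
        | sort -u
}

# Example (hypothetical install path):
#   lucene_core_versions /opt/titan-1.0.0-hadoop1
```

If more than one version is printed, two components on the same classpath ship different Lucene jars, which is the situation that calls for the manual cleanup or the separate install suggested above.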



Stephen Mallette

May 11, 2016, 6:28:25 AM
to Gremlin-users
Expanding on Daniel's answer, the plugin system isn't terribly smart about dependencies. Install enough plugins and you'll inevitably hit conflicts that can only be resolved manually, unfortunately. If anyone has ideas for better managing that sort of thing, I'd welcome some discussion.

J.D.

May 11, 2016, 11:07:36 AM
to Gremlin-users
I will try resolving this by deleting the Lucene jars in Titan, but I am leaning towards installing TinkerPop stand-alone with the Neo4j plugin as you suggested. I thought I had given that a try as well and had some other issues with that use case.
Thanks,
J.D. 

J.D.

May 11, 2016, 1:39:42 PM
to Gremlin-users
Hi Stephen,

Is it possible to build a TinkerPop3 distribution with Neo4J plugin?  If so, are there instructions for doing so?

Here is how I think it should be done, but not sure,

git clone https://github.com/apache/incubator-tinkerpop.git

cd incubator-tinkerpop

mvn clean install -DskipTests -DincludeNeo4j


I actually tried this and the build completed successfully, but I expected when I started gremlin-console that the Neo4J Plugin would be installed already.

Since the neo4J plugin wasn't installed as part of the build, I tried running :install against the snapshot version of neo4j that is in my local .m2 repo and got the following,


// executed in gremlin console

:install org.apache.tinkerpop neo4j-gremlin 3.2.1-SNAPSHOT


// console output --

WARN  org.apache.tinkerpop.gremlin.groovy.util.DependencyGrabber  - Detected a non-standard Gremlin directory structure during install.  Expecting a 'lib' directory sibling to 'ext'. This message does not necessarily imply failure, however the console requires a certain directory structure for proper execution. Altering that structure can lead to unexpected behavior.

==>java.lang.RuntimeException: Error grabbing Grapes -- [download failed: org.apache.commons#commons-lang3;3.3.2!commons-lang3.jar]


I believe I have my grapeConfig.xml configured correctly but maybe I am missing something?


J.D.

Stephen Mallette

May 11, 2016, 1:48:04 PM
to Gremlin-users
> Is it possible to build a TinkerPop3 distribution with Neo4J plugin? 

We can't build a convenience distribution because Neo4j is GPL'd and that conflicts with our Apache license. Users have to knowingly "install" Neo4j themselves. So you can build the latest yourself....

> Here is how I think it should be done, but not sure,

....which looks right.

> but I expected when I started gremlin-console that the Neo4J Plugin would be installed already.

It won't be - that "install" just installs the artifacts to your local .m2 directory (not to the console for usage)....

>  I tried running :install against the snapshot version of neo4j that is in my local .m2 repo

....which you seem to have noticed, and yes, you are on the right track: use :install in the Gremlin Console to install the newly built SNAPSHOT. My guess is that you are running the console out of your source directory, which never seems to work quite right in my opinion. If you instead do:

cd gremlin-console/target/*stand*

and then run bin/gremlin.sh, I would expect the error about the "ext" directory to go away. I'm not sure why you would have trouble with the Apache Commons dependency; that is usually a grapeConfig.xml problem.
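
The directory check behind that warning can be sketched in shell. The helper below looks under a source checkout for the unpacked standalone console and only reports it if the sibling lib and ext directories exist. It is a sketch under assumptions: the function name is hypothetical, and the glob reuses the *stand* pattern from the command above because the exact directory name depends on the version being built.

```shell
#!/bin/sh
# Print the standalone console directory under the checkout given as $1,
# but only if it has the sibling 'lib' and 'ext' directories.
# ASSUMPTION: the build unpacks the console to gremlin-console/target/*stand*.
find_standalone_console() {
    for dir in "$1"/gremlin-console/target/*stand*; do
        if [ -d "$dir/lib" ] && [ -d "$dir/ext" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    # Nothing matched, or the match lacked lib/ or ext/.
    return 1
}

# Example (hypothetical checkout path):
#   cd "$(find_standalone_console "$HOME/src/incubator-tinkerpop")" && bin/gremlin.sh
```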

Daniel Kuppitz

May 11, 2016, 1:51:44 PM
to gremli...@googlegroups.com
cd incubator-tinkerpop
mvn clean install -DskipTests -DincludeNeo4j

The option -DincludeNeo4j only activates the Neo4j test suite. You don't need that, since you skip all tests anyway.

Cheers,
Daniel


J.D.

May 11, 2016, 1:54:43 PM
to Gremlin-users
Thanks Dan, I wondered why the neo4j tests were the only ones running :)

J.D.

May 11, 2016, 1:59:03 PM
to Gremlin-users
First of all thanks for your quick response,

Your suggestion to run from the standalone folder solved the warning message.  Still getting the commons-lang dependency issue but pretty sure that is an issue with my grape config.

Cheers,
J.D.

p.s. I see a lot of code authored by you at P*son.

Stephen Mallette

May 11, 2016, 2:27:15 PM
to Gremlin-users
Sometimes putting your .m2 directory last or first in the <ibiblio> references "lets stuff happen".
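
For readers unfamiliar with the file being discussed, here is a rough sketch of a Grape settings file with the local .m2 repository listed first among the <ibiblio> resolvers. It is modeled on Groovy's default grapeConfig.xml, not on anyone's actual configuration in this thread; the resolver names are arbitrary labels and the function name is hypothetical. Grape reads the real file from ~/.groovy/grapeConfig.xml, so the sketch writes to a path you pass in rather than overwriting it.

```shell
#!/bin/sh
# Write a sample Grape/Ivy settings file to the path given as $1.
# ASSUMPTION: layout modeled on Groovy's default grapeConfig.xml; moving the
# "localm2" line below "central" gives the "last" ordering instead of "first".
write_grape_config() {
    cat > "$1" <<'EOF'
<ivysettings>
  <settings defaultResolver="downloadGrapes"/>
  <resolvers>
    <chain name="downloadGrapes">
      <filesystem name="cachedGrapes">
        <ivy pattern="${user.home}/.groovy/grapes/[organisation]/[module]/ivy-[revision].xml"/>
        <artifact pattern="${user.home}/.groovy/grapes/[organisation]/[module]/[type]s/[artifact]-[revision].[ext]"/>
      </filesystem>
      <ibiblio name="localm2" root="file:${user.home}/.m2/repository/" checkmodified="true" changingPattern=".*" changingMatcher="regexp" m2compatible="true"/>
      <ibiblio name="central" root="https://repo1.maven.org/maven2/" m2compatible="true"/>
    </chain>
  </resolvers>
</ivysettings>
EOF
}

# Example: write to a scratch file, review it, then copy it into place by hand.
#   write_grape_config /tmp/grapeConfig.xml
```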

J.D.

May 11, 2016, 6:45:28 PM
to Gremlin-users
I was able to resolve the dependency issue when trying to install the neo4j plugin. I moved the local repo reference to a different position in the grape config, blew away my local repository, and rebuilt everything. Once I did that, the plugin installed perfectly and I was able to create Neo4j graphs in Gremlin.

J

Stephen Mallette

May 12, 2016, 6:34:48 AM
to Gremlin-users
Cool - Grape is finicky sometimes, unfortunately. By the way, nice to see Gremlin still doing some work at Pearson.

J.D.

May 16, 2016, 1:54:20 PM
to Gremlin-users
Hi Stephen,

I would like to export a fairly large Titan graph (>1TB) to GraphSON. I created an EC2 instance with a large data volume where I will attempt the export. My question is whether it's enough to install Gremlin on the server, or whether I need the whole titan-1.0 package installed on the server to connect to the cluster and export the graph.

Stephen Mallette

May 17, 2016, 7:04:34 AM
to Gremlin-users
You will need the Titan distribution.

Daniel Kuppitz

May 17, 2016, 7:24:54 AM
to gremli...@googlegroups.com
Hi J.D.,

I don't think you can use the Gremlin I/O utils to export a graph of this size. Hence, you'll need either Hadoop/Giraph or Spark to run the BulkDumperVertexProgram.
That said, it should be sufficient if this EC2 instance is going to be a single Hadoop or Spark node that is used for the export. The Titan distribution won't be required as you can trigger the export from any other machine.

The only problem that I see coming is this: Titan 1.0.0's TinkerPop version doesn't include the BulkDumperVertexProgram. I'm not sure if there was a technical reason or if it just wasn't ready at the time we released TinkerPop 3.0.1. Anyway, I would suggest simply building a custom jar that depends on TinkerPop 3.0.1 and has the BDVP code included, distributing it across the cluster, and giving it a try. Looking at the code, I don't see a reason why it wouldn't work with 3.0.1.

Cheers,
Daniel


J.D.

May 17, 2016, 11:44:32 AM
to Gremlin-users
Hi Dan,

So, what branch do I need to pull for TinkerPop 3.0.1?  And is there a way to build the tinkerpop 3.0.1 distribution, specifying the inclusion of the BDVP code using maven?

J

Daniel Kuppitz

May 17, 2016, 12:18:20 PM
to gremli...@googlegroups.com
Ah, yea, you can certainly do a full rebuild; that's probably easier than creating a new independent jar and integrating it as a custom extension. Check out the 3.0.1-incubating tag, add the BulkDumperVertexProgram, and just rebuild it using mvn clean install. To integrate BulkDumperVertexProgram I would try to cherry-pick this single commit:


Cheers,
Daniel


J.D.

May 17, 2016, 12:52:32 PM
to Gremlin-users
Hi Dan,

I was able to successfully perform a full rebuild with the additional BDVP code. I have this new distribution on an EC2 node that is not part of the cluster. I assume I can still connect to the cluster and execute the dump of the graph from an EC2 instance that isn't part of the cluster?

Thanks,
J.D.

Daniel Kuppitz

May 17, 2016, 2:57:51 PM
to gremli...@googlegroups.com
Yes, that should work. Just be sure to export HADOOP_GREMLIN_LIBS and set gremlin.hadoop.jarsInDistributedCache=true in your Gremlin-Hadoop configuration file. This ensures that the instance that initiates the job distributes its jar files (with BDVP included); once that's done, Spark will load those jars and run the vertex program.

Cheers,
Daniel
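
The two settings named in this reply could be applied along these lines. This is a minimal sketch, not a tested recipe for Titan: the helper name set_property, the properties file name, and the lib path in the example are all hypothetical, and the helper only handles plain key=value lines.

```shell
#!/bin/sh
# Set key=value in a Java-style properties file, replacing any existing line
# that starts with "key=" so the setting is never duplicated.
# ASSUMPTION: plain single-line "key=value" entries; $1 file, $2 key, $3 value.
set_property() {
    file=$1; key=$2; value=$3
    tmp=$(mktemp) || return 1
    if [ -f "$file" ]; then
        # Keep every line that does not begin with "key=".
        awk -v k="$key=" 'index($0, k) != 1' "$file" > "$tmp"
    fi
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    mv "$tmp" "$file"
}

# Example (hypothetical paths; point HADOOP_GREMLIN_LIBS at the directory
# holding the jars to ship, including the rebuilt one containing BDVP):
#   export HADOOP_GREMLIN_LIBS=/opt/gremlin-console/lib
#   set_property conf/export.properties gremlin.hadoop.jarsInDistributedCache true
```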


J.D.

May 17, 2016, 5:06:11 PM
to Gremlin-users
Do we need to have an Analytics node up and running in our Titan C* cluster in order to execute this strategy for using the BDVP? I don't believe any of the nodes in our cluster are "analytic" nodes (e.g., Hadoop enabled).

J.D.

Daniel Kuppitz

May 17, 2016, 6:46:11 PM
to gremli...@googlegroups.com
Well, without Analytics nodes (nodes with either Spark and/or Hadoop running) you could still use the local Spark job runner (spark.master=local[*]). In that case the node that initiates the job would also be the node that stores the output.

Cheers,
Daniel
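
Since the local runner leaves all output on the initiating node, it may be worth checking free space on the output volume before starting. The sketch below is an illustration only: the function name is hypothetical, and the required size is your own estimate, because nothing in this thread says how large the GraphSON output will be.

```shell
#!/bin/sh
# Succeed only if the filesystem holding directory $1 has at least $2 KiB free.
# ASSUMPTION: POSIX 'df -Pk' output, where column 4 of the second line is the
# available space in 1024-byte blocks.
has_free_kib() {
    avail=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ -n "$avail" ] && [ "$avail" -ge "$2" ]
}

# Example (hypothetical mount point; 1500000000 KiB is roughly 1.4 TiB):
#   has_free_kib /data 1500000000 || echo "not enough space for the export" >&2
```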

