"local class incompatible" when using SparkGraphComputer + BulkLoaderVertexProgram to load GraphSON


Edi Bice

Jun 15, 2015, 2:27:17 PM
to aureliu...@googlegroups.com
I am trying to migrate from Titan+Cassandra+ES 0.4.4 to the 0.9.0-M2 release of the same stack.

gremlin> Gremlin.version()
==>3.0.0.M9-incubating

gremlin> graph = GraphFactory.open('conf/hadoop-graph/load-twitter-prod.properties')

where load-twitter-prod.properties contains:

gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphInputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.graphson.GraphSONInputFormat
gremlin.hadoop.graphOutputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.memoryOutputFormat=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
gremlin.hadoop.deriveMemory=false
gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=/d1/twitter_prod.gson
gremlin.hadoop.outputLocation=output
titanmr.ioformat.conf.storage.backend=cassandrathrift
titanmr.ioformat.conf.storage.hostname=10.xx.xx.xx,10.xx.xx.xx,10.xx.xx.xx
titanmr.bulkload.conf.storage.backend=cassandrathrift
titanmr.bulkload.conf.storage.hostname=10.xx.xx.xx,10.xx.xx.xx,10.xx.xx.xx
spark.master=spark://XXX.YYY.local:7077
spark.executor.memory=1g
#spark.serializer=org.apache.spark.serializer.KryoSerializer
cassandra.input.partitioner.class=org.apache.cassandra.dht.Murmur3Partitioner

gremlin> r = graph.compute(SparkGraphComputer).program(BulkLoaderVertexProgram.build().titan('conf/titan-cassandra-es.properties').create()).submit().get()

java.io.InvalidClassException: org.apache.spark.Aggregator; local class incompatible: stream classdesc serialVersionUID = -9085606473104903453, local class serialVersionUID = 5032037208639381169

I have read the TinkerPop docs about data migration (the GraphSON incompatibility, etc., which may be the cause of the error above) and specifically about using the LegacyGraphSONReader, but I can't yet figure out how to connect it to the BulkLoaderVertexProgram.


If I use the method described in the doc above to read a graph, how do I pass the rest of the properties which GraphFactory pulls in from the properties file?

gremlin> r = LegacyGraphSONReader.build().create()
==>org.apache.tinkerpop.gremlin.structure.io.graphson.LegacyGraphSONReader@64337702
gremlin> r.readGraph(new FileInputStream('/tmp/tp2.json'), graph)
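In other words, something like the following is what I'm after (a sketch, assuming the target graph can simply be opened directly with TitanFactory and the same titan-cassandra-es.properties file, skipping GraphFactory and Spark entirely):

graph = TitanFactory.open('conf/titan-cassandra-es.properties')  // open the target Titan 0.9 graph directly
r = LegacyGraphSONReader.build().create()
stream = new FileInputStream('/tmp/tp2.json')                    // the TP2 GraphSON export
r.readGraph(stream, graph)                                       // streams vertices/edges into the open graph
stream.close()
graph.tx().commit()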

Stephen Mallette

Jun 15, 2015, 6:07:13 PM
to aureliu...@googlegroups.com
I'm not sure I follow your question exactly. Please keep in mind that using LegacyGraphSONReader and Spark together isn't possible. LegacyGraphSONReader is meant for smallish/simple data migrations, where using Spark probably wouldn't make much sense in the first place.


Edi Bice

Jun 16, 2015, 9:14:58 AM
to aureliu...@googlegroups.com
Thanks for confirming that LegacyGraphSONReader and Spark can't be combined.

Can I load a Titan 0.4.4 GraphSON graph into Titan 0.9 via Spark? If so, what do you make of the "local class incompatible" error? I picked up on a gremlin-users thread where this error was mentioned; the takeaway was the incompatibility between TP2 and TP3 GraphSON, with the recommendation being to export in EXTENDED mode and read via LegacyGraphSONReader. I'm still trying to figure out how TP3 works in Titan 0.9.
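If I understand that recommendation, the export on the 0.4.4 side would look roughly like this (a sketch against the TP2 Blueprints API; the properties path is a placeholder, and I haven't verified this produces a file LegacyGraphSONReader accepts as-is):

import com.tinkerpop.blueprints.util.io.graphson.GraphSONWriter
import com.tinkerpop.blueprints.util.io.graphson.GraphSONMode

g = TitanFactory.open('conf/titan-044.properties')         // the 0.4.4 graph (placeholder path)
out = new FileOutputStream('/tmp/tp2.json')
GraphSONWriter.outputGraph(g, out, GraphSONMode.EXTENDED)  // EXTENDED mode preserves type information
out.close()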

When working with 0.4.4, I installed Hadoop 1 on each Titan instance, so the Titan+Cassandra+ES cluster was also a Hadoop 1 cluster. Now with 0.9 I simply pointed the bulk load job at a remote Spark cluster master (CDH 5.3.4, Hadoop 2, Spark 1.2.1). I'm thinking the "local class incompatible" error might be due to this arrangement. Should the Titan+Gremlin instance from which I launch the job also belong to the Spark cluster?

Edi Bice

Jun 17, 2015, 10:49:13 AM
to aureliu...@googlegroups.com
I proceeded to install Hadoop 1.2.1 and Spark 1.4 (the Hadoop 1 build) on the Titan+Cassandra+ES cluster. Similar to Roy Levin, I can't figure out how to configure the bulk load job to read from a file in HDFS; I keep getting "java.lang.IllegalArgumentException: Wrong FS: hdfs://sv-devtitan01/user/hduser/twitter_prod.gson.bz2, expected: file:///", no matter whether I specify the full HDFS URL or just the file name, and while running as hduser. I then went back to specifying a full local-filesystem path, which it accepts.
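For reference, the HDFS variant I was attempting looked like the following (a sketch; the namenode port and the fs.default.name line are guesses at what might be missing, fs.default.name being the Hadoop 1 equivalent of Hadoop 2's fs.defaultFS):

gremlin.hadoop.inputLocation=hdfs://sv-devtitan01:9000/user/hduser/twitter_prod.gson.bz2
# guess: without a default filesystem configured, the console falls back to file:///
fs.default.name=hdfs://sv-devtitan01:9000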

Now I'm back to the same "local class incompatible" error.

Gremlin console:

10:38:13 WARN  akka.remote.ReliableDeliverySupervisor  - Association with remote system [akka.tcp://sparkMaster@sv-devtitan01:7077] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
10:38:33 ERROR org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend  - Application has been killed. Reason: All masters are unresponsive! Giving up.

Spark logs:

15/06/17 10:38:13 ERROR Remoting: org.apache.spark.deploy.ApplicationDescription; local class incompatible: stream classdesc serialVersionUID = 7674242335164700840, local class serialVersionUID = -7685200927816255400
java.io.InvalidClassException: org.apache.spark.deploy.ApplicationDescription; local class incompatible: stream classdesc serialVersionUID = 7674242335164700840, local class serialVersionUID = -7685200927816255400
        at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:621)
        at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1623)
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)

Edi Bice

Jun 17, 2015, 1:07:59 PM
to aureliu...@googlegroups.com
For the sake of documenting solutions: the cause turned out to be Spark 1.4. I looked in the Titan lib directory and discovered that the correct version to use is 1.2.1. Submitting against a Spark 1.2.1 master does not produce any "local class incompatible" errors.
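For anyone hitting the same thing, a quick way to check which Spark version a Titan distribution bundles is to list the jars it ships with (the lib path depends on where Titan is unpacked):

ls $TITAN_HOME/lib | grep -i spark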