Error when running JanusGraph with YARN and CQL

Varun Ganesh

Dec 9, 2020, 11:49:29 PM

to JanusGraph users

Hello,

I am trying to run SparkGraphComputer on a JanusGraph backed by Cassandra and ElasticSearch. I have previously verified that I am able to run SparkGraphComputer on a local Spark standalone cluster.

I am now trying to run it on YARN. I have a local YARN cluster running and I have verified that it can run Spark jobs.

I followed these two guides:

http://yaaics.blogspot.com/2017/07/configuring-janusgraph-for-spark-yarn.html

http://tinkerpop.apache.org/docs/3.4.6/recipes/#olap-spark-yarn

And here is my read-cql-yarn.properties file:

gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph

gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.cql.CqlInputFormat

gremlin.hadoop.graphWriter=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat

gremlin.hadoop.jarsInDistributedCache=true

gremlin.hadoop.inputLocation=none

gremlin.hadoop.outputLocation=output

gremlin.spark.persistContext=true

#

# JanusGraph Cassandra InputFormat configuration

#

# These properties define the connection settings that were used when writing data to JanusGraph.

janusgraphmr.ioformat.conf.storage.backend=cql

# This specifies the hostname & port for Cassandra data store.

janusgraphmr.ioformat.conf.storage.hostname=127.0.0.1

janusgraphmr.ioformat.conf.storage.port=9042

# This specifies the keyspace where data is stored.

janusgraphmr.ioformat.conf.storage.cql.keyspace=janusgraph

# This defines the indexing backend configuration used while writing data to JanusGraph.

janusgraphmr.ioformat.conf.index.search.backend=elasticsearch

janusgraphmr.ioformat.conf.index.search.hostname=127.0.0.1

# Use the appropriate properties for the backend when using a different storage backend (HBase) or indexing backend (Solr).

#

# Apache Cassandra InputFormat configuration

#

cassandra.input.partitioner.class=org.apache.cassandra.dht.Murmur3Partitioner

cassandra.input.widerows=true

#

# SparkGraphComputer Configuration

#

spark.master=yarn

spark.submit.deployMode=client

spark.executor.memory=1g

spark.yarn.dist.archives=/tmp/spark-gremlin.zip

spark.yarn.dist.files=/Users/my_comp/Downloads/janusgraph-0.5.2/lib/janusgraph-cql-0.5.2.jar

spark.yarn.appMasterEnv.CLASSPATH=/Users/my_comp/Downloads/hadoop-2.7.2/etc/hadoop:./spark-gremlin.zip/*

spark.executor.extraClassPath=/Users/my_comp/Downloads/hadoop-2.7.2/etc/hadoop:/Users/my_comp/Downloads/janusgraph-0.5.2/lib/janusgraph-cql-0.5.2.jar:./spark-gremlin.zip/*

spark.driver.extraLibraryPath=/Users/my_comp/Downloads/hadoop-2.7.2/lib/native:/Users/my_comp/Downloads/hadoop-2.7.2/lib/native/Linux-amd64-64

spark.executor.extraLibraryPath=/Users/my_comp/Downloads/hadoop-2.7.2/lib/native:/Users/my_comp/Downloads/hadoop-2.7.2/lib/native/Linux-amd64-64

spark.serializer=org.apache.spark.serializer.KryoSerializer

spark.kryo.registrator=org.janusgraph.hadoop.serialize.JanusGraphKryoRegistrator

After a fair amount of trial and error, I got to the point where I can see containers starting up in my YARN ResourceManager UI (port 8088).

Here is the code I am running (it's a simple count):

gremlin> graph = GraphFactory.open('conf/hadoop-graph/read-cql-yarn.properties')

==>hadoopgraph[cqlinputformat->nulloutputformat]

gremlin> g = graph.traversal().withComputer(SparkGraphComputer)

==>graphtraversalsource[hadoopgraph[cqlinputformat->nulloutputformat], sparkgraphcomputer]

gremlin> g.V().count()

However, I am encountering the following failure:

18:49:03 ERROR org.apache.spark.scheduler.TaskSetManager - Task 2 in stage 0.0 failed 4 times; aborting job

org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 0.0 failed 4 times, most recent failure: Lost task 2.3 in stage 0.0 (TID 10, 192.168.1.160, executor 1): java.lang.IllegalStateException: unread block data

at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2862)

at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1682)

at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2366)

at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2290)

at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2148)

at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1647)

at java.io.ObjectInputStream.readObject(ObjectInputStream.java:483)

at java.io.ObjectInputStream.readObject(ObjectInputStream.java:441)

at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)

at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)

at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:370)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

at java.lang.Thread.run(Thread.java:748)

I would really appreciate it if someone could shed some light on this error and advise on next steps!

Thank you!

Varun Ganesh

Dec 10, 2020, 1:00:24 PM

to JanusGraph users

An update on this: I tried setting the environment variable below:

export HADOOP_GREMLIN_LIBS=$GREMLIN_HOME/lib

After doing this I was able to successfully run the tinkerpop-modern.kryo example from the Recipes documentation.

(though the guide at http://yaaics.blogspot.com/2017/07/configuring-janusgraph-for-spark-yarn.html explicitly asks us to ignore this)

Unfortunately, it is still not working with CQL. But the error is now different. Please see below:

12:46:33 ERROR org.apache.spark.scheduler.TaskSetManager - Task 3 in stage 0.0 failed 4 times; aborting job

org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 0.0 failed 4 times, most recent failure: Lost task 3.3 in stage 0.0 (TID 9, 192.168.1.160, executor 2): java.lang.NoClassDefFoundError: org/janusgraph/hadoop/formats/util/HadoopInputFormat

at java.lang.ClassLoader.defineClass1(Native Method)

at java.lang.ClassLoader.defineClass(ClassLoader.java:756)

at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)

at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)

at java.net.URLClassLoader.access$100(URLClassLoader.java:74)

at java.net.URLClassLoader$1.run(URLClassLoader.java:369)

at java.net.URLClassLoader$1.run(URLClassLoader.java:363)

at java.security.AccessController.doPrivileged(Native Method)

... (skipping)

Caused by: java.lang.ClassNotFoundException: org.janusgraph.hadoop.formats.util.HadoopInputFormat

at java.net.URLClassLoader.findClass(URLClassLoader.java:382)

at java.lang.ClassLoader.loadClass(ClassLoader.java:418)

at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)

at java.lang.ClassLoader.loadClass(ClassLoader.java:351)

... 130 more

Is there some additional dependency that I may need to add?

Thanks in advance!

Varun Ganesh

Dec 10, 2020, 2:23:32 PM

to JanusGraph users

Answering my own question: I was able to fix the above error and successfully run the count job after explicitly adding /Users/my_comp/Downloads/janusgraph-0.5.2/lib/* to spark.executor.extraClassPath.

But I am not yet sure as to why that was needed. I had assumed that adding spark-gremlin.zip to the path would have provided the required dependencies.
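
For readers following along, the change described above comes down to one edited line in read-cql-yarn.properties. The fragment below is only a sketch: the paths are the ones from the original post, but where exactly the new entry was placed in the list was not stated, so the ordering (and the choice to replace the single janusgraph-cql jar entry with the wildcard) is an assumption.

```properties
# Assumption: the lib/* wildcard takes the place of the single
# janusgraph-cql-0.5.2.jar entry; the post does not say where it was added.
spark.executor.extraClassPath=/Users/my_comp/Downloads/hadoop-2.7.2/etc/hadoop:/Users/my_comp/Downloads/janusgraph-0.5.2/lib/*:./spark-gremlin.zip/*
```

Note that the wildcard entry is an absolute path on the local filesystem, so it can only be resolved on hosts where that directory actually exists.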

HadoopMarc

Dec 11, 2020, 2:05:35 AM

to JanusGraph users

Hi Varun,

Good job. However, your last solution will only work with everything running on a single machine. So, indeed, there is something wrong with the contents of spark-gremlin.zip or with the way it is put in the executor's local working directory. Note that you had already put /Users/my_comp/Downloads/janusgraph-0.5.2/lib/janusgraph-cql-0.5.2.jar explicitly on the executor classpath, while it should have been available already through ./spark-gremlin.zip/*.

Oh, I think I see now what is different. You have used spark.yarn.dist.archives, while the TinkerPop recipes use spark.yarn.archive. They differ in whether the jars are extracted from the zip. I guess either can be used, provided it is done consistently. You can use the Environment tab in the Spark web UI to inspect how things are picked up by Spark.
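
To make the comparison concrete, here is a rough side-by-side sketch of the two variants. Variant 1 repeats the settings from the original post. Variant 2 follows the pattern of the TinkerPop recipe as best recalled here, in which the archive is exposed to the containers under the directory name `__spark_libs__`; that directory name and the reuse of the Hadoop conf path from the original post are assumptions, so please check them against the recipe and the Spark "Running on YARN" documentation before relying on them.

```properties
# Choose ONE variant; the two are alternatives and are not meant to be combined.

# Variant 1 (from the original post): the zip is listed as a distributed archive
# and the classpath refers to it by its own file name in the working directory.
spark.yarn.dist.archives=/tmp/spark-gremlin.zip
spark.executor.extraClassPath=/Users/my_comp/Downloads/hadoop-2.7.2/etc/hadoop:./spark-gremlin.zip/*

# Variant 2 (TinkerPop recipe style): the zip is declared as the Spark jar archive.
# Assumption: containers see its contents under ./__spark_libs__/
spark.yarn.archive=/tmp/spark-gremlin.zip
spark.yarn.appMasterEnv.CLASSPATH=./__spark_libs__/*:/Users/my_comp/Downloads/hadoop-2.7.2/etc/hadoop
spark.executor.extraClassPath=./__spark_libs__/*:/Users/my_comp/Downloads/hadoop-2.7.2/etc/hadoop
```

Whichever variant is used, the classpath entries need to match the name under which the archive appears in the container. Combining the archive property from one variant with the classpath entries of the other is one plausible way to end up with classes missing on the executors.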

Best wishes, Marc

On Thursday, December 10, 2020 at 20:23:32 UTC+1, Varun Ganesh wrote:

Varun Ganesh

Dec 11, 2020, 10:33:34 AM

to JanusGraph users

Thanks a lot for responding Marc.

Yes, I had initially tried setting spark.yarn.archive to the path of spark-gremlin.zip. However, with this approach the containers were failing with the message "Error: Could not find or load main class org.apache.spark.deploy.yarn.ExecutorLauncher".

I have yet to understand the differences between the spark.yarn.archive and the HADOOP_GREMLIN_LIBS approaches. I will update this thread as I find out more.