Error when running JanusGraph with YARN and CQL

85 views
Skip to first unread message

Varun Ganesh

unread,
Dec 9, 2020, 11:49:29 PM12/9/20
to JanusGraph users
Hello,

I am trying to run SparkGraphComputer on a JanusGraph backed by Cassandra and ElasticSearch. I have previously verified that I am able to run SparkGraphComputer on a local Spark standalone cluster.

I am now trying to run it on YARN. I have a local YARN cluster running and I have verified that it can run Spark jobs.

I followed the following links:

And here is my read-cql-yarn.properties file:
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.cql.CqlInputFormat
gremlin.hadoop.graphWriter=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat

gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output
gremlin.spark.persistContext=true

#
# JanusGraph Cassandra InputFormat configuration
#
# These properties defines the connection properties which were used while write data to JanusGraph.
janusgraphmr.ioformat.conf.storage.backend=cql
# This specifies the hostname & port for Cassandra data store.
janusgraphmr.ioformat.conf.storage.hostname=127.0.0.1
janusgraphmr.ioformat.conf.storage.port=9042
# This specifies the keyspace where data is stored.
janusgraphmr.ioformat.conf.storage.cql.keyspace=janusgraph
# This defines the indexing backend configuration used while writing data to JanusGraph.
janusgraphmr.ioformat.conf.index.search.backend=elasticsearch
janusgraphmr.ioformat.conf.index.search.hostname=127.0.0.1
# Use the appropriate properties for the backend when using a different storage backend (HBase) or indexing backend (Solr).

#
# Apache Cassandra InputFormat configuration
#
cassandra.input.partitioner.class=org.apache.cassandra.dht.Murmur3Partitioner
cassandra.input.widerows=true

#
# SparkGraphComputer Configuration
#
spark.master=yarn
spark.submit.deployMode=client
spark.executor.memory=1g

spark.yarn.dist.archives=/tmp/spark-gremlin.zip
spark.yarn.dist.files=/Users/my_comp/Downloads/janusgraph-0.5.2/lib/janusgraph-cql-0.5.2.jar
spark.yarn.appMasterEnv.CLASSPATH=/Users/my_comp/Downloads/hadoop-2.7.2/etc/hadoop:./spark-gremlin.zip/*
spark.executor.extraClassPath=/Users/my_comp/Downloads/hadoop-2.7.2/etc/hadoop:/Users/my_comp/Downloads/janusgraph-0.5.2/lib/janusgraph-cql-0.5.2.jar:./spark-gremlin.zip/*

spark.driver.extraLibraryPath=/Users/my_comp/Downloads/hadoop-2.7.2/lib/native:/Users/my_comp/Downloads/hadoop-2.7.2/lib/native/Linux-amd64-64
spark.executor.extraLibraryPath=/Users/my_comp/Downloads/hadoop-2.7.2/lib/native:/Users/my_comp/Downloads/hadoop-2.7.2/lib/native/Linux-amd64-64

spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.janusgraph.hadoop.serialize.JanusGraphKryoRegistrator

After a bunch of trial and error, I was able to get it to a point where I see containers starting up on my YARN Resource manager UI (port 8088)

Here is the code I am running (it's a simple count):
gremlin> graph = GraphFactory.open('conf/hadoop-graph/read-cql-yarn.properties')
==>hadoopgraph[cqlinputformat->nulloutputformat]
gremlin> g = graph.traversal().withComputer(SparkGraphComputer)
==>graphtraversalsource[hadoopgraph[cqlinputformat->nulloutputformat], sparkgraphcomputer]
gremlin> g.V().count()

However I am encountering the following failure:
18:49:03 ERROR org.apache.spark.scheduler.TaskSetManager - Task 2 in stage 0.0 failed 4 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 0.0 failed 4 times, most recent failure: Lost task 2.3 in stage 0.0 (TID 10, 192.168.1.160, executor 1): java.lang.IllegalStateException: unread block data
at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2862)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1682)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2366)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2290)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2148)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1647)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:483)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:441)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:370)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Would really appricate it if someone could shed some light on this error and advise on next steps!

Thank you!

Varun Ganesh

unread,
Dec 10, 2020, 1:00:24 PM12/10/20
to JanusGraph users
An update on this, I tried setting the env var below:

export HADOOP_GREMLIN_LIBS=$GREMLIN_HOME/lib

After doing this I was able to successfully run the tinkerpop-modern.kryo example from the Recipes documentation
(though the guide at http://yaaics.blogspot.com/2017/07/configuring-janusgraph-for-spark-yarn.html explicitly asks us to ignore this)

Unfortunately, it is still not working with CQL. But the error is now different. Please see below:

12:46:33 ERROR org.apache.spark.scheduler.TaskSetManager - Task 3 in stage 0.0 failed 4 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 0.0 failed 4 times, most recent failure: Lost task 3.3 in stage 0.0 (TID 9, 192.168.1.160, executor 2): java.lang.NoClassDefFoundError: org/janusgraph/hadoop/formats/util/HadoopInputFormat
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:756)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
at java.security.AccessController.doPrivileged(Native Method)
... (skipping)
Caused by: java.lang.ClassNotFoundException: org.janusgraph.hadoop.formats.util.HadoopInputFormat
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 130 more

Is there some additional dependency that I may need to add?

Thanks in advance!
Message has been deleted

Varun Ganesh

unread,
Dec 10, 2020, 2:23:32 PM12/10/20
to JanusGraph users
Answering my own question. I was able fix the above error and successfully run the count job after explicitly adding /Users/my_comp/Downloads/janusgraph-0.5.2/lib/* to spark.executor.extraClassPath

But I am not yet sure as to why that was needed. I had assumed that adding spark-gremlin.zip to the path would have provided the required dependencies.

HadoopMarc

unread,
Dec 11, 2020, 2:05:35 AM12/11/20
to JanusGraph users
Hi Varun,

Good job. However, your last solution will only work with everything running on a single machine. So, indeed, there is something wrong with the contents of spark-gremlin.zip or with the way it is put in the executor's local working directory. Note that you already put /Users/my_comp/Downloads/janusgraph-0.5.2/lib/janusgraph-cql-0.5.2.jar explicitly on the executor classpath while it should have been available already through ./spark-gremlin.zip/*

O, I think I see now what is different. You have used spark.yarn.dist.archives, while the TinkerPop recipes use spark.yarn.archive. They behave differently in yes/no extracting the jars from the zip. I guess either can be used, provided it is done consistently. You can use the environment tab in Spark web UI to inspect how things are picked up by spark.

Best wishes,    Marc

Op donderdag 10 december 2020 om 20:23:32 UTC+1 schreef Varun Ganesh:

Varun Ganesh

unread,
Dec 11, 2020, 10:33:34 AM12/11/20
to JanusGraph users
Thanks a lot for responding Marc.

Yes, I had initially tried setting spark.yarn.archive with the path to spark-gremlin.zip. However with this approach, the containers were failing with the message "Error: Could not find or load main class org.apache.spark.deploy.yarn.ExecutorLauncher".

I'm yet to understand the differences between the spark.yarn.archive and the HADOOP_GREMLIN_LIBS approaches. Will update this thread as I find out more.

Thank you,
Varun

Reply all
Reply to author
Forward
0 new messages