Janusgraph with YARN and HBASE

Fábio Dapper

Jul 22, 2020, 6:57:38 PM

to JanusGraph users

Hello, we have a Cloudera CDH 6.3.2 cluster, and I'm trying to run JanusGraph on it with YARN and HBase, but without success.

(It works fine with Spark in local mode.)

Spark: 2.4.2

HBase: 2.1.0-cdh6.3.2

JanusGraph: 0.5.2 and 0.4.1

I searched extensively but found no recent references; the ones I found all use older versions of Spark and JanusGraph.

Some examples:

1) https://docs.janusgraph.org/advanced-topics/hadoop/

2) http://tinkerpop.apache.org/docs/current/recipes/#olap-spark-yarn

3) http://yaaics.blogspot.com/2017/07/configuring-janusgraph-for-spark-yarn.html

Based on these references, I took the following steps:

Copy the following files to the JanusGraph "lib" directory:

spark-yarn-2.11-2.4.0.jar
scala-reflect-2.10.5.jar
hadoop-yarn-server-web-proxy-2.7.2.jar
guice-servlet-3.0.jar

Generate a "/tmp/spark-gremlin-0.5.2.zip" file containing all the .jar files from "janusgraph/lib/".
Create a configuration file called "test.properties" from "conf/hadoop-graph/read-hbase-standalone-cluster.properties" by adding (or modifying) the properties below:

    janusgraphmr.ioformat.conf.storage.hostname=XXX.XXX.XXX.XXX
    spark.master=yarn
    #spark.deploy-mode=client
    spark.submit.deployMode=client
    spark.executor.memory=1g
    spark.yarn.dist.jars=/tmp/spark-gremlin-0-5-2.zip
    spark.yarn.archive=/tmp/spark-gremlin-0-5-2.zip
    spark.yarn.appMasterEnv.CLASSPATH=./__spark_libs__/*:[hadoop_conf_dir]
    spark.executor.extraClassPath=./__spark_libs__/*:/[hadoop_conf_dir]
    spark.driver.extraLibraryPath=/opt/cloudera/parcels/CDH/lib/hadoop/lib/native
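
The archive step above can be sketched in a few lines of Python. This is only an illustration, not the poster's actual procedure: the function name `build_spark_archive` is hypothetical, and placing the jars at the root of the zip (with no directory prefix) is an assumption about what `spark.yarn.archive` expects.

```python
import zipfile
from pathlib import Path


def build_spark_archive(lib_dir, archive_path):
    """Zip every .jar found directly in lib_dir into archive_path.

    Assumption: jars are stored flat at the archive root, without any
    directory prefix. Returns the list of jar names that were added.
    """
    jars = sorted(Path(lib_dir).glob("*.jar"))
    # Jars are already compressed, so store them without recompressing.
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_STORED) as zf:
        for jar in jars:
            zf.write(jar, arcname=jar.name)
    return [jar.name for jar in jars]


# Example call, using the paths mentioned in the post:
# build_spark_archive("janusgraph/lib", "/tmp/spark-gremlin-0.5.2.zip")
```

Whatever file name is chosen for the archive, the `spark.yarn.archive` property has to point at that exact path.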

Then I ran the following commands:

graph = GraphFactory.open('conf/hadoop-graph/test.properties')
g = graph.traversal().withComputer(SparkGraphComputer)
g.V().count()

Can someone help me?

a) Are these problems related to version incompatibility?

b) Has anyone successfully used similar infrastructure?

c) Does anyone know how to determine the correct versions of the required libraries?

d) Any other suggestions?

Thank you all!!!

Below is a copy of the YARN log from my last attempt.

ERROR org.apache.spark.scheduler.TaskSetManager  - Task 0 in stage 0.0 failed 4 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, [SERVER_NAME], executor 1): java.lang.NoSuchMethodError: net.jpountz.lz4.LZ4BlockInputStream.<init>(Ljava/io/InputStream;Z)V
at org.apache.spark.io.LZ4CompressionCodec.compressedInputStream(CompressionCodec.scala:122)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$6.apply(TorrentBroadcast.scala:304)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$6.apply(TorrentBroadcast.scala:304)
at scala.Option.map(Option.scala:146)
at org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:304)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1$$anonfun$apply$2.apply(TorrentBroadcast.scala:235)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:211)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1326)
at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:207)
at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:66)
at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:66)
at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:96)
at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:89)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
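
A `NoSuchMethodError` like the one in this trace usually means that the class loaded at runtime comes from a different version of a library than the one the calling code was compiled against. One way to investigate, sketched here under that assumption, is to list which jars in a directory bundle the class named in the error. The function name `jars_containing` is hypothetical, and the directory to scan is whatever ends up on the executor classpath.

```python
import zipfile
from pathlib import Path


def jars_containing(lib_dir, class_name):
    """Return the names of jars in lib_dir that contain class_name.

    class_name is a fully qualified Java class name, for example
    "net.jpountz.lz4.LZ4BlockInputStream".
    """
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in sorted(Path(lib_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(jar.name)
        except zipfile.BadZipFile:
            # Skip files that have a .jar suffix but are not valid archives.
            continue
    return hits


# Example call, using the lib directory mentioned in the post:
# jars_containing("janusgraph/lib", "net.jpountz.lz4.LZ4BlockInputStream")
```

If more than one jar is reported, or the only jar reported is older than the one Spark was built with, that points to a likely version conflict.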

Thank you!!

Petr Stentor

Jul 23, 2020, 4:20:12 AM

to JanusGraph users

Hi!

Try this:

spark.io.compression.codec=snappy
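
For context, here is a sketch of where this setting could go, assuming it is added to the same "test.properties" file described in the original post. Spark's `spark.io.compression.codec` selects the codec used for internal data such as broadcast variables and shuffle output, and it defaults to lz4. Switching to snappy sidesteps the `LZ4CompressionCodec` call that appears in the stack trace. It does not resolve whatever lz4 library mismatch may be on the classpath.

```properties
# test.properties (excerpt); the other properties stay as they were
spark.master=yarn
spark.submit.deployMode=client
# Use Snappy instead of Spark's default LZ4 codec for broadcast and shuffle data
spark.io.compression.codec=snappy
```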

On Thursday, July 23, 2020 at 1:57:38 UTC+3, Fábio Dapper wrote:

Fábio Dapper

Jul 23, 2020, 9:19:46 AM

to janusgra...@googlegroups.com

Perfect!!!

That's it!

Thank you very much!!!
