Janusgraph with YARN and HBASE

Fábio Dapper

Jul 22, 2020, 6:57:38 PM
to JanusGraph users
Hello, we have a cluster with Cloudera CDH 6.3.2 and I'm trying to run JanusGraph on the cluster with YARN and HBase, but without success.
(It works fine with Spark in local mode.)

Spark: 2.4.2
HBase: 2.1.0-cdh6.3.2
JanusGraph: 0.5.2 and 0.4.1

I did a lot of searching, but I didn't find any recent references; the ones I did find all use older versions of Spark and JanusGraph.

Some examples:

Based on these references, I followed these steps:

  1. Copy the following files to the JanusGraph "lib" directory:
    1. spark-yarn-2.11-2.4.0.jar
    2. scala-reflect-2.10.5.jar
    3. hadoop-yarn-server-web-proxy-2.7.2.jar
    4. guice-servlet-3.0.jar
  2. Generate a "/tmp/spark-gremlin-0.5.2.zip" file containing all the .jar files from "janusgraph/lib/".
  3. Create a configuration file called 'test.properties' from conf/hadoop-graph/read-hbase-standalone-cluster.properties by adding (or modifying) the properties below (a sketch of the complete resulting file follows the property list):

janusgraphmr.ioformat.conf.storage.hostname=XXX.XXX.XXX.XXX
spark.master=yarn
#spark.deploy-mode=client
spark.submit.deployMode=client
spark.executor.memory=1g
spark.yarn.dist.jars=/tmp/spark-gremlin-0-5-2.zip
spark.yarn.archive=/tmp/spark-gremlin-0-5-2.zip
spark.yarn.appMasterEnv.CLASSPATH=./__spark_libs__/*:[hadoop_conf_dir]
spark.executor.extraClassPath=./__spark_libs__/*:/[hadoop_conf_dir]
spark.driver.extraLibraryPath=/opt/cloudera/parcels/CDH/lib/hadoop/lib/native
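For context, here is roughly what the complete test.properties might look like when the additions above are combined with the stock HBase OLAP settings that ship in conf/hadoop-graph/ with JanusGraph 0.5.x (the graphReader/graphWriter/serializer entries below come from the bundled read-hbase properties and may differ slightly between versions; the hostname, table name and zip path are placeholders):

gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.hbase.HBaseInputFormat
gremlin.hadoop.graphWriter=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat
gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output
gremlin.spark.persistContext=true

# HBase connection used by HBaseInputFormat (placeholder values)
janusgraphmr.ioformat.conf.storage.backend=hbase
janusgraphmr.ioformat.conf.storage.hostname=XXX.XXX.XXX.XXX
janusgraphmr.ioformat.conf.storage.hbase.table=janusgraph

# Spark on YARN settings from the steps above
spark.master=yarn
spark.submit.deployMode=client
spark.executor.memory=1g
spark.yarn.dist.jars=/tmp/spark-gremlin-0-5-2.zip
spark.yarn.archive=/tmp/spark-gremlin-0-5-2.zip
spark.yarn.appMasterEnv.CLASSPATH=./__spark_libs__/*:[hadoop_conf_dir]
spark.executor.extraClassPath=./__spark_libs__/*:/[hadoop_conf_dir]
spark.driver.extraLibraryPath=/opt/cloudera/parcels/CDH/lib/hadoop/lib/native
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.janusgraph.hadoop.serialize.JanusGraphKryoRegistrator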



Then I ran the following commands:
    graph = GraphFactory.open('conf/hadoop-graph/test.properties')
    g = graph.traversal().withComputer(SparkGraphComputer)
    g.V().count()
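(If SparkGraphComputer cannot be resolved in the Gremlin Console, activating the Spark plugin first usually helps; it may already be active in the JanusGraph distribution. A minimal console sketch, with the properties path from step 3:)

    :plugin use tinkerpop.spark
    graph = GraphFactory.open('conf/hadoop-graph/test.properties')  // HadoopGraph backed by HBaseInputFormat
    g = graph.traversal().withComputer(SparkGraphComputer)          // OLAP traversal source
    g.V().count()                                                   // executed as a Spark job on YARN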
Can someone help me?
a) Are these problems related to version incompatibility?
b) Has anyone successfully used similar infrastructure?
c) Would anyone know how to determine a correct version of the necessary libraries?
d) Any suggestion?


Thank you all !!!

Below is a copy of the YARN log from my last attempt.

ERROR org.apache.spark.scheduler.TaskSetManager - Task 0 in stage 0.0 failed 4 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, [SERVER_NAME], executor 1): java.lang.NoSuchMethodError: net.jpountz.lz4.LZ4BlockInputStream.<init>(Ljava/io/InputStream;Z)V
    at org.apache.spark.io.LZ4CompressionCodec.compressedInputStream(CompressionCodec.scala:122)
    at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$6.apply(TorrentBroadcast.scala:304)
    at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$6.apply(TorrentBroadcast.scala:304)
    at scala.Option.map(Option.scala:146)
    at org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:304)
    at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1$$anonfun$apply$2.apply(TorrentBroadcast.scala:235)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:211)
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1326)
    at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:207)
    at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:66)
    at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:66)
    at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:96)
    at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:89)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
    at org.apache.spark.scheduler.Task.run(Task.scala:121)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Thank you!!

Petr Stentor

Jul 23, 2020, 4:20:12 AM
to JanusGraph users

Hi!

Try this 
spark.io.compression.codec=snappy
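
(The NoSuchMethodError on net.jpountz.lz4.LZ4BlockInputStream usually means two different lz4 jars end up on the executor classpath, for example an older lz4 jar shipped in the JanusGraph lib zip shadowing the newer lz4-java that Spark 2.4 expects. Switching the codec to snappy sidesteps the LZ4 code path entirely. A minimal sketch of the change, assuming the test.properties from the first post:)

# avoid the conflicting LZ4 classes by using snappy for Spark block compression
spark.io.compression.codec=snappy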

On Thursday, July 23, 2020, at 1:57:38 UTC+3, Fábio Dapper wrote:

Fábio Dapper

Jul 23, 2020, 9:19:46 AM
to janusgra...@googlegroups.com
Perfect!!!
That's it!
Thank you, very much!!!
