[BLOG] Configuring JanusGraph for spark-yarn


HadoopMarc

Jul 6, 2017, 4:15:37 AM
to JanusGraph users list

Readers wanting to run OLAP queries on a real spark-yarn cluster might want to check my recent post:

http://yaaics.blogspot.nl/2017/07/configuring-janusgraph-for-spark-yarn.html

Regards,  Marc

Jason Plurad

Jul 6, 2017, 11:37:34 AM
to JanusGraph users list
+1 thanks for sharing Marc!

John Helmsen

Jul 6, 2017, 7:27:59 PM
to JanusGraph users list
Excellent!



John Helmsen

Jul 14, 2017, 6:24:25 PM
to JanusGraph users list
HadoopMarc,

It seems that we have two graph classes that need to be created:

The first is a StandardJanusGraph object that uses the standard graph computer. It is able to perform OLTP data pushes and, I assume, standard OLTP queries. It does not, however, interface with Spark, so SparkGraphComputer cannot be used as the graph computer for its traversal object.

The second is a HadoopGraph object that can have SparkGraphComputer activated for its associated traversal source object. This can perform the appropriate map-reduce OLAP calculations, but wouldn't be good for putting information into the HBase database.

Is this accurate, or can we create a GraphTraversalSource that can perform both the OLTP data inserts and utilize SparkGraphComputer? If not, could we create both objects simultaneously? Would there be conflicts between the two if there were two simultaneous traversals?



HadoopMarc

Jul 15, 2017, 8:29:28 AM
to JanusGraph users list
Hi John,
Your assumption about different types of graph object for OLTP and OLAP is right (at least for JanusGraph; TinkerGraph supports both). I remember examples from the Gremlin users list, though, where OLTP and OLAP were mixed in the same traversal.
It is no problem to have a graph1 and a graph2 graph object open simultaneously. This is also what you do in gremlin-server when you want to serve multiple graphs.
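For illustration, a minimal Gremlin console sketch of the two objects side by side (the property file names are just the stock ones from the distribution and are assumptions here, not taken from your setup):

// OLTP: a StandardJanusGraph instance for inserts and point queries
graph1 = JanusGraphFactory.open('conf/janusgraph-hbase.properties')
g1 = graph1.traversal()
g1.addV('person').property('name', 'marko').iterate(); graph1.tx().commit()

// OLAP: a HadoopGraph instance over the same storage backend, traversed with SparkGraphComputer
graph2 = GraphFactory.open('conf/hadoop-graph/read-hbase.properties')
g2 = graph2.traversal().withComputer(SparkGraphComputer)
g2.V().count()   // runs as a Spark job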

Cheers,    Marc



spirit...@gmail.com

Jul 24, 2017, 4:12:13 AM
to JanusGraph users list
Hi, thanks for your post.
I followed the post, but I ran into a problem.
15:58:49,110  INFO SecurityManager:58 - Changing view acls to: rc
15:58:49,110  INFO SecurityManager:58 - Changing modify acls to: rc
15:58:49,110  INFO SecurityManager:58 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(rc); users with modify permissions: Set(rc)
15:58:49,111  INFO Client:58 - Submitting application 25 to ResourceManager
15:58:49,320  INFO YarnClientImpl:274 - Submitted application application_1500608983535_0025
15:58:49,321  INFO SchedulerExtensionServices:58 - Starting Yarn extension services with app application_1500608983535_0025 and attemptId None
15:58:50,325  INFO Client:58 - Application report for application_1500608983535_0025 (state: ACCEPTED)
15:58:50,326  INFO Client:58 -
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1500883129115
final status: UNDEFINED
user: rc
15:58:51,330  INFO Client:58 - Application report for application_1500608983535_0025 (state: ACCEPTED)
15:58:52,333  INFO Client:58 - Application report for application_1500608983535_0025 (state: ACCEPTED)
15:58:53,335  INFO Client:58 - Application report for application_1500608983535_0025 (state: ACCEPTED)
15:58:54,337  INFO Client:58 - Application report for application_1500608983535_0025 (state: ACCEPTED)
15:58:55,340  INFO Client:58 - Application report for application_1500608983535_0025 (state: ACCEPTED)
15:58:56,343  INFO Client:58 - Application report for application_1500608983535_0025 (state: ACCEPTED)
15:58:56,802  INFO YarnSchedulerBackend$YarnSchedulerEndpoint:58 - ApplicationMaster registered as NettyRpcEndpointRef(null)
15:58:56,822  INFO YarnClientSchedulerBackend:58 - Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> dl-rc-optd-ambari-master-v-test-1.host.dataengine.com,dl-rc-optd-ambari-master-v-test-2.host.dataengine.com, PROXY_URI_BASES -> http://dl-rc-optd-ambari-master-v-test-1.host.dataengine.com:8088/proxy/application_1500608983535_0025,http://dl-rc-optd-ambari-master-v-test-2.host.dataengine.com:8088/proxy/application_1500608983535_0025), /proxy/application_1500608983535_0025
15:58:56,824  INFO JettyUtils:58 - Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
15:58:57,346  INFO Client:58 - Application report for application_1500608983535_0025 (state: RUNNING)
15:58:57,347  INFO Client:58 -
client token: N/A
diagnostics: N/A
ApplicationMaster host: 10.200.48.154
ApplicationMaster RPC port: 0
queue: default
start time: 1500883129115
final status: UNDEFINED
user: rc
15:58:57,348  INFO YarnClientSchedulerBackend:58 - Application application_1500608983535_0025 has started running.
15:58:57,358  INFO Utils:58 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 47514.
15:58:57,358  INFO NettyBlockTransferService:58 - Server created on 47514
15:58:57,360  INFO BlockManagerMaster:58 - Trying to register BlockManager
15:58:57,363  INFO BlockManagerMasterEndpoint:58 - Registering block manager 10.200.48.112:47514 with 2.4 GB RAM, BlockManagerId(driver, 10.200.48.112, 47514)
15:58:57,366  INFO BlockManagerMaster:58 - Registered BlockManager
15:58:57,585  INFO EventLoggingListener:58 - Logging events to hdfs:///spark-history/application_1500608983535_0025
15:59:07,177  WARN YarnSchedulerBackend$YarnSchedulerEndpoint:70 - Container marked as failed: container_e170_1500608983535_0025_01_000002 on host: dl-rc-optd-ambari-slave-v-test-1.host.dataengine.com. Exit status: 1. Diagnostics: Exception from container-launch.
Container id: container_e170_1500608983535_0025_01_000002
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:576)
at org.apache.hadoop.util.Shell.run(Shell.java:487)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:753)
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:371)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:303)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Shell output: main : command provided 1
main : run as user is rc
main : requested yarn user is rc


Container exited with a non-zero exit code 1
Display stack trace? [yN]
15:59:57,702  WARN TransportChannelHandler:79 - Exception in connection from 10.200.48.155/10.200.48.155:50921
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:748)
15:59:57,704 ERROR TransportResponseHandler:132 - Still have 1 requests outstanding when connection from 10.200.48.155/10.200.48.155:50921 is closed
15:59:57,706  WARN NettyRpcEndpointRef:91 - Error sending message [message = RequestExecutors(0,0,Map())] in 1 attempts
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:748)

I am confused about that. Could you please help me?




Joe Obernberger

Jul 26, 2017, 6:30:25 PM
to HadoopMarc, JanusGraph users list

Marc - thank you for posting this.  I'm trying to get this to work with our CDH 5.10.0 distribution, but have run into an issue; first, though, some questions.  I'm using a 5-node cluster, and I think I do not need to set zookeeper.znode.parent since the HBase configuration is in /etc/hbase/conf.  Is that correct?

The error that I'm getting is:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 10, host002, executor 1): java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.x$330 of type org.apache.spark.api.java.function.PairFunction in instance of org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1
        at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133)
        at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2238)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2156)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2014)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1536)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2232)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2156)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2014)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1536)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2232)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2112)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2014)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1536)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2232)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2156)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2014)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1536)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:423)
        at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
        at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1058)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2123)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2014)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1536)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2232)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2156)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2014)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1536)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2232)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2156)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2014)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1536)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:423)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:64)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)


        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

Given this post:
https://stackoverflow.com/questions/28186607/java-lang-classcastexception-using-lambda-expressions-in-spark-job-on-remote-ser

It looks like I'm not including a necessary jar, but I'm at a loss as to which one.  Any ideas?

For reference, here is part of the config:

#
# Hadoop Graph Configuration
#
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphInputFormat=org.janusgraph.hadoop.formats.hbase.HBaseInputFormat
gremlin.hadoop.graphOutputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output

#
# JanusGraph HBase InputFormat configuration
#
janusgraphmr.ioformat.conf.storage.backend=hbase
#janusgraphmr.ioformat.conf.storage.hostname=fqdn1,fqdn2,fqdn3
janusgraphmr.ioformat.conf.storage.hostname=10.22.5.63:2181,10.22.5.64:2181,10.22.5.65:2181
janusgraphmr.ioformat.conf.storage.hbase.table=TEST0.2.0
janusgraphmr.ioformat.conf.storage.hbase.region-count=5
janusgraphmr.ioformat.conf.storage.hbase.regions-per-server=18
janusgraphmr.ioformat.conf.storage.hbase.short-cf-names=false
#zookeeper.znode.parent=/hbase-unsecure
# Security configs are needed in case of a secure cluster
#zookeeper.znode.parent=/hbase-secure
#hbase.rpc.protection=privacy
#hbase.security.authentication=kerberos

#
# SparkGraphComputer with Yarn Configuration
#

spark.master=yarn-client
spark.executor.memory=512m
spark.serializer=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer
spark.yarn.dist.archives=/home/graph/janusgraph-0.2.0-SNAPSHOT-hadoop2.JOE/lib.zip
spark.yarn.dist.files=/opt/cloudera/parcels/CDH/jars/janusgraph-hbase-0.2.0-SNAPSHOT.jar
spark.yarn.dist.jars=/opt/cloudera/parcels/CDH/jars/janusgraph-hbase-0.2.0-SNAPSHOT.jar,/opt/cloudera/parcels/CDH/jars/spark-core_2.10-1.6.0-cdh5.10.0.jar
#spark.yarn.appMasterEnv.CLASSPATH=/etc/hadoop/conf:./lib.zip/*:
spark.yarn.appMasterEnv.CLASSPATH=/etc/haddop/conf:/etc/hbase/conf:./lib.zip/*:/opt/cloudera/parcels/CDH/jars/spark-core_2.10-1.6.0-cdh5.10.0.jar
#spark.executor.extraClassPath=/etc/hadoop/conf:/etc/hbase/conf:/home/graph/janusgraph-0.2.0-SNAPSHOT-hadoop2/janusgraph-hbase-0.2.0-SNAPSHOT.jar:./lib.zip/*
spark.driver.extraLibraryPath=/opt/cloudera/parcels/CDH/lib/hadoop/native:/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/lib/native/Linux-amd64-64
spark.executor.extraLibraryPath=/opt/cloudera/parcels/CDH/lib/hadoop/native:/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/lib/native/Linux-amd64-64

Thank you!

-Joe


Joe Obernberger

Aug 2, 2017, 3:56:33 PM
to spirit...@gmail.com, JanusGraph users list

Could this be a networking issue?  Maybe a firewall is enabled, or selinux is preventing a connection?

I've been able to get this to work, but running a simple count - g.V().count() - on anything but a very small graph takes a very, very long time (hours).  Are there any cache settings, or other resources, that could be modified to improve the performance?

The YARN container logs are filled with debug lines like 'Created dirty vertex map with initial size 32', 'Created vertex cache with max size 20000', and 'Generated HBase Filter ColumnRangeFilter'.  Can any of these things be adjusted in the properties file?  Thank you!
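For reference, those log lines look like they might map to the following JanusGraph cache options (a sketch only; whether the janusgraphmr.ioformat.conf. prefix actually forwards these cache settings to the HBaseInputFormat is an assumption on my part, not something verified here):

# 'Created vertex cache with max size 20000' / 'Created dirty vertex map with initial size 32'
janusgraphmr.ioformat.conf.cache.tx-cache-size=20000
janusgraphmr.ioformat.conf.cache.tx-dirty-size=32
# global database cache, sized as a fraction of the heap
janusgraphmr.ioformat.conf.cache.db-cache=true
janusgraphmr.ioformat.conf.cache.db-cache-size=0.25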

-Joe


HadoopMarc

Aug 6, 2017, 3:50:54 PM
to JanusGraph users list
Hi ... and others,  I have been offline for a few weeks enjoying a holiday and will start looking into your questions and make the suggested corrections. Thanks for following the recipes and helping others with them.

..., did you run the recipe on the same HDP sandbox and the same TinkerPop version? I remember (from 4 weeks ago) that copying the zookeeper.znode.parent property from the HBase configs to the JanusGraph configs was essential to get JanusGraph's HBaseInputFormat working (that is: to read graph data for the Spark tasks).
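For example, a one-line sketch of what that copy looks like in the graph properties file (the actual value has to come from your distribution's hbase-site.xml):

zookeeper.znode.parent=/hbase-unsecure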

Cheers,    Marc


Joe Obernberger

Aug 7, 2017, 5:12:02 PM
to HadoopMarc, JanusGraph users list

Hi Marc - I've been able to get it to run longer, but am now getting a RowTooBigException from HBase.  How does JanusGraph store data in HBase?  The current max size of a row is 1 GByte, which makes me think this error is covering something else up.

What I'm seeing so far in testing with a 5 server cluster - each machine with 128G of RAM:
HBase table is 1.5G in size, split across 7 regions, and has 20,001,105 rows.  To do a g.V().count() takes 2 hours and results in 3,842,755 vertices.

Another HBase table is 5.7G in size, split across 10 regions with 57,620,276 rows; the count took 6.5 hours and results in 10,859,491 nodes.  When running, it looks like it hits one server very hard even though the YARN tasks are distributed across the cluster.  One HBase node gets hammered.

The RowTooBigException is below.  Anything to try?  Thank you for any help!


org.janusgraph.core.JanusGraphException: Could not process individual retrieval call
                at org.janusgraph.graphdb.query.QueryUtil.processIntersectingRetrievals(QueryUtil.java:257)
                at org.janusgraph.graphdb.transaction.StandardJanusGraphTx$6.execute(StandardJanusGraphTx.java:1269)
                at org.janusgraph.graphdb.transaction.StandardJanusGraphTx$6.execute(StandardJanusGraphTx.java:1137)
                at org.janusgraph.graphdb.query.QueryProcessor$LimitAdjustingIterator.getNewIterator(QueryProcessor.java:209)
                at org.janusgraph.graphdb.query.LimitAdjustingIterator.hasNext(LimitAdjustingIterator.java:75)
                at org.janusgraph.graphdb.query.ResultSetIterator.nextInternal(ResultSetIterator.java:54)
                at org.janusgraph.graphdb.query.ResultSetIterator.next(ResultSetIterator.java:67)
                at org.janusgraph.graphdb.query.ResultSetIterator.next(ResultSetIterator.java:28)
                at com.google.common.collect.Iterators$7.computeNext(Iterators.java:651)
                at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
                at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
                at org.janusgraph.hadoop.formats.util.input.current.JanusGraphHadoopSetupImpl.getTypeInspector(JanusGraphHadoopSetupImpl.java:60)
                at org.janusgraph.hadoop.formats.util.JanusGraphVertexDeserializer.<init>(JanusGraphVertexDeserializer.java:55)
                at org.janusgraph.hadoop.formats.util.GiraphInputFormat.lambda$static$0(GiraphInputFormat.java:49)
                at org.janusgraph.hadoop.formats.util.GiraphInputFormat$RefCountedCloseable.acquire(GiraphInputFormat.java:100)
                at org.janusgraph.hadoop.formats.util.GiraphRecordReader.<init>(GiraphRecordReader.java:47)
                at org.janusgraph.hadoop.formats.util.GiraphInputFormat.createRecordReader(GiraphInputFormat.java:67)
                at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:166)
                at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:133)
                at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:65)
                at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
                at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
                at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
                at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
                at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
                at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
                at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
                at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
                at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)


                at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
                at org.apache.spark.scheduler.Task.run(Task.scala:89)
                at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)

                at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
                at java.lang.Thread.run(Thread.java:745)

Caused by: org.janusgraph.core.JanusGraphException: Could not call index
                at org.janusgraph.graphdb.transaction.StandardJanusGraphTx$6$6.call(StandardJanusGraphTx.java:1262)
                at org.janusgraph.graphdb.query.QueryUtil.processIntersectingRetrievals(QueryUtil.java:255)
                ... 34 more
Caused by: org.janusgraph.core.JanusGraphException: Could not execute operation due to backend exception
                at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:57)
                at org.janusgraph.diskstorage.BackendTransaction.executeRead(BackendTransaction.java:444)
                at org.janusgraph.diskstorage.BackendTransaction.indexQuery(BackendTransaction.java:395)
                at org.janusgraph.graphdb.query.graph.MultiKeySliceQuery.execute(MultiKeySliceQuery.java:51)
                at org.janusgraph.graphdb.database.IndexSerializer.query(IndexSerializer.java:529)
                at org.janusgraph.graphdb.transaction.StandardJanusGraphTx$6$6$1.lambda$call$5(StandardJanusGraphTx.java:1258)
                at org.janusgraph.graphdb.query.profile.QueryProfiler.profile(QueryProfiler.java:97)
                at org.janusgraph.graphdb.query.profile.QueryProfiler.profile(QueryProfiler.java:89)
                at org.janusgraph.graphdb.query.profile.QueryProfiler.profile(QueryProfiler.java:81)
                at org.janusgraph.graphdb.transaction.StandardJanusGraphTx$6$6$1.call(StandardJanusGraphTx.java:1258)
                at org.janusgraph.graphdb.transaction.StandardJanusGraphTx$6$6$1.call(StandardJanusGraphTx.java:1255)
                at com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4742)
                at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3527)
                at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2319)
                at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2282)
                at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2197)
                at com.google.common.cache.LocalCache.get(LocalCache.java:3937)
                at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4739)
                at org.janusgraph.graphdb.transaction.StandardJanusGraphTx$6$6.call(StandardJanusGraphTx.java:1255)
                ... 35 more
Caused by: org.janusgraph.diskstorage.TemporaryBackendException: Could not successfully complete backend operation due to repeated temporary exceptions after PT10S
                at org.janusgraph.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:101)
                at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:55)
                ... 53 more
Caused by: org.janusgraph.diskstorage.TemporaryBackendException: Temporary failure in storage backend
                at org.janusgraph.diskstorage.hbase.HBaseKeyColumnValueStore.getHelper(HBaseKeyColumnValueStore.java:202)
                at org.janusgraph.diskstorage.hbase.HBaseKeyColumnValueStore.getSlice(HBaseKeyColumnValueStore.java:90)
                at org.janusgraph.diskstorage.keycolumnvalue.KCVSProxy.getSlice(KCVSProxy.java:77)
                at org.janusgraph.diskstorage.keycolumnvalue.KCVSProxy.getSlice(KCVSProxy.java:77)
                at org.janusgraph.diskstorage.BackendTransaction$5.call(BackendTransaction.java:398)
                at org.janusgraph.diskstorage.BackendTransaction$5.call(BackendTransaction.java:395)
                at org.janusgraph.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:69)
                ... 54 more
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=35, exceptions:
Sat Aug 05 07:22:03 EDT 2017, RpcRetryingCaller{globalStartTime=1501932111280, pause=100, retries=35}, org.apache.hadoop.hbase.regionserver.RowTooBigException: rg.apache.hadoop.hbase.regionserver.RowTooBigException: Max row size allowed: 1073741824, but the row is bigger than that.
                at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:564)
                at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
                at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5697)
                at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:5856)
                at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5634)
                at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:5611)
                at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:5597)
                at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6792)
                at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6770)
                at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2023)
                at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33644)
                at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2170)
                at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:109)
                at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:185)
                at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:165)


HadoopMarc

Aug 8, 2017, 3:17:05 AM
to JanusGraph users list, bi...@xs4all.nl
Hi Joseph,

You ran into terrain I have not yet covered myself. Up till now I have been using the graben1437 PR for Titan, and for OLAP I adopted a poor man's approach where node ids are distributed over Spark tasks and each Spark executor makes its own Titan/HBase connection. This performs well, but does not have the nice abstraction of the HBaseInputFormat.

So, no clear answer to this one, but just some thoughts:
 - could you try to move some regions manually and see what it does to performance?
 - how do your OLAP vertex count times compare to the OLTP count times?
 - how does the sum of spark task execution times compare to the yarn start-to-end time difference you reported? In other words, how much of the start-to-end time is spent in waiting for timeouts?
 - unless you managed to create a vertex with > 1GB size, the RowTooBigException sounds like a bug (which you can report on JanusGraph's GitHub page). HBase does not like large rows at all, so vertex/edge properties should not have blob values.
 
@(David Robinson): do you have any additional thoughts on this?

Cheers,    Marc


Joe Obernberger

Aug 8, 2017, 11:28:02 AM
to HadoopMarc, JanusGraph users list

Hi Marc - thank you very much for your reply.  I like your idea about moving regions manually and will try that.  As to OLAP vs OLTP (I assume Spark vs none), yes I have those times.
For a 1.5G table in HBase the count just using the gremlin shell without using the SparkGraphComputer:

graph = JanusGraphFactory.open('conf/graph.properties')
g=graph.traversal()
g.V().count()

takes just under 1 minute.  Using Spark it takes about 2 hours, so something isn't right.  They both return 3,842,755 vertices.  When I run it with Spark, it hits one of the region servers hard - doing over 30k requests per second for those 2 hours.

-Joe

Gariee

Aug 8, 2017, 3:45:01 PM
to JanusGraph users list
Hi Marc, 

Request your help: I am running JanusGraph with MapR-DB as the backend, and I have successfully been able to create the Graph of the Gods example on M7 as the backend.

But when I try to execute the following, where the cluster is MapR and Spark runs on YARN:


plugin activated: tinkerpop.tinkergraph

gremlin> graph = GraphFactory.open('conf/hadoop-graph/hadoop-load.properties')

==>hadoopgraph[gryoinputformat->nulloutputformat]

gremlin>  g = graph.traversal(computer(SparkGraphComputer))

==>graphtraversalsource[hadoopgraph[gryoinputformat->nulloutputformat], sparkgraphcomputer]

gremlin> g.V().count()


hadoop-load.properties (I tried all different combinations, as commented below; each time it is the same error)


#
# SparkGraphComputer Configuration
#
spark.master=yarn-client
spark.yarn.queue=cmp
mapred.job.queue.name=cmp

#spark.driver.allowMultipleContexts=true
#spark.executor.memory=4g
#spark.ui.port=20000

spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.yarn.appMasterEnv.CLASSPATH=$CLASSPATH:/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/hdfs:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/hdfs/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/hdfs/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/yarn/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/yarn/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/mapreduce/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar:/opt/mapr/lib/kvstore*.jar:/opt/mapr/lib/libprotodefs*.jar:/opt/mapr/lib/baseutils*.jar:/opt/mapr/lib/maprutil*.jar:/opt/mapr/lib/json-20080701.jar:/opt/mapr/lib/flexjson-2.1.jar

#spark.executor.instances=10
#spark.executor.cores=2
#spark.executor.CoarseGrainedExecutorBackend.cores=2
#spark.executor.CoarseGrainedExecutorBackend.driver=FIXME
#spark.executor.CoarseGrainedExecutorBackend.stopping=false
#spark.streaming.stopGracefullyOnShutdown=true
#spark.yarn.driver.memoryOverhead=4g
#spark.yarn.executor.memoryOverhead=1024
#spark.yarn.am.extraJavaOptions=-Dhdp.version=2.3.0.0-2557


--------------------------------------------------------------------------------------------------------------

yarn log

------------------------------------------------------------------------------------------------------------

When the last command is executed, the driver abruptly shuts down in the YARN container and shuts down the Spark context too, with the following error from the YARN logs:

Container: container_e27_1501284102300_47651_01_000008 on abcd.com_8039

LogType:stderr
Log Upload Time:Mon Jul 31 14:08:42 -0700 2017
LogLength:2441
Log Contents:
17/07/31 14:08:05 INFO executor.CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
17/07/31 14:08:05 INFO spark.SecurityManager: Changing view acls to: cmphs
17/07/31 14:08:05 INFO spark.SecurityManager: Changing modify acls to: cmphs
17/07/31 14:08:05 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(cmphs); users with modify permissions: Set(cmphs)
17/07/31 14:08:06 INFO spark.SecurityManager: Changing view acls to: cmphs
17/07/31 14:08:06 INFO spark.SecurityManager: Changing modify acls to: cmphs
17/07/31 14:08:06 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(cmphs); users with modify permissions: Set(cmphs)
17/07/31 14:08:06 INFO slf4j.Slf4jLogger: Slf4jLogger started
17/07/31 14:08:06 INFO Remoting: Starting remoting
17/07/31 14:08:06 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkExecuto...@abcd.com:36376]
17/07/31 14:08:06 INFO util.Utils: Successfully started service 'sparkExecutorActorSystem' on port 36376.
17/07/31 14:08:06 INFO storage.DiskBlockManager: Created local directory at /tmp/hadoop-mapr/nm-local-dir/usercache/cmphs/appcache/application_1501284102300_47651/blockmgr-244e0062-016e-4402-85c9-69f2ab9ef9d2
17/07/31 14:08:06 INFO storage.MemoryStore: MemoryStore started with capacity 2.7 GB
17/07/31 14:08:06 INFO executor.CoarseGrainedExecutorBackend: Connecting to driver: spark://CoarseGrain...@10.16.127.60:43768
17/07/31 14:08:06 INFO executor.CoarseGrainedExecutorBackend: Successfully registered with driver
17/07/31 14:08:06 INFO executor.Executor: Starting executor ID 7 on host abcd.com
17/07/31 14:08:06 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 44208.
17/07/31 14:08:06 INFO netty.NettyBlockTransferService: Server created on 44208
17/07/31 14:08:06 INFO storage.BlockManagerMaster: Trying to register BlockManager
17/07/31 14:08:06 INFO storage.BlockManagerMaster: Registered BlockManager
17/07/31 14:08:41 INFO executor.CoarseGrainedExecutorBackend: Driver commanded a shutdown
17/07/31 14:08:41 INFO storage.MemoryStore: MemoryStore cleared
17/07/31 14:08:41 INFO storage.BlockManager: BlockManager stopped
17/07/31 14:08:41 INFO util.ShutdownHookManager: Shutdown hook called
End of LogType:stderr

LogType:stdout
Log Upload Time:Mon Jul 31 14:08:42 -0700 2017
LogLength:0
Log Contents:
End of LogType:stdout


I am stuck with this error; request help here.

Joe Obernberger

Aug 8, 2017, 4:38:29 PM
to Gariee, JanusGraph users list

Could you let us know a little more about your configuration?  What is your storage backend for JanusGraph (HBase/Cassandra)?  I actually do not see an error in your log, but at the very least you'll need to define spark.executor.extraClassPath to point to the various jars required.  Are there other logs you can look at, such as the container logs for YARN?
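Something along these lines, for example (just a sketch patterned on the executor classpath used elsewhere in this thread; the conf folders and the lib.zip location depend on your MapR layout):

spark.executor.extraClassPath=/etc/hadoop/conf:/etc/hbase/conf:./lib.zip/*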

I assume you've seen Marc's blog post:
http://yaaics.blogspot.com/2017/07/configuring-janusgraph-for-spark-yarn.html

-Joe


Joe Obernberger

Aug 9, 2017, 1:16:23 PM
to HadoopMarc, JanusGraph users list

Hi Marc - I did try splitting regions.  What happens when I run the SparkGraphComputer job is that it seems to hit one region server hard, then moves on to the next; it appears to run serially.  All that said, I think the big issue is that running a count with Spark takes 2 hours and running without takes 2 minutes.  Perhaps SparkGraphComputer is not the tool to be used for handling large graphs?  Although then I'm not sure what its purpose is.

For reference, here is my Hadoop configuration file using Cloudera CDH 5.10.0 and JanusGraph 0.1.0:

Config:
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


#
# Hadoop Graph Configuration
#
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphInputFormat=org.janusgraph.hadoop.formats.hbase.HBaseInputFormat
gremlin.hadoop.graphOutputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
 

gremlin.hadoop.memoryOutputFormat=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
gremlin.hadoop.memoryOutputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.deriveMemory=false
 
gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output
 
log4j.rootLogger=WARNING, STDOUT
log4j.logger.deng=WARNING
log4j.appender.STDOUT=org.apache.log4j.ConsoleAppender
org.slf4j.simpleLogger.defaultLogLevel=warn


 
#
# JanusGraph HBase InputFormat configuration
#
janusgraphmr.ioformat.conf.storage.backend=hbase

janusgraphmr.ioformat.conf.storage.hostname=host003:2181,host004:2181,host005:2181
janusgraphmr.ioformat.conf.storage.hbase.table=Large2
janusgraphmr.ioformat.conf.storage.hbase.region-count=5
janusgraphmr.ioformat.conf.storage.hbase.regions-per-server=5
janusgraphmr.ioformat.conf.storage.hbase.short-cf-names=false


# Security configs are needed in case of a secure cluster

zookeeper.znode.parent=/hbase


 
#
# SparkGraphComputer with Yarn Configuration
#

#spark.master=yarn-client
spark.executor.extraJavaOptions=-XX:ReservedCodeCacheSize=100M -XX:MaxMetaspaceSize=256m -XX:CompressedClassSpaceSize=256m -Dlogback.configurationFile=logback.xml
spark.driver.extraJavaOptons=-XX:ReservedCodeCacheSize=100M -XX:MaxMetaspaceSize=256m -XX:CompressedClassSpaceSize=256m
spark.master=yarn-cluster
spark.executor.memory=10240m
spark.serializer=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer
spark.yarn.dist.archives=/home/graph/janusgraph-0.1.0-SNAPSHOT-hadoop2/lib.zip
spark.yarn.dist.files=/opt/cloudera/parcels/CDH/jars/janusgraph-hbase-0.1.0-SNAPSHOT.jar,/home/graph/janusgraph-0.1.0-SNAPSHOT-hadoop2/conf/logback.xml
spark.yarn.dist.jars=/opt/cloudera/parcels/CDH/jars/janusgraph-hbase-0.1.0-SNAPSHOT.jar
spark.yarn.appMasterEnv.CLASSPATH=/etc/haddop/conf:/etc/hbase/conf:./lib.zip/*
spark.executor.extraLibraryPath=/opt/cloudera/parcels/CDH/lib/hadoop/native:/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/lib/native/Linux-amd64-64
spark.akka.frameSize=1024
spark.kyroserializer.buffer.max=1600m
spark.network.timeout=90000
spark.executor.heartbeatInterval=100000
spark.cores.max=64
 
#
# Relevant configs from spark-defaults.conf
#
spark.authenticate=false
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.executorIdleTimeout=60
spark.dynamicAllocation.minExecutors=0
spark.dynamicAllocation.schedulerBacklogTimeout=1
spark.eventLog.enabled=true
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.shuffle.service.enabled=true
spark.shuffle.service.port=7337
spark.ui.killEnabled=true
spark.executor.extraClassPath=/opt/cloudera/parcels/CDH/jars/janusgraph-hbase-0.1.0-SNAPSHOT.jar:./lib.zip/*:\
/opt/cloudera/parcels/CDH/lib/hbase/bin/../conf:\
/opt/cloudera/parcels/CDH/lib/hbase/bin/..:\
/opt/cloudera/parcels/CDH/lib/spark/assembly/lib/spark-assembly.jar:\
/usr/lib/jvm/java-openjdk/jre/lib/rt.jar:\
/opt/cloudera/parcels/CDH/lib/hbase/bin/../lib/*:\
/etc/hadoop/conf:\
/etc/hbase/conf:\
/opt/cloudera/parcels/CDH/lib/hadoop/libexec/../../hadoop/lib/*:\
/opt/cloudera/parcels/CDH/lib/hadoop/libexec/../../hadoop/.//*:\
/opt/cloudera/parcels/CDH/lib/hadoop/libexec/../../hadoop-yarn/lib/*:\
/opt/cloudera/parcels/CDH/lib/hadoop/libexec/../../hadoop-yarn/.//*
spark.eventLog.dir=hdfs://host001:8020/user/spark/applicationHistory
spark.yarn.historyServer.address=http://host001:18088
spark.yarn.jar=local:/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark/lib/spark-assembly.jar
spark.driver.extraLibraryPath=/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/hadoop/lib/native
spark.executor.extraLibraryPath=/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/hadoop/lib/native
spark.yarn.am.extraLibraryPath=/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/hadoop/lib/native
spark.yarn.config.gatewayPath=/opt/cloudera/parcels
spark.yarn.config.replacementPath={{HADOOP_COMMON_HOME}}/../../..
spark.master=yarn-client
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Thanks very much for any input / further dialog!


-Joe


HadoopMarc

Aug 9, 2017, 3:33:48 PM
to JanusGraph users list, bi...@xs4all.nl
Hi Gari and Joe,

Glad to see you testing the recipes for MapR and Cloudera respectively!  I am sure that you realized by now that getting this to work is like walking through a minefield. If you deviate from the known path, the odds for getting through are dim, and no one wants to be in your vicinity. So, if you see a need to deviate (which there may be for the hadoop distributions you use), you will need your mine sweeper, that is, put the logging level to DEBUG for relevant java packages.

This is where you deviated:
  • for Gari: you put all kinds of MapR lib folders on the application master's classpath (other classpath configs are not visible from your post)
  • for Joe: you put all kinds of Cloudera lib folders on the executors' classpath (worst of all the spark-assembly.jar)

Probably, you are experiencing all kinds of mismatches in Netty libraries, which slows down or even kills all comms between the YARN containers. The philosophy of the recipes really is to add only the minimum number of conf folders and jars to the TinkerPop/JanusGraph distribution and see from there whether any libraries are missing.


At my side, it has become apparent that I should at least add to the recipes:

  • proof of work for a medium-sized graph (say 10M vertices and edges)
  • configs for the number of executors present in the OLAP job (instead of relying on spark default number of 2)

So, still some work to do!


Cheers,    Marc



Joe Obernberger

Aug 9, 2017, 6:13:09 PM
to HadoopMarc, JanusGraph users list

Marc - thank you.  I've updated the classpath and removed nearly all of the CDH jars; I had to keep chimera and some of the HBase libs in there.  Apart from those and all the jars in lib.zip, it is working as it did before.  The reason I turned DEBUG off was that it was producing 100+ GBytes of logs, nearly all of which are lines like:

18:04:29 DEBUG org.janusgraph.diskstorage.hbase.HBaseKeyColumnValueStore - Generated HBase Filter ColumnRangeFilter [\x10\xC0, \x10\xC1)
18:04:29 DEBUG org.janusgraph.graphdb.transaction.StandardJanusGraphTx - Guava vertex cache size: requested=20000 effective=20000 (min=100)
18:04:29 DEBUG org.janusgraph.graphdb.transaction.vertexcache.GuavaVertexCache - Created dirty vertex map with initial size 32
18:04:29 DEBUG org.janusgraph.graphdb.transaction.vertexcache.GuavaVertexCache - Created vertex cache with max size 20000
18:04:29 DEBUG org.janusgraph.diskstorage.hbase.HBaseKeyColumnValueStore - Generated HBase Filter ColumnRangeFilter [\x10\xC2, \x10\xC3)
18:04:29 DEBUG org.janusgraph.graphdb.transaction.StandardJanusGraphTx - Guava vertex cache size: requested=20000 effective=20000 (min=100)
18:04:29 DEBUG org.janusgraph.graphdb.transaction.vertexcache.GuavaVertexCache - Created dirty vertex map with initial size 32
18:04:29 DEBUG org.janusgraph.graphdb.transaction.vertexcache.GuavaVertexCache - Created vertex cache with max size 20000

Do those mean anything to you?  I've turned it back on for running with smaller graph sizes, but so far I don't see anything helpful there apart from an exception about not setting HADOOP_HOME.
Here are the spark properties; notice the nice and small extraClassPath!  :)

gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.deriveMemory=false
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.hbase.HBaseInputFormat
gremlin.hadoop.graphWriter=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.graphWriter.hasEdges=false
gremlin.hadoop.inputLocation=none
gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.memoryOutputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.outputLocation=output
janusgraphmr.ioformat.conf.storage.backend=hbase
janusgraphmr.ioformat.conf.storage.hbase.region-count=5
janusgraphmr.ioformat.conf.storage.hbase.regions-per-server=5
janusgraphmr.ioformat.conf.storage.hbase.short-cf-names=false
janusgraphmr.ioformat.conf.storage.hbase.table=TEST0.2.0
janusgraphmr.ioformat.conf.storage.hostname=10.22.5.65:2181
log4j.appender.STDOUT=org.apache.log4j.ConsoleAppender
log4j.logger.deng=WARNING
log4j.rootLogger=STDOUT
org.slf4j.simpleLogger.defaultLogLevel=warn
spark.akka.frameSize=1024
spark.app.id=application_1502118729859_0041
spark.app.name=Apache TinkerPop's Spark-Gremlin
spark.authenticate=false
spark.cores.max=64
spark.driver.appUIAddress=http://10.22.5.61:4040
spark.driver.extraJavaOptons=-XX:ReservedCodeCacheSize=100M -XX:MaxMetaspaceSize=256m -XX:CompressedClassSpaceSize=256m
spark.driver.extraLibraryPath=/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/hadoop/lib/native
spark.driver.host=10.22.5.61
spark.driver.port=38529
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.executorIdleTimeout=60
spark.dynamicAllocation.minExecutors=0
spark.dynamicAllocation.schedulerBacklogTimeout=1
spark.eventLog.dir=hdfs://host001:8020/user/spark/applicationHistory
spark.eventLog.enabled=true
spark.executor.extraClassPath=/opt/cloudera/parcels/CDH/jars/janusgraph-hbase-0.2.0-SNAPSHOT.jar:./lib.zip/*:/opt/cloudera/parcels/CDH/lib/hbase/bin/../lib/*:/etc/hbase/conf:
spark.executor.extraJavaOptions=-XX:ReservedCodeCacheSize=100M -XX:MaxMetaspaceSize=256m -XX:CompressedClassSpaceSize=256m -Dlogback.configurationFile=logback.xml
spark.executor.extraLibraryPath=/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/hadoop/lib/native
spark.executor.heartbeatInterval=100000
spark.executor.id=driver
spark.executor.memory=10240m
spark.externalBlockStore.folderName=spark-27dac3f3-dfbc-4f32-b52d-ececdbcae0db
spark.kyroserializer.buffer.max=1600m
spark.master=yarn-client
spark.network.timeout=90000
spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_HOSTS=host005
spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_URI_BASES=http://host005:8088/proxy/application_1502118729859_0041
spark.scheduler.mode=FIFO
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.shuffle.service.enabled=true
spark.shuffle.service.port=7337
spark.ui.filters=org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
spark.ui.killEnabled=true
spark.yarn.am.extraLibraryPath=/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/hadoop/lib/native
spark.yarn.appMasterEnv.CLASSPATH=/etc/haddop/conf:/etc/hbase/conf:./lib.zip/*
spark.yarn.config.gatewayPath=/opt/cloudera/parcels
spark.yarn.config.replacementPath={{HADOOP_COMMON_HOME}}/../../..
spark.yarn.dist.archives=/home/graph/janusgraph-0.2.0-SNAPSHOT-hadoop2.JOE/lib.zip
spark.yarn.dist.files=/home/graph/janusgraph-0.2.0-SNAPSHOT-hadoop2.JOE/conf/logback.xml
spark.yarn.dist.jars=/opt/cloudera/parcels/CDH/jars/janusgraph-hbase-0.2.0-SNAPSHOT.jar
spark.yarn.historyServer.address=http://host001:18088
zookeeper.znode.parent=/hbase


-Joe

HadoopMarc

Aug 10, 2017, 7:33:27 AM
to JanusGraph users list, bi...@xs4all.nl
Hi Joe,

Another thing to try (only tested on Tinkerpop, not on JanusGraph): create the traversalsource as follows:

g = graph.traversal().withComputer(new Computer().graphComputer(SparkGraphComputer).workers(100))

With HadoopGraph this helps HDFS files with very large or no partitions to be split across tasks; I did not check the effect yet for the HBaseInputFormat in JanusGraph. And did you add spark.executor.instances=10 (or some suitable number) to your config? And did you check in the RM UI or the Spark history server whether these executors were really allocated and started?
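As a sketch, the executor settings go in the same properties file (the numbers here are only illustrative):

spark.executor.instances=10
spark.executor.cores=1
spark.executor.memory=4096m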

More later,

Marc


Joe Obernberger

Aug 10, 2017, 9:40:21 AM
to HadoopMarc, JanusGraph users list

Thank you Marc.

I did not set spark.executor.instances, but I do have spark.cores.max set to 64, and within YARN it is configured to allow as much RAM/cores as possible for our 5-server cluster.  When I run a job on a table that has 61 regions, I see that 43 tasks are started and running on all 5 nodes in the Spark UI (and running top on each of the servers).  If I lower the amount of RAM (heap) that each task has (currently set to 10G), they fail with OutOfMemory exceptions.  It still hits one HBase node very hard and cycles through them.  While that may be a reason for a performance issue, it doesn't explain the massive number of calls that HBase receives for a count job, and why using SparkGraphComputer takes so much more time.

Running with your command below appears not to alter the behavior.  I did run a job last night with DEBUG turned on, but it produced too much logging, filling up the log directory on 3 of the 5 nodes before stopping.
Thanks again Marc!

-Joe


HadoopMarc

Aug 13, 2017, 4:07:45 PM
to JanusGraph users list, bi...@xs4all.nl

Hi Joe,

To shed some more light on the running figures you presented, I ran some tests on my own cluster:

1. I loaded the default janusgraph-hbase table with the following simple script from the console:

graph = JanusGraphFactory.open("conf/janusgraph-hbase.properties")
g = graph.traversal()
m = 1200L
n = 10000L
(0L..<m).each {
    (0L..<n).each {
        v1 = g.addV().id().next()
        v2 = g.addV().id().next()
        g.V(v1).addE('link1').to(g.V(v2)).next()
        g.V(v1).addE('link2').to(g.V(v2)).next()
    }
    g.tx().commit()
}

This script runs in about 20(?) minutes and results in 24M vertices and edges committed to the graph.

2. I did an OLTP g.V().count() on this graph from the console: 11 minutes first time, 10 minutes second time

3. I ran OLAP jobs on this graph using janusgraph-hbase in two ways:
    a) with g = graph.traversal().withComputer(SparkGraphComputer)
    b) with g = graph.traversal().withComputer(new Computer().graphComputer(SparkGraphComputer).workers(10))

the properties file was as in the recipe, with the exception of:
   spark.executor.memory=4096m       # smaller values might work, but the 512m from the recipe is definitely too small
   spark.executor.instances=4
   #spark.executor.cores not set, so default value 1

This resulted in the following running times:
   a) stage 0,1,2 => 12min, 12min, 3s => 24min total
   b) stage 0,1,2 => 18min, 1min, 86ms => 19 min total

Discussion:
  • HBase is not an easy source for OLAP: HBase wants large regions for efficiency (configurable, but typically 2-20GB), while mapreduce inputformats (like janusgraph's HBaseInputFormat) take regions as inputsplits by default. This means that only a few executors will read from HBase unless the HBaseInputFormat is extended to split a region's keyspace into multiple inputsplits. This mismatch between the numbers of regions and spark executors is a potential JanusGraph issue. Examples exist to improve on this, e.g. org.apache.hadoop.hbase.mapreduce.RowCounter

  • For spark stages after stage 0 (reading from HBase), increasing the number of spark tasks with the "workers()" setting helps optimizing the parallelization. This means that for larger traversals than just a vertex count, the parallelization with spark will really pay off.

  • I did not try to repeat your settings with a large number of cores. Various sources discourage the use of spark.executor.cores values larger than 5, e.g. https://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/, https://stackoverflow.com/questions/37871194/how-to-tune-spark-executor-number-cores-and-executor-memory

Hopefully, these tests provide you and other readers with some additional perspectives on the configuration of janusgraph-hbase.

Cheers,    Marc


Gariee

Aug 14, 2017, 2:19:21 PM
to JanusGraph users list, bi...@xs4all.nl
Thank You  Marc !

Please find below the complete SparkGraphComputer properties:

#
# SparkGraphComputer Configuration
#
spark.master=yarn-client
spark.yarn.queue=cmp
mapred.job.queue.name=cmp
spark.driver.allowMultipleContexts=true
spark.executor.memory=20g
#spark.ui.port=20000
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.yarn.appMasterEnv.CLASSPATH=$CLASSPATH:/opt/mapr/spark/spark/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/hdfs:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/hdfs/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/hdfs/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/yarn/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/yarn/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/mapreduce/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar:/opt/mapr/lib/kvstore*.jar:/opt/mapr/lib/libprotodefs*.jar:/opt/mapr/lib/baseutils*.jar:/opt/mapr/lib/maprutil*.jar:/opt/mapr/lib/json-20080701.jar:/opt/mapr/lib/flexjson-2.1.jar
#spark.yarn.appMasterEnv.CLASSPATH=$CLASSPATH:hadoop classpath
spark.yarn.appMasterEnv.HADOOP_CONF_DIR=/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop
spark.yarn.appMasterEnv.YARN_CONF_DIR=/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop
spark.yarn.appMasterEnv.SPARK_CONF_DIR=/opt/mapr/spark/spark/conf
spark.executorEnv.CLASSPATH=$CLASSPATH:/opt/mapr/spark/spark/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/hdfs:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/hdfs/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/hdfs/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/yarn/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/yarn/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/mapreduce/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar:/opt/mapr/lib/kvstore*.jar:/opt/mapr/lib/libprotodefs*.jar:/opt/mapr/lib/baseutils*.jar:/opt/mapr/lib/maprutil*.jar:/opt/mapr/lib/json-20080701.jar:/opt/mapr/lib/flexjson-2.1.jar
spark.executorEnv.HADOOP_CONF_DIR=/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop
spark.executorEnv.SPARK_CONF_DIR=/opt/mapr/spark/spark/conf
spark.executor.instances=10
#spark.executor.cores=2
#spark.executor.CoarseGrainedExecutorBackend.cores=2
#spark.executor.CoarseGrainedExecutorBackend.driver=FIXME
#spark.executor.CoarseGrainedExecutorBackend.stopping=false
#spark.streaming.stopGracefullyOnShutdown=true
spark.yarn.driver.memoryOverhead=4g
spark.yarn.executor.memoryOverhead=1024

Joe Obernberger

unread,
Aug 14, 2017, 6:17:12 PM8/14/17
to HadoopMarc, JanusGraph users list

Marc - thank you for this.  I'm going to try getting the latest version of JanusGraph, and compiling it with our specific version of Cloudera CDH, then run some tests.  Will report back.

-Joe

liuzhip...@gmail.com

unread,
Aug 21, 2017, 2:40:13 AM8/21/17
to JanusGraph users list, bi...@xs4all.nl
Hey Joseph, did your test succeed? Can you share your experience with me? Thanks.

On Tuesday, August 15, 2017 at 6:17:12 AM UTC+8, Joseph Obernberger wrote:

Joe Obernberger

unread,
Aug 22, 2017, 11:04:03 AM8/22/17
to liuzhip...@gmail.com, JanusGraph users list, bi...@xs4all.nl

Hi All - I rebuilt JanusGraph from git with the CDH 5.10.0 libraries (just modified the poms) and using that library created a new graph with 159,103,508 vertices and 278,901,629 edges.  I then manually moved regions around in HBase and did splits across our 5-server cluster, ending up with 88 regions; the original size was 22 regions.  The test (g.V().count()) took 1.2 hours to run with Spark, and a similar amount of time to do the edge count.  I don't have an exact number, but it looks like doing it without Spark took a similar amount of time.  Honestly, I don't know if this is good or bad!

I replaced the jar files in the lib directory with jars from CDH and then rebuilt the lib.zip file.  My configuration follows:

#
# Hadoop Graph Configuration
#

gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphInputFormat=org.janusgraph.hadoop.formats.hbase.HBaseInputFormat
gremlin.hadoop.graphOutputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat

gremlin.hadoop.memoryOutputFormat=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
gremlin.hadoop.memoryOutputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.deriveMemory=false

gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=output
gremlin.hadoop.outputLocation=output

log4j.rootLogger=WARNING, STDOUT
log4j.logger.deng=WARNING
log4j.appender.STDOUT=org.apache.log4j.ConsoleAppender
org.slf4j.simpleLogger.defaultLogLevel=warn

#
# JanusGraph HBase InputFormat configuration
#

janusgraphmr.ioformat.conf.storage.backend=hbase
janusgraphmr.ioformat.conf.storage.hostname=10.22.5.63:2181,10.22.5.64:2181,10.22.5.65:2181
janusgraphmr.ioformat.conf.storage.hbase.table=FullSpark
janusgraphmr.ioformat.conf.storage.hbase.region-count=44
janusgraphmr.ioformat.conf.storage.hbase.regions-per-server=5
janusgraphmr.ioformat.conf.storage.hbase.short-cf-names=false
janusgraphmr.ioformat.conf.storage.cache.db-cache-size = 0.5
zookeeper.znode.parent=/hbase

#
# SparkGraphComputer with Yarn Configuration
#

spark.executor.extraJavaOptions=-XX:ReservedCodeCacheSize=100M -XX:MaxMetaspaceSize=256m -XX:CompressedClassSpaceSize=256m -Dlogback.configurationFile=logback.xml
spark.driver.extraJavaOptions=-XX:ReservedCodeCacheSize=100M -XX:MaxMetaspaceSize=256m -XX:CompressedClassSpaceSize=256m
spark.master=yarn-cluster
spark.executor.memory=10240m
spark.serializer=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer
spark.yarn.dist.archives=/home/graph/janusgraph-0.2.0-SNAPSHOT-hadoop2.JOE/lib.zip
spark.yarn.dist.files=/opt/cloudera/parcels/CDH/jars/janusgraph-hbase-0.2.0-SNAPSHOT.jar,/home/graph/janusgraph-0.2.0-SNAPSHOT-hadoop2.JOE/conf/logback.xml
spark.yarn.dist.jars=/opt/cloudera/parcels/CDH/jars/janusgraph-hbase-0.2.0-SNAPSHOT.jar
spark.yarn.appMasterEnv.CLASSPATH=/etc/hadoop/conf:/etc/hbase/conf:./lib.zip/*


#spark.executor.extraClassPath=/etc/hadoop/conf:/etc/hbase/conf:/home/graph/janusgraph-0.2.0-SNAPSHOT-hadoop2/janusgraph-hbase-0.2.0-SNAPSHOT.jar:./lib.zip/*
spark.driver.extraLibraryPath=/opt/cloudera/parcels/CDH/lib/hadoop/native:/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/lib/native/Linux-amd64-64

spark.executor.extraLibraryPath=/opt/cloudera/parcels/CDH/lib/hadoop/native:/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/lib/native/Linux-amd64-64
spark.akka.frameSize=1024
spark.kryoserializer.buffer.max=1600m
spark.network.timeout=90000
spark.executor.heartbeatInterval=100000

spark.cores.max=5 

#
# Relevant configs from spark-defaults.conf
#

spark.authenticate=false
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.executorIdleTimeout=60
spark.dynamicAllocation.minExecutors=0
spark.dynamicAllocation.schedulerBacklogTimeout=1
spark.eventLog.enabled=true
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.shuffle.service.enabled=true
spark.shuffle.service.port=7337
spark.ui.killEnabled=true

spark.executor.extraClassPath=/opt/cloudera/parcels/CDH/jars/janusgraph-hbase-0.2.0-SNAPSHOT.jar:./lib.zip/*:\
/opt/cloudera/parcels/CDH/lib/hbase/bin/../lib/*:\
/etc/hbase/conf:


spark.eventLog.dir=hdfs://host001:8020/user/spark/applicationHistory
spark.yarn.historyServer.address=http://host001:18088

#spark.yarn.jar=local:/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark/lib/spark-assembly.jar


spark.driver.extraLibraryPath=/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/hadoop/lib/native
spark.executor.extraLibraryPath=/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/hadoop/lib/native
spark.yarn.am.extraLibraryPath=/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/hadoop/lib/native
spark.yarn.config.gatewayPath=/opt/cloudera/parcels
spark.yarn.config.replacementPath={{HADOOP_COMMON_HOME}}/../../..
spark.master=yarn-client

Hope that helps!

-Joe

HadoopMarc

unread,
Aug 23, 2017, 4:08:51 PM8/23/17
to JanusGraph users, liuzhip...@gmail.com, bi...@xs4all.nl
Hi Joe,

Thanks for reporting back your results and confirming the recipe for CDH. Also, your job execution times now seem consistent with the ones I posted above. As to your question whether these figures make sense: I think the loading part of OLAP jobs with the HBaseInputFormat is way too slow and needs attention. At this point you are better off storing the vertex ids on HDFS, doing an RDD mapPartitions on these ids, and having each Spark executor make its own connection to JanusGraph and fetch the vertices it needs with low latency once all HBase caches have warmed up (I used this approach with Titan and will probably keep it for a while with JanusGraph).
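
A schematic Groovy sketch of that alternative (all file paths, the 'name' property and the vertex-id file layout are assumptions, and in a real job the closure passed to mapPartitions must be serializable):

import org.apache.spark.SparkConf
import org.apache.spark.api.java.JavaSparkContext
import org.apache.spark.api.java.function.FlatMapFunction
import org.janusgraph.core.JanusGraphFactory

def sc  = new JavaSparkContext(new SparkConf().setAppName('vertex-fetch').setMaster('yarn-client'))
def ids = sc.textFile('hdfs:///user/graph/vertex-ids')      // one JanusGraph vertex id per line

// Each partition opens one JanusGraph connection and fetches only its own vertices.
def names = ids.mapPartitions({ part ->
    def graph = JanusGraphFactory.open('conf/janusgraph-hbase.properties')
    def g = graph.traversal()
    def out = []
    part.each { id -> out << g.V(Long.valueOf(id)).values('name').next() }
    graph.close()
    out                                                      // Spark 1.6 FlatMapFunction returns an Iterable
} as FlatMapFunction)

println names.count()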

I do not know which plans the JanusGraph team have with the HBaseInputFormat, but I figure they will wait for the future HBase 2.0.0 release which will hopefully cover a number of relevant features, such as:
https://issues.apache.org/jira/browse/HBASE-14789

Cheers,    Marc

On Tuesday, August 22, 2017 at 5:04:03 PM UTC+2, Joseph Obernberger wrote:

John Helmsen

unread,
Sep 25, 2017, 5:17:53 PM9/25/17
to JanusGraph users
Marc,

Thank you so much for the help in getting Spark 1.6.1 to work with JanusGraph.  We've gotten good use out of it, but now we come to a crossroads.

Our customer wants us to deploy it on their cluster, but their cluster runs Spark 1.6.2.  I noticed that you confirmed the operation of the Spark-YARN-JanusGraph setup on an HDP 2.5 stack, which typically runs 1.6.2.  Does the setup that we've already gone through transfer to 1.6.2?  If there are problems, what would you anticipate they might be?


On Thursday, July 6, 2017 at 4:15:37 AM UTC-4, HadoopMarc wrote:

HadoopMarc

unread,
Sep 26, 2017, 3:30:40 PM9/26/17
to JanusGraph users

Hi John

The funny thing is, the recipe does not use the HDP Spark installation at all!  SparkGraphComputer creates its own SparkContext and has YARN start all the Spark machinery. So the cluster's Spark version does not matter at all, though Spark 2.x requires some other config properties (see the recent PRs on the TinkerPop GitHub).
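
To make this concrete: the Spark classes come from TinkerPop's spark-gremlin plugin on the Gremlin Console classpath, not from the cluster. A minimal console sketch (the :install step can be skipped if spark-gremlin is already bundled, and the properties file name is an assumption):

:install org.apache.tinkerpop spark-gremlin 3.2.3
:plugin use tinkerpop.spark
graph = GraphFactory.open('conf/hadoop-graph/read-hbase.properties')
g = graph.traversal().withComputer(SparkGraphComputer)
g.V().count()    // SparkGraphComputer creates the SparkContext; YARN starts the executors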

The only interaction with the cluster's Spark is for the Spark History Server, but I did not notice any problems between Spark 1.6.1 and Spark 1.6.2. See your cluster's spark-defaults.conf for the history configs.

Have fun!

Marc



On Monday, September 25, 2017 at 11:17:53 PM UTC+2, John Helmsen wrote:

John Helmsen

unread,
Sep 26, 2017, 3:58:45 PM9/26/17
to JanusGraph users
HadoopMarc,

This sounds like this could be really good news, but please clarify something for me:

TinkerPop 3.2.3 claims compatibility with only Spark 1.6.1, and currently JanusGraph 0.1.1 only supports up to TinkerPop 3.2.3, so I assumed that JanusGraph would only support Spark 1.6.1.

Now I have two interpretations of your post that I need to have clarified:

1) You have made Spark 1.6.2 work (actually do computations) with JanusGraph-0.1.1.
2) There is a version of Spark 1.6.1 also on the cluster, and it is being called by JanusGraph-0.1.1 while Spark 1.6.2 is being ignored.

Either one is a workable option for me, but please elaborate so I am completely clear about what is happening.

HadoopMarc

unread,
Sep 27, 2017, 1:54:49 AM9/27/17
to JanusGraph users
Hi John,

TinkerPop's spark-gremlin module depends on spark-1.6.1, so when you install spark-gremlin in the gremlin-console or add it to your Maven project, the spark-core-1.6.1.jar is already on your classpath. The configs in my recipe make sure all deps are also available to the spark-1.6.1 executors and application master on a Hadoop (that is, YARN) cluster. The cluster's spark-1.6.2 jars are never loaded when the gremlin-console is used as in my recipe.

Using spark-submit would put spark-1.6.2 on the various classpaths, which would probably also work, provided it did not cause a version conflict between the TinkerPop dependencies and the Spark dependencies.

Also, I believe your implicit assumption that it would be bad practice to put spark-1.6.2 jars on the classpath of a spark-1.6.1 application is not valid. Spark 1.6.2 should support all APIs that a Spark 1.6.1 application can depend on (minor version difference).

I hope this clarifies things; configuring complex JVM apps is not for the faint-hearted.

Marc

On Tuesday, September 26, 2017 at 9:58:45 PM UTC+2, John Helmsen wrote:

lakshay....@gmail.com

unread,
Oct 15, 2018, 8:13:46 AM10/15/18
to JanusGraph users
Hi,
I am using JanusGraph 0.3.0 with Spark 2.2.0 on AWS EMR and get a "spark context not initialised" error when I set spark.master to yarn (spark.master=yarn) rather than local mode.
My config file looks like this:

# read-cassandra-3.properties
#
# Hadoop Graph Configuration
#
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.cassandra.CassandraInputFormat
gremlin.hadoop.graphWriter=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat

gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output
gremlin.spark.persistContext=true

#
# JanusGraph Cassandra InputFormat configuration
#
# These properties define the connection settings that were used while writing data to JanusGraph.
janusgraphmr.ioformat.conf.storage.backend=cassandra
# This specifies the hostname & port for Cassandra data store.
janusgraphmr.ioformat.conf.storage.hostname=x.x.x.x
janusgraphmr.ioformat.conf.storage.port=9160
# This specifies the keyspace where data is stored.
janusgraphmr.ioformat.conf.storage.cassandra.keyspace=clinkgraphdb
# This defines the indexing backend configuration used while writing data to JanusGraph.
janusgraphmr.ioformat.conf.index.search.backend=elasticsearch
janusgraphmr.ioformat.conf.index.search.hostname=x.x.x.x
# Use the appropriate properties for the backend when using a different storage backend (HBase) or indexing backend (Solr).

#
# Apache Cassandra InputFormat configuration
#
cassandra.input.partitioner.class=org.apache.cassandra.dht.Murmur3Partitioner

#
# SparkGraphComputer Configuration
#
spark.master=yarn
spark.executor.memory=2g
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoRegistrator

Any help is appreciated

Regards,
Lakshay Sharma

HadoopMarc

unread,
Oct 15, 2018, 10:40:00 AM10/15/18
to JanusGraph users
Hi Lakshay,

Good to hear you are trying JanusGraph with Spark. First:
  • can you please start a new thread for a new question rather than reviving an old thread
  • can you please post stacktraces in error cases
You can rephrase the question with the new errors after setting spark.submit.deployMode=client
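
For reference, Spark 2.x splits the old yarn-client value into two properties; a minimal sketch of the Spark part of such a file (not verified on EMR, classpath settings omitted):

spark.master=yarn
spark.submit.deployMode=client
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoRegistrator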

Also, as JanusGraph-0.3 uses Spark2, check out:

Cheers,      Marc


On Monday, October 15, 2018 at 2:13:46 PM UTC+2, lakshay....@gmail.com wrote: