15:58:49,110 INFO SecurityManager:58 - Changing view acls to: rc
15:58:49,110 INFO SecurityManager:58 - Changing modify acls to: rc
15:58:49,110 INFO SecurityManager:58 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(rc); users with modify permissions: Set(rc)
15:58:49,111 INFO Client:58 - Submitting application 25 to ResourceManager
15:58:49,320 INFO YarnClientImpl:274 - Submitted application application_1500608983535_0025
15:58:49,321 INFO SchedulerExtensionServices:58 - Starting Yarn extension services with app application_1500608983535_0025 and attemptId None
15:58:50,325 INFO Client:58 - Application report for application_1500608983535_0025 (state: ACCEPTED)
15:58:50,326 INFO Client:58 -
	 client token: N/A
	 diagnostics: N/A
	 ApplicationMaster host: N/A
	 ApplicationMaster RPC port: -1
	 queue: default
	 start time: 1500883129115
	 final status: UNDEFINED
	 tracking URL: http://dl-rc-optd-ambari-master-v-test-2.host.dataengine.com:8088/proxy/application_1500608983535_0025/
	 user: rc
15:58:51,330 INFO Client:58 - Application report for application_1500608983535_0025 (state: ACCEPTED)
15:58:52,333 INFO Client:58 - Application report for application_1500608983535_0025 (state: ACCEPTED)
15:58:53,335 INFO Client:58 - Application report for application_1500608983535_0025 (state: ACCEPTED)
15:58:54,337 INFO Client:58 - Application report for application_1500608983535_0025 (state: ACCEPTED)
15:58:55,340 INFO Client:58 - Application report for application_1500608983535_0025 (state: ACCEPTED)
15:58:56,343 INFO Client:58 - Application report for application_1500608983535_0025 (state: ACCEPTED)
15:58:56,802 INFO YarnSchedulerBackend$YarnSchedulerEndpoint:58 - ApplicationMaster registered as NettyRpcEndpointRef(null)
15:58:56,822 INFO YarnClientSchedulerBackend:58 - Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> dl-rc-optd-ambari-master-v-test-1.host.dataengine.com,dl-rc-optd-ambari-master-v-test-2.host.dataengine.com, PROXY_URI_BASES -> http://dl-rc-optd-ambari-master-v-test-1.host.dataengine.com:8088/proxy/application_1500608983535_0025,http://dl-rc-optd-ambari-master-v-test-2.host.dataengine.com:8088/proxy/application_1500608983535_0025), /proxy/application_1500608983535_0025
15:58:56,824 INFO JettyUtils:58 - Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
15:58:57,346 INFO Client:58 - Application report for application_1500608983535_0025 (state: RUNNING)
15:58:57,347 INFO Client:58 -
	 client token: N/A
	 diagnostics: N/A
	 ApplicationMaster host: 10.200.48.154
	 ApplicationMaster RPC port: 0
	 queue: default
	 start time: 1500883129115
	 final status: UNDEFINED
	 tracking URL: http://dl-rc-optd-ambari-master-v-test-2.host.dataengine.com:8088/proxy/application_1500608983535_0025/
	 user: rc
15:58:57,348 INFO YarnClientSchedulerBackend:58 - Application application_1500608983535_0025 has started running.
15:58:57,358 INFO Utils:58 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 47514.
15:58:57,358 INFO NettyBlockTransferService:58 - Server created on 47514
15:58:57,360 INFO BlockManagerMaster:58 - Trying to register BlockManager
15:58:57,363 INFO BlockManagerMasterEndpoint:58 - Registering block manager 10.200.48.112:47514 with 2.4 GB RAM, BlockManagerId(driver, 10.200.48.112, 47514)
15:58:57,366 INFO BlockManagerMaster:58 - Registered BlockManager
15:58:57,585 INFO EventLoggingListener:58 - Logging events to hdfs:///spark-history/application_1500608983535_0025
15:59:07,177 WARN YarnSchedulerBackend$YarnSchedulerEndpoint:70 - Container marked as failed: container_e170_1500608983535_0025_01_000002 on host: dl-rc-optd-ambari-slave-v-test-1.host.dataengine.com. Exit status: 1. Diagnostics: Exception from container-launch.
Container id: container_e170_1500608983535_0025_01_000002
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
	at org.apache.hadoop.util.Shell.runCommand(Shell.java:576)
	at org.apache.hadoop.util.Shell.run(Shell.java:487)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:753)
	at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:371)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:303)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Shell output: main : command provided 1
main : run as user is rc
main : requested yarn user is rc
Container exited with a non-zero exit code 1
Display stack trace? [yN]
15:59:57,702 WARN TransportChannelHandler:79 - Exception in connection from 10.200.48.155/10.200.48.155:50921
java.io.IOException: Connection reset by peer
	at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
	at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
	at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
	at sun.nio.ch.IOUtil.read(IOUtil.java:192)
	at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
	at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
	at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
	at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
	at java.lang.Thread.run(Thread.java:748)
15:59:57,704 ERROR TransportResponseHandler:132 - Still have 1 requests outstanding when connection from 10.200.48.155/10.200.48.155:50921 is closed
15:59:57,706 WARN NettyRpcEndpointRef:91 - Error sending message [message = RequestExecutors(0,0,Map())] in 1 attempts
java.io.IOException: Connection reset by peer
	at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
	at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
	at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
	at sun.nio.ch.IOUtil.read(IOUtil.java:192)
	at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
	at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
	at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
	at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
	at java.lang.Thread.run(Thread.java:748)
Marc - thank you for posting this. I'm trying to get this to work with our CDH 5.10.0 distribution, but have run into an issue; first, some questions. I'm using a 5-node cluster, and I think I do not need to set zookeeper.znode.parent since the hbase configuration is in /etc/hbase/conf. Is that correct?
The error that I'm getting is:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 10, host002, executor 1): java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.x$330 of type org.apache.spark.api.java.function.PairFunction in instance of org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1
	at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133)
	at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2238)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2156)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2014)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1536)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2232)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2156)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2014)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1536)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2232)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2112)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2014)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1536)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2232)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2156)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2014)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1536)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:423)
	at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
	at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1058)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2123)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2014)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1536)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2232)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2156)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2014)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1536)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2232)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2156)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2014)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1536)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:423)
	at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
	at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:64)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Given this post:
https://stackoverflow.com/questions/28186607/java-lang-classcastexception-using-lambda-expressions-in-spark-job-on-remote-ser
It looks like I'm not including a necessary jar, but I'm at a loss as to which one. Any ideas?
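One way to narrow this down is to scan the candidate jars (whatever ends up in lib.zip or on spark.yarn.dist.jars) for the class named in the ClassCastException, to see which jar, if any, actually ships it. A minimal sketch in plain Python — this is a generic utility, not a JanusGraph tool, and the directory you point it at is up to you:

```python
import zipfile
from pathlib import Path

def find_class(jar_dir, class_name):
    """Return names of jars under jar_dir containing class_name.

    class_name uses dots, e.g. 'org.apache.spark.api.java.JavaPairRDD'.
    Inner/anonymous classes (Outer$1.class etc.) count as hits too.
    """
    entry = class_name.replace('.', '/') + '.class'
    stem = entry[:-len('.class')]
    hits = []
    for jar in sorted(Path(jar_dir).glob('*.jar')):
        with zipfile.ZipFile(jar) as zf:
            names = zf.namelist()
        if any(n == entry or n.startswith(stem + '$') for n in names):
            hits.append(jar.name)
    return hits
```

Running it over both the driver's and the executors' jar directories and diffing the results usually shows whether the same class is served by two different jars (the classic cause of this SerializedLambda error) or missing on one side.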
For reference, here is part of the config:
#
# Hadoop Graph Configuration
#
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphInputFormat=org.janusgraph.hadoop.formats.hbase.HBaseInputFormat
gremlin.hadoop.graphOutputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output
#
# JanusGraph HBase InputFormat configuration
#
janusgraphmr.ioformat.conf.storage.backend=hbase
#janusgraphmr.ioformat.conf.storage.hostname=fqdn1,fqdn2,fqdn3
janusgraphmr.ioformat.conf.storage.hostname=10.22.5.63:2181,10.22.5.64:2181,10.22.5.65:2181
janusgraphmr.ioformat.conf.storage.hbase.table=TEST0.2.0
janusgraphmr.ioformat.conf.storage.hbase.region-count=5
janusgraphmr.ioformat.conf.storage.hbase.regions-per-server=18
janusgraphmr.ioformat.conf.storage.hbase.short-cf-names=false
#zookeeper.znode.parent=/hbase-unsecure
# Security configs are needed in case of a secure cluster
#zookeeper.znode.parent=/hbase-secure
#hbase.rpc.protection=privacy
#hbase.security.authentication=kerberos
#
# SparkGraphComputer with Yarn Configuration
#
spark.master=yarn-client
spark.executor.memory=512m
spark.serializer=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer
spark.yarn.dist.archives=/home/graph/janusgraph-0.2.0-SNAPSHOT-hadoop2.JOE/lib.zip
spark.yarn.dist.files=/opt/cloudera/parcels/CDH/jars/janusgraph-hbase-0.2.0-SNAPSHOT.jar
spark.yarn.dist.jars=/opt/cloudera/parcels/CDH/jars/janusgraph-hbase-0.2.0-SNAPSHOT.jar,/opt/cloudera/parcels/CDH/jars/spark-core_2.10-1.6.0-cdh5.10.0.jar
#spark.yarn.appMasterEnv.CLASSPATH=/etc/hadoop/conf:./lib.zip/*:
spark.yarn.appMasterEnv.CLASSPATH=/etc/hadoop/conf:/etc/hbase/conf:./lib.zip/*:/opt/cloudera/parcels/CDH/jars/spark-core_2.10-1.6.0-cdh5.10.0.jar
#spark.executor.extraClassPath=/etc/hadoop/conf:/etc/hbase/conf:/home/graph/janusgraph-0.2.0-SNAPSHOT-hadoop2/janusgraph-hbase-0.2.0-SNAPSHOT.jar:./lib.zip/*
spark.driver.extraLibraryPath=/opt/cloudera/parcels/CDH/lib/hadoop/native:/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/lib/native/Linux-amd64-64
spark.executor.extraLibraryPath=/opt/cloudera/parcels/CDH/lib/hadoop/native:/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/lib/native/Linux-amd64-64
Thank you!
-Joe
--
You received this message because you are subscribed to the Google Groups "JanusGraph users list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgraph-use...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Could this be a networking issue? Maybe a firewall is enabled, or selinux is preventing a connection?
I've been able to get this to work, but running a simple count - g.V().count() on anything but a very small graph takes a very very long time (hours). Are there any cache settings, or other resources that could be modified to better the performance?
The YARN container logs are filled with debug lines about 'Created dirty vertex map with initial size 32', 'Created vertex cache with max size 20000', and 'Generated HBase Filter ColumnRange Filter'. Can any of these things be adjusted in the properties file? Thank you!
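For what it's worth, JanusGraph does expose cache knobs in the graph properties file; whether they help an OLAP full scan is a separate question. A sketch — the values below are illustrative, not recommendations:

```properties
# Transaction-level vertex cache (the "vertex cache with max size 20000" in the logs)
cache.tx-cache-size=50000
# Database-level cache shared across transactions
cache.db-cache=true
cache.db-cache-size=0.3
cache.db-cache-time=180000
```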
-Joe
--
Hi Marc - I've been able to get it to run longer, but am now getting a RowTooBigException from HBase. How does JanusGraph store data in HBase? The current max size of a row is 1 GByte, which makes me think this error is covering something else up.
What I'm seeing so far in testing with a 5-server cluster, each machine with 128G of RAM:

One HBase table is 1.5G in size, split across 7 regions, and has 20,001,105 rows. Doing a g.V().count() takes 2 hours and results in 3,842,755 vertices.

Another HBase table is 5.7G in size, split across 10 regions, has 57,620,276 rows, and took 6.5 hours to run the count, resulting in 10,859,491 nodes. When running, it looks like it hits one server very hard even though the YARN tasks are distributed across the cluster. One HBase node gets hammered.

The RowTooBigException is below. Anything to try? Thank you for any help!
org.janusgraph.core.JanusGraphException: Could not process individual retrieval call
	at org.janusgraph.graphdb.query.QueryUtil.processIntersectingRetrievals(QueryUtil.java:257)
	at org.janusgraph.graphdb.transaction.StandardJanusGraphTx$6.execute(StandardJanusGraphTx.java:1269)
	at org.janusgraph.graphdb.transaction.StandardJanusGraphTx$6.execute(StandardJanusGraphTx.java:1137)
	at org.janusgraph.graphdb.query.QueryProcessor$LimitAdjustingIterator.getNewIterator(QueryProcessor.java:209)
	at org.janusgraph.graphdb.query.LimitAdjustingIterator.hasNext(LimitAdjustingIterator.java:75)
	at org.janusgraph.graphdb.query.ResultSetIterator.nextInternal(ResultSetIterator.java:54)
	at org.janusgraph.graphdb.query.ResultSetIterator.next(ResultSetIterator.java:67)
	at org.janusgraph.graphdb.query.ResultSetIterator.next(ResultSetIterator.java:28)
	at com.google.common.collect.Iterators$7.computeNext(Iterators.java:651)
	at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
	at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
	at org.janusgraph.hadoop.formats.util.input.current.JanusGraphHadoopSetupImpl.getTypeInspector(JanusGraphHadoopSetupImpl.java:60)
	at org.janusgraph.hadoop.formats.util.JanusGraphVertexDeserializer.<init>(JanusGraphVertexDeserializer.java:55)
	at org.janusgraph.hadoop.formats.util.GiraphInputFormat.lambda$static$0(GiraphInputFormat.java:49)
	at org.janusgraph.hadoop.formats.util.GiraphInputFormat$RefCountedCloseable.acquire(GiraphInputFormat.java:100)
	at org.janusgraph.hadoop.formats.util.GiraphRecordReader.<init>(GiraphRecordReader.java:47)
	at org.janusgraph.hadoop.formats.util.GiraphInputFormat.createRecordReader(GiraphInputFormat.java:67)
	at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:166)
	at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:133)
	at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:65)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.janusgraph.core.JanusGraphException: Could not call index
	at org.janusgraph.graphdb.transaction.StandardJanusGraphTx$6$6.call(StandardJanusGraphTx.java:1262)
	at org.janusgraph.graphdb.query.QueryUtil.processIntersectingRetrievals(QueryUtil.java:255)
	... 34 more
Caused by: org.janusgraph.core.JanusGraphException: Could not execute operation due to backend exception
	at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:57)
	at org.janusgraph.diskstorage.BackendTransaction.executeRead(BackendTransaction.java:444)
	at org.janusgraph.diskstorage.BackendTransaction.indexQuery(BackendTransaction.java:395)
	at org.janusgraph.graphdb.query.graph.MultiKeySliceQuery.execute(MultiKeySliceQuery.java:51)
	at org.janusgraph.graphdb.database.IndexSerializer.query(IndexSerializer.java:529)
	at org.janusgraph.graphdb.transaction.StandardJanusGraphTx$6$6$1.lambda$call$5(StandardJanusGraphTx.java:1258)
	at org.janusgraph.graphdb.query.profile.QueryProfiler.profile(QueryProfiler.java:97)
	at org.janusgraph.graphdb.query.profile.QueryProfiler.profile(QueryProfiler.java:89)
	at org.janusgraph.graphdb.query.profile.QueryProfiler.profile(QueryProfiler.java:81)
	at org.janusgraph.graphdb.transaction.StandardJanusGraphTx$6$6$1.call(StandardJanusGraphTx.java:1258)
	at org.janusgraph.graphdb.transaction.StandardJanusGraphTx$6$6$1.call(StandardJanusGraphTx.java:1255)
	at com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4742)
	at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3527)
	at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2319)
	at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2282)
	at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2197)
	at com.google.common.cache.LocalCache.get(LocalCache.java:3937)
	at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4739)
	at org.janusgraph.graphdb.transaction.StandardJanusGraphTx$6$6.call(StandardJanusGraphTx.java:1255)
	... 35 more
Caused by: org.janusgraph.diskstorage.TemporaryBackendException: Could not successfully complete backend operation due to repeated temporary exceptions after PT10S
	at org.janusgraph.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:101)
	at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:55)
	... 53 more
Caused by: org.janusgraph.diskstorage.TemporaryBackendException: Temporary failure in storage backend
	at org.janusgraph.diskstorage.hbase.HBaseKeyColumnValueStore.getHelper(HBaseKeyColumnValueStore.java:202)
	at org.janusgraph.diskstorage.hbase.HBaseKeyColumnValueStore.getSlice(HBaseKeyColumnValueStore.java:90)
	at org.janusgraph.diskstorage.keycolumnvalue.KCVSProxy.getSlice(KCVSProxy.java:77)
	at org.janusgraph.diskstorage.keycolumnvalue.KCVSProxy.getSlice(KCVSProxy.java:77)
	at org.janusgraph.diskstorage.BackendTransaction$5.call(BackendTransaction.java:398)
	at org.janusgraph.diskstorage.BackendTransaction$5.call(BackendTransaction.java:395)
	at org.janusgraph.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:69)
	... 54 more
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=35, exceptions:
Sat Aug 05 07:22:03 EDT 2017, RpcRetryingCaller{globalStartTime=1501932111280, pause=100, retries=35}, org.apache.hadoop.hbase.regionserver.RowTooBigException: org.apache.hadoop.hbase.regionserver.RowTooBigException: Max row size allowed: 1073741824, but the row is bigger than that.
	at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:564)
	at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
	at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5697)
	at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:5856)
	at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5634)
	at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:5611)
	at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:5597)
	at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6792)
	at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6770)
	at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2023)
	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33644)
	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2170)
	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:109)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:185)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:165)
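The 1073741824 in that last cause is HBase's default hbase.table.max.rowsize (1 GB), which region servers enforce on a Get/Scan of a single row. If a JanusGraph row genuinely exceeds it, the limit can be raised in hbase-site.xml on the region servers; the value below is illustrative, not a recommendation:

```xml
<!-- hbase-site.xml on the region servers: raise the per-row read limit -->
<property>
  <name>hbase.table.max.rowsize</name>
  <value>4294967296</value> <!-- 4 GB; illustrative -->
</property>
```

That said, a row anywhere near 1 GB usually points at a supernode or a very large index entry, so raising the limit may just be hiding the real problem, as suspected above.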
--
Hi Marc - thank you very much for your reply. I like your idea about moving regions manually and will try that. As to OLAP vs OLTP (I assume Spark vs none), yes, I have those times.

For a 1.5G table in HBase, the count using just the gremlin shell, without the SparkGraphComputer:
graph = JanusGraphFactory.open('conf/graph.properties')
g=graph.traversal()
g.V().count()
takes just under 1 minute. Using spark it takes about 2 hours. So something isn't right. They both return 3,842,755 vertices. When I run it with Spark, it hits one of the region servers hard - doing over 30k requests per second for those 2 hours.
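For the "one region server gets hammered" symptom, region placement can be inspected and adjusted from the HBase shell; the commands below are standard, but the table, region, and server names are placeholders to fill in:

```
# in the HBase shell; names are placeholders
split 'Large2'                                     # request a split of the table's regions
move 'ENCODED_REGION_NAME', 'host,port,startcode'  # move one region to a specific server
balancer                                           # run the load balancer once
```

Spreading the regions across servers before the Spark job starts at least rules out placement as the cause of the hotspot.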
-Joe
But when I try to execute the following, where the cluster is MapR with Spark on YARN:
plugin activated: tinkerpop.tinkergraph
gremlin> graph = GraphFactory.open('conf/hadoop-graph/hadoop-load.properties')
==>hadoopgraph[gryoinputformat->nulloutputformat]
gremlin> g = graph.traversal(computer(SparkGraphComputer))
==>graphtraversalsource[hadoopgraph[gryoinputformat->nulloutputformat], sparkgraphcomputer]
gremlin> g.V().count()
hadoop-load.properties (I tried all the different combinations, as commented out below; each time it's the same error):
#
# SparkGraphComputer Configuration
#
spark.master=yarn-client
spark.yarn.queue=cmp
#spark.driver.allowMultipleContexts=true
#spark.executor.memory=4g
#spark.ui.port=20000
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.yarn.appMasterEnv.CLASSPATH=$CLASSPATH:/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/hdfs:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/hdfs/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/hdfs/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/yarn/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/yarn/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/mapreduce/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar:/opt/mapr/lib/kvstore*.jar:/opt/mapr/lib/libprotodefs*.jar:/opt/mapr/lib/baseutils*.jar:/opt/mapr/lib/maprutil*.jar:/opt/mapr/lib/json-20080701.jar:/opt/mapr/lib/flexjson-2.1.jar
#spark.executor.instances=10
#spark.executor.cores=2
#spark.executor.CoarseGrainedExecutorBackend.cores=2
#spark.executor.CoarseGrainedExecutorBackend.driver=FIXME
#spark.executor.CoarseGrainedExecutorBackend.stopping=false
#spark.streaming.stopGracefullyOnShutdown=true
#spark.yarn.driver.memoryOverhead=4g
#spark.yarn.executor.memoryOverhead=1024
#spark.yarn.am.extraJavaOptions=-Dhdp.version=2.3.0.0-2557
--------------------------------------------------------------------------------------------------------------
yarn log
------------------------------------------------------------------------------------------------------------
When the last command is executed, the driver abruptly shuts down in the YARN container, and shuts down the Spark context too, with the following error from the YARN logs:
LogType:stderr
Log Upload Time:Mon Jul 31 14:08:42 -0700 2017
LogLength:2441
Log Contents:
17/07/31 14:08:05 INFO executor.CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
17/07/31 14:08:05 INFO spark.SecurityManager: Changing view acls to: cmphs
17/07/31 14:08:05 INFO spark.SecurityManager: Changing modify acls to: cmphs
17/07/31 14:08:05 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(cmphs); users with modify permissions: Set(cmphs)
17/07/31 14:08:06 INFO spark.SecurityManager: Changing view acls to: cmphs
17/07/31 14:08:06 INFO spark.SecurityManager: Changing modify acls to: cmphs
17/07/31 14:08:06 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(cmphs); users with modify permissions: Set(cmphs)
17/07/31 14:08:06 INFO slf4j.Slf4jLogger: Slf4jLogger started
17/07/31 14:08:06 INFO Remoting: Starting remoting
17/07/31 14:08:06 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkExecuto...@abcd.com:36376]
17/07/31 14:08:06 INFO util.Utils: Successfully started service 'sparkExecutorActorSystem' on port 36376.
17/07/31 14:08:06 INFO storage.DiskBlockManager: Created local directory at /tmp/hadoop-mapr/nm-local-dir/usercache/cmphs/appcache/application_1501284102300_47651/blockmgr-244e0062-016e-4402-85c9-69f2ab9ef9d2
17/07/31 14:08:06 INFO storage.MemoryStore: MemoryStore started with capacity 2.7 GB
17/07/31 14:08:06 INFO executor.CoarseGrainedExecutorBackend: Connecting to driver: spark://CoarseGrain...@10.16.127.60:43768
17/07/31 14:08:06 INFO executor.CoarseGrainedExecutorBackend: Successfully registered with driver
17/07/31 14:08:06 INFO executor.Executor: Starting executor ID 7 on host abcd.com
17/07/31 14:08:06 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 44208.
17/07/31 14:08:06 INFO netty.NettyBlockTransferService: Server created on 44208
17/07/31 14:08:06 INFO storage.BlockManagerMaster: Trying to register BlockManager
17/07/31 14:08:06 INFO storage.BlockManagerMaster: Registered BlockManager
17/07/31 14:08:41 INFO executor.CoarseGrainedExecutorBackend: Driver commanded a shutdown
17/07/31 14:08:41 INFO storage.MemoryStore: MemoryStore cleared
17/07/31 14:08:41 INFO storage.BlockManager: BlockManager stopped
17/07/31 14:08:41 INFO util.ShutdownHookManager: Shutdown hook called
End of LogType:stderr
LogType:stdout
Log Upload Time:Mon Jul 31 14:08:42 -0700 2017
LogLength:0
Log Contents:
End of LogType:stdout
I am stuck with this error; any help would be appreciated.
Could you let us know a little more about your configuration? What is your storage backend for JanusGraph (HBase/Cassandra)? I actually do not see an error in your log, but at the very least you'll need to define spark.executor.extraClassPath to point to the various jars required. Are there other logs you can look at, such as the container logs for YARN?
I assume you've seen Marc's blog post:
http://yaaics.blogspot.com/2017/07/configuring-janusgraph-for-spark-yarn.html
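As a concrete starting point for that executor classpath, mirroring the appMasterEnv classpath used elsewhere in this thread — the parcel paths and jar version here are assumptions to adapt to your installation:

```properties
# illustrative only; adjust paths and versions to your layout
spark.executor.extraClassPath=/etc/hadoop/conf:/etc/hbase/conf:./lib.zip/*:/opt/cloudera/parcels/CDH/jars/janusgraph-hbase-0.2.0-SNAPSHOT.jar
```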
-Joe
--
Hi Marc - I did try splitting regions. When I run the SparkGraphComputer job, it seems to hit one region server hard, then moves on to the next; it appears to run serially. All that said, I think the big issue is that running the count with Spark takes 2 hours and running without it takes 2 minutes. Perhaps SparkGraphComputer is not the tool to be used for handling large graphs? Although, I'm not sure what its purpose is then?
For reference, here is my Hadoop configuration file using Cloudera CDH 5.10.0 and JanusGraph 0.1.0:
Config:
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
#
# Hadoop Graph Configuration
#
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphInputFormat=org.janusgraph.hadoop.formats.hbase.HBaseInputFormat
gremlin.hadoop.graphOutputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.memoryOutputFormat=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
gremlin.hadoop.memoryOutputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.deriveMemory=false
gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output
log4j.rootLogger=WARNING, STDOUT
log4j.logger.deng=WARNING
log4j.appender.STDOUT=org.apache.log4j.ConsoleAppender
org.slf4j.simpleLogger.defaultLogLevel=warn
#
# JanusGraph HBase InputFormat configuration
#
janusgraphmr.ioformat.conf.storage.backend=hbase
janusgraphmr.ioformat.conf.storage.hostname=host003:2181,host004:2181,host005:2181
janusgraphmr.ioformat.conf.storage.hbase.table=Large2
janusgraphmr.ioformat.conf.storage.hbase.region-count=5
janusgraphmr.ioformat.conf.storage.hbase.regions-per-server=5
janusgraphmr.ioformat.conf.storage.hbase.short-cf-names=false
# Security configs are needed in case of a secure cluster
zookeeper.znode.parent=/hbase
#
# SparkGraphComputer with Yarn Configuration
#
#spark.master=yarn-client
spark.executor.extraJavaOptions=-XX:ReservedCodeCacheSize=100M -XX:MaxMetaspaceSize=256m -XX:CompressedClassSpaceSize=256m -Dlogback.configurationFile=logback.xml
spark.driver.extraJavaOptons=-XX:ReservedCodeCacheSize=100M -XX:MaxMetaspaceSize=256m -XX:CompressedClassSpaceSize=256m
spark.master=yarn-cluster
spark.executor.memory=10240m
spark.serializer=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer
spark.yarn.dist.archives=/home/graph/janusgraph-0.1.0-SNAPSHOT-hadoop2/lib.zip
spark.yarn.dist.files=/opt/cloudera/parcels/CDH/jars/janusgraph-hbase-0.1.0-SNAPSHOT.jar,/home/graph/janusgraph-0.1.0-SNAPSHOT-hadoop2/conf/logback.xml
spark.yarn.dist.jars=/opt/cloudera/parcels/CDH/jars/janusgraph-hbase-0.1.0-SNAPSHOT.jar
spark.yarn.appMasterEnv.CLASSPATH=/etc/haddop/conf:/etc/hbase/conf:./lib.zip/*
spark.executor.extraLibraryPath=/opt/cloudera/parcels/CDH/lib/hadoop/native:/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/lib/native/Linux-amd64-64
spark.akka.frameSize=1024
spark.kyroserializer.buffer.max=1600m
spark.network.timeout=90000
spark.executor.heartbeatInterval=100000
spark.cores.max=64
#
# Relevant configs from spark-defaults.conf
#
spark.authenticate=false
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.executorIdleTimeout=60
spark.dynamicAllocation.minExecutors=0
spark.dynamicAllocation.schedulerBacklogTimeout=1
spark.eventLog.enabled=true
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.shuffle.service.enabled=true
spark.shuffle.service.port=7337
spark.ui.killEnabled=true
spark.executor.extraClassPath=/opt/cloudera/parcels/CDH/jars/janusgraph-hbase-0.1.0-SNAPSHOT.jar:./lib.zip/*:\
/opt/cloudera/parcels/CDH/lib/hbase/bin/../conf:\
/opt/cloudera/parcels/CDH/lib/hbase/bin/..:\
/opt/cloudera/parcels/CDH/lib/spark/assembly/lib/spark-assembly.jar:\
/usr/lib/jvm/java-openjdk/jre/lib/rt.jar:\
/opt/cloudera/parcels/CDH/lib/hbase/bin/../lib/*:\
/etc/hadoop/conf:\
/etc/hbase/conf:\
/opt/cloudera/parcels/CDH/lib/hadoop/libexec/../../hadoop/lib/*:\
/opt/cloudera/parcels/CDH/lib/hadoop/libexec/../../hadoop/.//*:\
/opt/cloudera/parcels/CDH/lib/hadoop/libexec/../../hadoop-yarn/lib/*:\
/opt/cloudera/parcels/CDH/lib/hadoop/libexec/../../hadoop-yarn/.//*
spark.eventLog.dir=hdfs://host001:8020/user/spark/applicationHistory
spark.yarn.historyServer.address=http://host001:18088
spark.yarn.jar=local:/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark/lib/spark-assembly.jar
spark.driver.extraLibraryPath=/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/hadoop/lib/native
spark.executor.extraLibraryPath=/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/hadoop/lib/native
spark.yarn.am.extraLibraryPath=/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/hadoop/lib/native
spark.yarn.config.gatewayPath=/opt/cloudera/parcels
spark.yarn.config.replacementPath={{HADOOP_COMMON_HOME}}/../../..
spark.master=yarn-client
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
You probably experience all kinds of mismatches in the netty libraries, which slow down or even kill all comms between the yarn containers. The philosophy of the recipes really is to add only the minimum number of conf folders and jars to the TinkerPop/JanusGraph distribution and see from there whether any libraries are missing.
On my side, it has become apparent that I should at least add to the recipes:
So, still some work to do!
Cheers, Marc
Marc - thank you. I've updated the classpath and removed nearly all of the CDH jars; I had to keep chimera and some of the HBase libs in there. Apart from those and all the jars in lib.zip, it is working as it did before. The reason I turned DEBUG off was that it was producing 100+ GB of logs, nearly all of which are entries like:
18:04:29 DEBUG org.janusgraph.diskstorage.hbase.HBaseKeyColumnValueStore - Generated HBase Filter ColumnRangeFilter [\x10\xC0, \x10\xC1)
18:04:29 DEBUG org.janusgraph.graphdb.transaction.StandardJanusGraphTx - Guava vertex cache size: requested=20000 effective=20000 (min=100)
18:04:29 DEBUG org.janusgraph.graphdb.transaction.vertexcache.GuavaVertexCache - Created dirty vertex map with initial size 32
18:04:29 DEBUG org.janusgraph.graphdb.transaction.vertexcache.GuavaVertexCache - Created vertex cache with max size 20000
18:04:29 DEBUG org.janusgraph.diskstorage.hbase.HBaseKeyColumnValueStore - Generated HBase Filter ColumnRangeFilter [\x10\xC2, \x10\xC3)
18:04:29 DEBUG org.janusgraph.graphdb.transaction.StandardJanusGraphTx - Guava vertex cache size: requested=20000 effective=20000 (min=100)
18:04:29 DEBUG org.janusgraph.graphdb.transaction.vertexcache.GuavaVertexCache - Created dirty vertex map with initial size 32
18:04:29 DEBUG org.janusgraph.graphdb.transaction.vertexcache.GuavaVertexCache - Created vertex cache with max size 20000
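For what it's worth, the chatter above comes from only a handful of JanusGraph loggers, so it should be possible to keep DEBUG everywhere else while silencing just those classes. A sketch for the log4j 1.x setup used in these configs (logger names are taken from the log lines above; treating this as an untested suggestion):

```properties
# raise only the noisy JanusGraph loggers; other loggers keep their level
log4j.logger.org.janusgraph.diskstorage.hbase.HBaseKeyColumnValueStore=WARN
log4j.logger.org.janusgraph.graphdb.transaction=WARN
```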
Name | Value
gremlin.graph | org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.deriveMemory | false
gremlin.hadoop.graphReader | org.janusgraph.hadoop.formats.hbase.HBaseInputFormat
gremlin.hadoop.graphWriter | org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.graphWriter.hasEdges | false
gremlin.hadoop.inputLocation | none
gremlin.hadoop.jarsInDistributedCache | true
gremlin.hadoop.memoryOutputFormat | org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.outputLocation | output
janusgraphmr.ioformat.conf.storage.backend | hbase
janusgraphmr.ioformat.conf.storage.hbase.region-count | 5
janusgraphmr.ioformat.conf.storage.hbase.regions-per-server | 5
janusgraphmr.ioformat.conf.storage.hbase.short-cf-names | false
janusgraphmr.ioformat.conf.storage.hbase.table | TEST0.2.0
janusgraphmr.ioformat.conf.storage.hostname |
log4j.appender.STDOUT | org.apache.log4j.ConsoleAppender
log4j.logger.deng | WARNING
log4j.rootLogger | STDOUT
org.slf4j.simpleLogger.defaultLogLevel | warn
spark.akka.frameSize | 1024
application_1502118729859_0041 |
Apache TinkerPop's Spark-Gremlin |
spark.authenticate | false
spark.cores.max | 64
spark.driver.appUIAddress |
spark.driver.extraJavaOptons | -XX:ReservedCodeCacheSize=100M -XX:MaxMetaspaceSize=256m -XX:CompressedClassSpaceSize=256m
spark.driver.extraLibraryPath | /opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/hadoop/lib/native
spark.driver.host | 10.22.5.61
spark.driver.port | 38529
spark.dynamicAllocation.enabled | true
spark.dynamicAllocation.executorIdleTimeout | 60
spark.dynamicAllocation.minExecutors | 0
spark.dynamicAllocation.schedulerBacklogTimeout | 1
spark.eventLog.dir | hdfs://host001:8020/user/spark/applicationHistory
spark.eventLog.enabled | true
spark.executor.extraClassPath | /opt/cloudera/parcels/CDH/jars/janusgraph-hbase-0.2.0-SNAPSHOT.jar:./lib.zip/*:/opt/cloudera/parcels/CDH/lib/hbase/bin/../lib/*:/etc/hbase/conf:
spark.executor.extraJavaOptions | -XX:ReservedCodeCacheSize=100M -XX:MaxMetaspaceSize=256m -XX:CompressedClassSpaceSize=256m -Dlogback.configurationFile=logback.xml
spark.executor.extraLibraryPath | /opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/hadoop/lib/native
spark.executor.heartbeatInterval | 100000
driver |
spark.executor.memory | 10240m
spark.externalBlockStore.folderName | spark-27dac3f3-dfbc-4f32-b52d-ececdbcae0db
spark.kyroserializer.buffer.max | 1600m
spark.master | yarn-client
spark.network.timeout | 90000
spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_HOSTS | host005
spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_URI_BASES |
spark.scheduler.mode | FIFO
spark.serializer | org.apache.spark.serializer.KryoSerializer
spark.shuffle.service.enabled | true
spark.shuffle.service.port | 7337
spark.ui.filters | org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
spark.ui.killEnabled | true
spark.yarn.am.extraLibraryPath | /opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/hadoop/lib/native
spark.yarn.appMasterEnv.CLASSPATH | /etc/haddop/conf:/etc/hbase/conf:./lib.zip/*
spark.yarn.config.gatewayPath | /opt/cloudera/parcels
spark.yarn.config.replacementPath | {{HADOOP_COMMON_HOME}}/../../..
spark.yarn.dist.archives | /home/graph/janusgraph-0.2.0-SNAPSHOT-hadoop2.JOE/lib.zip
spark.yarn.dist.files | /home/graph/janusgraph-0.2.0-SNAPSHOT-hadoop2.JOE/conf/logback.xml
spark.yarn.dist.jars | /opt/cloudera/parcels/CDH/jars/janusgraph-hbase-0.2.0-SNAPSHOT.jar
spark.yarn.historyServer.address |
zookeeper.znode.parent | /hbase
Thank you Marc.
I did not set spark.executor.instances, but I do have spark.cores.max set to 64, and YARN is configured to allow as much RAM and as many cores as possible for our 5-server cluster. When I run a job on a table that has 61 regions, I see 43 tasks started and running on all 5 nodes in the Spark UI (and in top on each of the servers). If I lower the amount of RAM (heap) each task has (currently set to 10G), they fail with OutOfMemory exceptions. It still hits one HBase node very hard and cycles through them. While that may be a reason for a performance issue, it doesn't explain the massive number of calls that HBase receives for a count job, nor why using SparkGraphComputer takes so much more time.
Running with your command below appears not to alter the behavior. I did run a job last night with DEBUG turned on, but it produced too much logging, filling up the log directory on 3 of the 5 nodes before stopping.
Thanks again Marc!
-Joe
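One detail worth flagging here: spark.cores.max is a standalone/Mesos setting that the YARN backends ignore, and with spark.dynamicAllocation.enabled=true but no spark.executor.instances, YARN and the dynamic-allocation heuristics decide how many executors exist. A hedged sketch of properties that size executors explicitly on YARN (the values are placeholders, not recommendations for this cluster):

```properties
# explicit executor sizing on YARN (typically takes precedence over dynamic allocation)
spark.executor.instances=5
spark.executor.cores=4
spark.executor.memory=10g
# or, if dynamic allocation stays on, cap it instead
spark.dynamicAllocation.maxExecutors=10
```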
// build a test graph: m*n inner iterations, each adding two vertices
// joined by two parallel edges; commit once per outer loop
graph=JanusGraphFactory.open("conf/janusgraph-hbase.properties")
g = graph.traversal()
m = 1200L
n = 10000L
(0L..<m).each{
(0L..<n).each{
v1 = g.addV().id().next()
v2 = g.addV().id().next()
g.V(v1).addE('link1').to(g.V(v2)).next()
g.V(v1).addE('link2').to(g.V(v2)).next()
}
g.tx().commit()
}
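For scale, the script above runs m*n inner iterations, each adding two vertices and two edges, so the expected totals are easy to sanity-check against a later g.V().count(). A quick back-of-the-envelope, assuming every iteration commits successfully:

```python
m = 1200
n = 10_000
iterations = m * n            # 12,000,000 inner-loop passes
vertices = 2 * iterations     # two addV() calls per pass
edges = 2 * iterations        # one 'link1' and one 'link2' edge per pass
commits = m                   # one commit per outer loop
print(vertices, edges, commits)   # 24000000 24000000 1200
```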
org.apache.hadoop.hbase.mapreduce.RowCounter
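The class named above is HBase's bundled MapReduce row counter, useful as a baseline for raw scan speed independent of JanusGraph and Spark. A sketch of the usual invocation (the table name is taken from the config above; adjust to your cluster):

```shell
# MapReduce row count straight against HBase, bypassing JanusGraph entirely
hbase org.apache.hadoop.hbase.mapreduce.RowCounter 'Large2'
```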
#
# SparkGraphComputer Configuration
#
spark.master=yarn-client
spark.yarn.queue=cmp
spark.driver.allowMultipleContexts=true
spark.executor.memory=20g
#spark.ui.port=20000
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.yarn.appMasterEnv.CLASSPATH=$CLASSPATH:/opt/mapr/spark/spark/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/hdfs:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/hdfs/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/hdfs/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/yarn/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/yarn/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/mapreduce/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar:/opt/mapr/lib/kvstore*.jar:/opt/mapr/lib/libprotodefs*.jar:/opt/mapr/lib/baseutils*.jar:/opt/mapr/lib/maprutil*.jar:/opt/mapr/lib/json-20080701.jar:/opt/mapr/lib/flexjson-2.1.jar
#spark.yarn.appMasterEnv.CLASSPATH=$CLASSPATH:hadoop classpath
spark.yarn.appMasterEnv.HADOOP_CONF_DIR=/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop
spark.yarn.appMasterEnv.YARN_CONF_DIR=/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop
spark.yarn.appMasterEnv.SPARK_CONF_DIR=/opt/mapr/spark/spark/conf
spark.executorEnv.CLASSPATH=$CLASSPATH:/opt/mapr/spark/spark/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/hdfs:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/hdfs/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/hdfs/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/yarn/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/yarn/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/mapreduce/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar:/opt/mapr/lib/kvstore*.jar:/opt/mapr/lib/libprotodefs*.jar:/opt/mapr/lib/baseutils*.jar:/opt/mapr/lib/maprutil*.jar:/opt/mapr/lib/json-20080701.jar:/opt/mapr/lib/flexjson-2.1.jar
spark.executorEnv.HADOOP_CONF_DIR=/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop
spark.executorEnv.SPARK_CONF_DIR=/opt/mapr/spark/spark/conf
spark.executor.instances=10
#spark.executor.cores=2
#spark.executor.CoarseGrainedExecutorBackend.cores=2
#spark.executor.CoarseGrainedExecutorBackend.driver=FIXME
#spark.executor.CoarseGrainedExecutorBackend.stopping=false
#spark.streaming.stopGracefullyOnShutdown=true
spark.yarn.driver.memoryOverhead=4g
spark.yarn.executor.memoryOverhead=1024
Marc - thank you for this. I'm going to try getting the latest version of JanusGraph, and compiling it with our specific version of Cloudera CDH, then run some tests. Will report back.
-Joe
Hi All - I rebuilt JanusGraph from git with the CDH 5.10.0 libraries (just modified the poms) and, using that library, created a new graph with 159,103,508 vertices and 278,901,629 edges. I then manually moved regions around in HBase and did splits across our 5-server cluster into 88 regions; the original size was 22 regions. The test (g.V().count()) took 1.2 hours to run with Spark, and a similar amount of time for the edge count. I don't have an exact number, but it looks like doing it without Spark took a similar time. Honestly, I don't know if this is good or bad!
I replaced the jar files in the lib directory with jars from CDH and then rebuilt the lib.zip file. My configuration follows:
#
# Hadoop Graph Configuration
#
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphInputFormat=org.janusgraph.hadoop.formats.hbase.HBaseInputFormat
gremlin.hadoop.graphOutputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.memoryOutputFormat=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
gremlin.hadoop.memoryOutputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.deriveMemory=false
gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=output
gremlin.hadoop.outputLocation=output
log4j.rootLogger=WARNING, STDOUT
log4j.logger.deng=WARNING
log4j.appender.STDOUT=org.apache.log4j.ConsoleAppender
org.slf4j.simpleLogger.defaultLogLevel=warn
#
# JanusGraph HBase InputFormat configuration
#
janusgraphmr.ioformat.conf.storage.backend=hbase
janusgraphmr.ioformat.conf.storage.hostname=10.22.5.63:2181,10.22.5.64:2181,10.22.5.65:2181
janusgraphmr.ioformat.conf.storage.hbase.table=FullSpark
janusgraphmr.ioformat.conf.storage.hbase.region-count=44
janusgraphmr.ioformat.conf.storage.hbase.regions-per-server=5
janusgraphmr.ioformat.conf.storage.hbase.short-cf-names=false
janusgraphmr.ioformat.conf.storage.cache.db-cache-size = 0.5
zookeeper.znode.parent=/hbase
#
# SparkGraphComputer with Yarn Configuration
#
spark.executor.extraJavaOptions=-XX:ReservedCodeCacheSize=100M -XX:MaxMetaspaceSize=256m -XX:CompressedClassSpaceSize=256m -Dlogback.configurationFile=logback.xml
spark.driver.extraJavaOptons=-XX:ReservedCodeCacheSize=100M -XX:MaxMetaspaceSize=256m -XX:CompressedClassSpaceSize=256m
spark.master=yarn-cluster
spark.executor.memory=10240m
spark.serializer=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer
spark.yarn.dist.archives=/home/graph/janusgraph-0.2.0-SNAPSHOT-hadoop2.JOE/lib.zip
spark.yarn.dist.files=/opt/cloudera/parcels/CDH/jars/janusgraph-hbase-0.2.0-SNAPSHOT.jar,/home/graph/janusgraph-0.2.0-SNAPSHOT-hadoop2.JOE/conf/logback.xml
spark.yarn.dist.jars=/opt/cloudera/parcels/CDH/jars/janusgraph-hbase-0.2.0-SNAPSHOT.jar
spark.yarn.appMasterEnv.CLASSPATH=/etc/haddop/conf:/etc/hbase/conf:./lib.zip/*
#spark.executor.extraClassPath=/etc/hadoop/conf:/etc/hbase/conf:/home/graph/janusgraph-0.2.0-SNAPSHOT-hadoop2/janusgraph-hbase-0.2.0-SNAPSHOT.jar:./lib.zip/*
spark.driver.extraLibraryPath=/opt/cloudera/parcels/CDH/lib/hadoop/native:/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/lib/native/Linux-amd64-64
spark.executor.extraLibraryPath=/opt/cloudera/parcels/CDH/lib/hadoop/native:/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/lib/native/Linux-amd64-64
spark.akka.frameSize=1024
spark.kyroserializer.buffer.max=1600m
spark.network.timeout=90000
spark.executor.heartbeatInterval=100000
spark.cores.max=5
#
# Relevant configs from spark-defaults.conf
#
spark.authenticate=false
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.executorIdleTimeout=60
spark.dynamicAllocation.minExecutors=0
spark.dynamicAllocation.schedulerBacklogTimeout=1
spark.eventLog.enabled=true
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.shuffle.service.enabled=true
spark.shuffle.service.port=7337
spark.ui.killEnabled=true
spark.executor.extraClassPath=/opt/cloudera/parcels/CDH/jars/janusgraph-hbase-0.2.0-SNAPSHOT.jar:./lib.zip/*:\
/opt/cloudera/parcels/CDH/lib/hbase/bin/../lib/*:\
/etc/hbase/conf:
spark.eventLog.dir=hdfs://host001:8020/user/spark/applicationHistory
spark.yarn.historyServer.address=http://host001:18088
#spark.yarn.jar=local:/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark/lib/spark-assembly.jar
spark.driver.extraLibraryPath=/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/hadoop/lib/native
spark.executor.extraLibraryPath=/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/hadoop/lib/native
spark.yarn.am.extraLibraryPath=/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/hadoop/lib/native
spark.yarn.config.gatewayPath=/opt/cloudera/parcels
spark.yarn.config.replacementPath={{HADOOP_COMMON_HOME}}/../../..
spark.master=yarn-client
Hi John
The funny thing is, the recipe does not use the HDP Spark installation at all! SparkGraphComputer creates a SparkContext and has Yarn start all the Spark machinery, so the cluster's Spark version does not matter at all, though Spark 2.x requires some other config properties (see the recent PRs on the TinkerPop GitHub).
The only interaction with the cluster Spark is for the Spark History server, but I did not notice any problems between Spark 1.6.1 and Spark 1.6.2. See your cluster spark-defaults.xml for the history configs.
Have fun!
Marc
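To make Marc's point concrete: nothing from the cluster's Spark install is invoked; opening the HadoopGraph properties file and asking for a SparkGraphComputer is all it takes for Yarn to launch the Spark machinery from the distributed lib.zip. A minimal Gremlin console session, assuming a properties file like the ones shown earlier (the file name here is illustrative):

```groovy
// Gremlin console: Spark is bootstrapped via Yarn from lib.zip, not from the cluster install
graph = GraphFactory.open('conf/hadoop-graph/read-hbase.properties')
g = graph.traversal().withComputer(SparkGraphComputer)
g.V().count()   // runs as a Spark-on-Yarn job against the HBaseInputFormat
```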