Help with TinkerPop 3.2.4 running on YARN

Nick Kaufman

May 18, 2017, 1:03:43 PM

to Gremlin-users

Hello,

I have a 3-node cluster set up through Ambari, with HDFS 2.7.3 and Spark 1.6.3. I'm trying to configure TinkerPop 3.2.4 on it so that I can run traversals using SparkGraphComputer. Ultimately I would also like to use either Titan or JanusGraph with an HBase backend, but I figured that could come later.

I can submit these queries successfully with spark.master=local[*], but when I submit with spark.master=yarn-client instead, I can't even initialize a SparkContext.

I have a few questions:

1. Is this configuration possible, given the listed versions? I haven't found an obvious place that indicates which versions of the various Apache products are compatible.

2. If this is possible, can anyone shed some light on what might be going wrong? I suspect that some JAR files aren't getting submitted properly, but at this point I'm honestly not sure.

Thanks,

-Nick

I've copied and pasted some context below. Most of the configuration was adapted from posts by people contributing in this group.

I'm launching the Gremlin Console with the following script:

export HADOOP_HOME=/usr/hdp/current/hadoop-client
#export HADOOP_CONF_DIR=/etc/hadoop/conf
export HADOOP_CONF_DIR=/usr/hdp/current/hadoop-client/conf
export YARN_HOME=/usr/hdp/current/hadoop-yarn-client
export YARN_CONF_DIR=$HADOOP_CONF_DIR
export SPARK_HOME=/usr/hdp/current/spark-client
#export SPARK_CONF_DIR=/etc/spark/conf
export SPARK_CONF_DIR=/usr/hdp/current/spark-client/conf

source "$HADOOP_CONF_DIR"/hadoop-env.sh
source "$YARN_CONF_DIR"/yarn-env.sh
source "$SPARK_HOME"/bin/load-spark-env.sh

export JAVA_HOME=/usr/lib/jvm/java-1.8.0
export JAVA_OPTIONS="$JAVA_OPTIONS -Djava.library.path=/usr/hdp/2.6.0.0-203/hadoop/lib/native -Dtinkerpop.ext=ext -Dlog4j.configuration=conf/log4j-console.properties -Dhdp.version=2.6.0.0-203"

GREMLINHOME=/home/nkaufman/apache-tinkerpop-gremlin-console-3.2.4
export HADOOP_GREMLIN_LIBS=$GREMLINHOME/ext/hadoop-gremlin/plugin:$GREMLINHOME/ext/hadoop-gremlin/lib:$GREMLINHOME/ext/spark-gremlin/plugin:$GREMLINHOME/ext/spark-gremlin/lib:$GREMLINHOME/lib
#export HADOOP_GREMLIN_LIBS=/etc/spark/lib/hadoop-gremlin-libs
#export CLASSPATH=/etc/hadoop/conf:/etc/spark/conf:/usr/hdp/current/spark-client/lib:$GREMLINHOME/lib/*.jar
export CLASSPATH=$HADOOP_CONF_DIR:$HADOOP_HOME/client/*.jar:$YARN_HOME:$YARN_CONF_DIR:$GREMLINHOME/lib/*.jar:$SPARK_HOME/lib/*.jar:$SPARK_CONF_DIR:$CLASSPATH:$HADOOP_GREMLIN_LIBS

cd $GREMLINHOME
exec bin/gremlin.sh
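One detail of the script above that may be worth checking: the JVM treats a classpath entry as a wildcard only when its last path element is a bare * (for example $SPARK_HOME/lib/*), which it expands to every .jar file in that directory. Entries written as $SPARK_HOME/lib/*.jar are not a form the JVM expands. The following is a minimal sketch of building the classpath in the supported form. The name jvm_classpath is a hypothetical helper, not something shipped with TinkerPop or Spark, and whether bin/gremlin.sh passes an inherited CLASSPATH through to the JVM is an assumption to verify against the script itself.

```shell
# Hypothetical helper: join directories into a JVM classpath string using the
# "dir/*" wildcard form, one entry per directory, separated by ":".
jvm_classpath() {
    _cp=""
    for _dir in "$@"; do
        # Strip one trailing slash so we never emit "dir//*".
        _dir=${_dir%/}
        # Quoting keeps "*" literal; the JVM, not the shell, expands it.
        _cp="${_cp:+$_cp:}$_dir/*"
    done
    printf '%s\n' "$_cp"
}

# Possible usage with the directories from the launch script (assumed paths):
#   export CLASSPATH="$HADOOP_CONF_DIR:$SPARK_CONF_DIR:$(jvm_classpath "$HADOOP_HOME/client" "$SPARK_HOME/lib" "$GREMLINHOME/lib")"
```

Configuration directories such as $HADOOP_CONF_DIR are listed as plain directories rather than wildcards, since they hold XML and properties files, not JARs.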

In the console, I'm opening the following hadoop-test.properties file:

spark.yarn.appMasterEnv.JAVA_HOME=/usr/jdk64/jdk1.8.0_77/jre
spark.yarn.appMasterEnv.HADOOP_CONF_DIR=/usr/hdp/current/hadoop-client/conf
spark.yarn.appMasterEnv.SPARK_CONF_DIR=/usr/hdp/current/spark-client/conf
spark.yarn.appMasterEnv.CLASSPATH=$CLASSPATH:/usr/hdp/current/hadoop-mapreduce-client:/usr/hdp/current/hadoop-mapreduce-client/lib/*.jar
spark.yarn.am.extraJavaOptions=-Dhdp.version=2.6.0.0-203

# spark executors (on work nodes) ...?
spark.executorEnv.JAVA_HOME=/usr/lib/jvm/java-1.8.0_77/jre
spark.executorEnv.HADOOP_CONF_DIR=/usr/hdp/current/hadoop-client/conf
spark.executorEnv.SPARK_CONF_DIR=/usr/hdp/current/spark-client/conf
spark.executor.extraJavaOptions=-Dhdp.version=2.6.0.0-203
spark.executor.memory=1g

# copied from spark-defaults.conf
spark.driver.extraLibraryPath=/usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64
spark.eventLog.dir=hdfs:///spark-history
spark.eventLog.enabled=true
spark.executor.extraLibraryPath=/usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64
spark.history.fs.logDirectory=hdfs:///spark-history
spark.history.kerberos.keytab=none
spark.history.kerberos.principal=none
spark.history.provider=org.apache.spark.deploy.history.FsHistoryProvider
spark.history.ui.port=18080
spark.yarn.containerLauncherMaxThreads=25
spark.yarn.driver.memoryOverhead=384
spark.yarn.executor.memoryOverhead=384
spark.yarn.historyServer.address=<IP_ADDRESS_HERE>:18080
spark.yarn.max.executor.failures=3
spark.yarn.preserve.staging.files=false
spark.yarn.queue=default
spark.yarn.scheduler.heartbeat.interval-ms=5000
spark.yarn.submit.file.replication=3

# integrate with the yarn spark history server
spark.yarn.services=org.apache.spark.deploy.yarn.history.YarnHistoryService
spark.history.provider=org.apache.spark.deploy.yarn.history.YarnHistoryProvider
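The properties file above assigns spark.history.provider twice, first to FsHistoryProvider and later to YarnHistoryProvider. How a repeated key is resolved depends on the code that loads the file, so it may be worth confirming that each key appears only once. This is probably unrelated to the YARN error, but it is cheap to rule out. A minimal sketch, assuming a POSIX shell with awk available; duplicate_keys is a hypothetical helper name:

```shell
# Hypothetical helper: print every key that is assigned more than once in a
# .properties file. Blank lines and comment lines (# or !) are skipped.
duplicate_keys() {
    awk -F= '
        /^[ \t]*$/    { next }                 # blank line
        /^[ \t]*[#!]/ { next }                 # comment line
        {
            key = $1
            gsub(/^[ \t]+|[ \t]+$/, "", key)   # trim surrounding whitespace
            if (++seen[key] == 2) print key    # report each duplicate once
        }
    ' "$1"
}

# Possible usage (path taken from the console session in this post):
#   duplicate_keys conf/hadoop/hadoop-test.properties
```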

Console:

         \,,,/
         (o o)
-----oOOo-(3)-oOOo-----

plugin activated: tinkerpop.server

plugin activated: tinkerpop.utilities

plugin activated: tinkerpop.credentials

INFO org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph - HADOOP_GREMLIN_LIBS is set to: /home/nkaufman/apache-tinkerpop-gremlin-console-3.2.4/ext/hadoop-gremlin/plugin:/home/nkaufman/apache-tinkerpop-gremlin-console-3.2.4/ext/hadoop-gremlin/lib:/home/nkaufman/apache-tinkerpop-gremlin-console-3.2.4/ext/spark-gremlin/plugin:/home/nkaufman/apache-tinkerpop-gremlin-console-3.2.4/ext/spark-gremlin/lib:/home/nkaufman/apache-tinkerpop-gremlin-console-3.2.4/lib

plugin activated: tinkerpop.hadoop

plugin activated: tinkerpop.spark

plugin activated: tinkerpop.tinkergraph

gremlin> graph = GraphFactory.open('conf/hadoop/hadoop-test.properties')

==>hadoopgraph[gryoinputformat->gryooutputformat]

gremlin> g = graph.traversal().withComputer(SparkGraphComputer)

==>graphtraversalsource[hadoopgraph[gryoinputformat->gryooutputformat], sparkgraphcomputer]

gremlin> g.V().count()

WARN org.apache.spark.SparkConf -

SPARK_CLASSPATH was detected (set to ':').

This is deprecated in Spark 1.0+.

Please instead use:

- ./spark-submit with --driver-class-path to augment the driver classpath

- spark.executor.extraClassPath to augment the executor classpath

WARN org.apache.spark.SparkConf - Setting 'spark.executor.extraClassPath' to ':' as a work-around.

WARN org.apache.spark.SparkConf - Setting 'spark.driver.extraClassPath' to ':' as a work-around.

ERROR org.apache.spark.SparkContext - Error initializing SparkContext.

org.apache.spark.SparkException: Unable to load YARN support

at org.apache.spark.deploy.SparkHadoopUtil$.liftedTree1$1(SparkHadoopUtil.scala:399)

at org.apache.spark.deploy.SparkHadoopUtil$.yarn$lzycompute(SparkHadoopUtil.scala:394)

at org.apache.spark.deploy.SparkHadoopUtil$.yarn(SparkHadoopUtil.scala:394)

at org.apache.spark.deploy.SparkHadoopUtil$.get(SparkHadoopUtil.scala:411)

at org.apache.spark.util.Utils$.getSparkOrYarnConfig(Utils.scala:2118)

at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:105)

at org.apache.spark.SparkEnv$.create(SparkEnv.scala:365)

at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:193)

at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:288)

at org.apache.spark.SparkContext.<init>(SparkContext.scala:457)

at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2281)

at org.apache.spark.SparkContext.getOrCreate(SparkContext.scala)

at org.apache.tinkerpop.gremlin.spark.structure.Spark.create(Spark.java:53)

at org.apache.tinkerpop.gremlin.spark.structure.io.SparkContextStorage.open(SparkContextStorage.java:60)

at org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.lambda$submitWithExecutor$1(SparkGraphComputer.java:143)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:748)

Caused by: java.lang.ClassNotFoundException: org.apache.spark.deploy.yarn.YarnSparkHadoopUtil

at java.net.URLClassLoader.findClass(URLClassLoader.java:381)

at java.lang.ClassLoader.loadClass(ClassLoader.java:424)

at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)

at java.lang.ClassLoader.loadClass(ClassLoader.java:357)

at java.lang.Class.forName0(Native Method)

at java.lang.Class.forName(Class.java:348)

at org.apache.spark.util.Utils$.classForName(Utils.scala:174)

at org.apache.spark.deploy.SparkHadoopUtil$.liftedTree1$1(SparkHadoopUtil.scala:395)

... 18 more

org.apache.spark.SparkException: Unable to load YARN support

Type ':help' or ':h' for help.

Display stack trace? [yN]Y

java.lang.IllegalStateException: org.apache.spark.SparkException: Unable to load YARN support

at org.apache.tinkerpop.gremlin.process.computer.traversal.step.map.VertexProgramStep.processNextStart(VertexProgramStep.java:88)

at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:143)

at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:50)

at org.apache.tinkerpop.gremlin.process.computer.traversal.step.map.ComputerResultStep.processNextStart(ComputerResultStep.java:68)

at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:143)

at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal.hasNext(DefaultTraversal.java:184)

at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:232)

at org.apache.tinkerpop.gremlin.console.Console$_closure3.doCall(Console.groovy:237)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:498)

at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)

at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)

at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:294)

at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1027)

at org.codehaus.groovy.tools.shell.Groovysh.setLastResult(Groovysh.groovy:447)

at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:232)

at org.codehaus.groovy.tools.shell.Groovysh.execute(Groovysh.groovy:191)

at org.apache.tinkerpop.gremlin.console.GremlinGroovysh.super$3$execute(GremlinGroovysh.groovy)

at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)

at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:498)

at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)

at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)

at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1218)

at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:132)

at org.apache.tinkerpop.gremlin.console.GremlinGroovysh.execute(GremlinGroovysh.groovy:72)

at org.codehaus.groovy.tools.shell.Shell.leftShift(Shell.groovy:122)

at org.codehaus.groovy.tools.shell.ShellRunner.work(ShellRunner.groovy:95)

at org.codehaus.groovy.tools.shell.InteractiveShellRunner.super$2$work(InteractiveShellRunner.groovy)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:498)

at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)

at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)

at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1218)

at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:132)

at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuper0(ScriptBytecodeAdapter.java:152)

at org.codehaus.groovy.tools.shell.InteractiveShellRunner.work(InteractiveShellRunner.groovy:124)

at org.codehaus.groovy.tools.shell.ShellRunner.run(ShellRunner.groovy:59)

at org.codehaus.groovy.tools.shell.InteractiveShellRunner.super$2$run(InteractiveShellRunner.groovy)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:498)

at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)

at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)

at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1218)

at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:132)

at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuper0(ScriptBytecodeAdapter.java:152)

at org.codehaus.groovy.tools.shell.InteractiveShellRunner.run(InteractiveShellRunner.groovy:83)

at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:232)

at org.apache.tinkerpop.gremlin.console.Console.<init>(Console.groovy:169)

at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:232)

at org.apache.tinkerpop.gremlin.console.Console.main(Console.groovy:478)

Caused by: java.util.concurrent.ExecutionException: org.apache.spark.SparkException: Unable to load YARN support

at java.util.concurrent.FutureTask.report(FutureTask.java:122)

at java.util.concurrent.FutureTask.get(FutureTask.java:192)

at org.apache.tinkerpop.gremlin.process.computer.traversal.step.map.VertexProgramStep.processNextStart(VertexProgramStep.java:68)

... 56 more

Caused by: org.apache.spark.SparkException: Unable to load YARN support
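
The root cause in the first trace above is a ClassNotFoundException for org.apache.spark.deploy.yarn.YarnSparkHadoopUtil, a class that belongs to Spark's YARN support. A reasonable first check is whether any JAR in the directories placed on the classpath actually contains that class. The following is a minimal sketch, assuming a POSIX shell with unzip installed; class_to_entry and find_class_in_jars are hypothetical helper names, and the directories in the usage comment are simply the ones from the launch script.

```shell
# Hypothetical helper: turn a fully qualified class name into the path of
# its .class entry inside a JAR.
class_to_entry() {
    printf '%s.class\n' "$(printf '%s' "$1" | tr . /)"
}

# Hypothetical helper: print each JAR in the given directories that contains
# the class named by the first argument. Assumes `unzip` is installed.
find_class_in_jars() {
    _entry=$(class_to_entry "$1")
    shift
    for _dir in "$@"; do
        for _jar in "$_dir"/*.jar; do
            [ -f "$_jar" ] || continue     # glob matched nothing
            # List the archive contents and look for the entry as a fixed string.
            if unzip -l "$_jar" 2>/dev/null | grep -q -F "$_entry"; then
                printf '%s\n' "$_jar"
            fi
        done
    done
}

# Possible usage:
#   find_class_in_jars org.apache.spark.deploy.yarn.YarnSparkHadoopUtil \
#       "$SPARK_HOME/lib" "$GREMLINHOME/lib" "$GREMLINHOME/ext/spark-gremlin/lib"
```

If no JAR is printed, the class is not in any of those directories. If one is printed, the next question is whether that JAR really ends up on the classpath of the JVM that the Gremlin Console starts.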