Help with TinkerPop 3.2.4 running on YARN

Nick Kaufman

May 18, 2017, 1:03:43 PM
to Gremlin-users
Hello,

I have a 3-node cluster set up through Ambari, with HDFS 2.7.3 and Spark 1.6.3. I'm trying to configure TinkerPop 3.2.4 on it so that I can run queries/traversals using SparkGraphComputer. Ultimately, I would also like to use either Titan or JanusGraph with an HBase backend, but I figured those would be later configurations.

I am able to submit these queries successfully when using spark.master=local[*], but when I submit with spark.master=yarn-client instead, I cannot even initialize a Spark context.

I have a few questions:
1. Is this configuration possible, given the listed versions? I haven't found an obvious place that lists compatible versions of the various Apache products.
2. If this is possible, can anyone shed some light on what might be going wrong? I suspected that perhaps some JAR files weren't getting submitted properly, but at this point I'm honestly not sure.

Thanks,
-Nick

I've copied and pasted some context below. Most of this was taken from people contributing in this group.

I'm launching the Gremlin Console with the following script: 

export HADOOP_HOME=/usr/hdp/current/hadoop-client
#export HADOOP_CONF_DIR=/etc/hadoop/conf
export HADOOP_CONF_DIR=/usr/hdp/current/hadoop-client/conf
export YARN_HOME=/usr/hdp/current/hadoop-yarn-client
export YARN_CONF_DIR=$HADOOP_CONF_DIR
export SPARK_HOME=/usr/hdp/current/spark-client
#export SPARK_CONF_DIR=/etc/spark/conf
export SPARK_CONF_DIR=/usr/hdp/current/spark-client/conf

source "$HADOOP_CONF_DIR"/hadoop-env.sh
source "$YARN_CONF_DIR"/yarn-env.sh
source "$SPARK_HOME"/bin/load-spark-env.sh

export JAVA_HOME=/usr/lib/jvm/java-1.8.0
export JAVA_OPTIONS="$JAVA_OPTIONS -Djava.library.path=/usr/hdp/2.6.0.0-203/hadoop/lib/native -Dtinkerpop.ext=ext -Dlog4j.configuration=conf/log4j-console.properties -Dhdp.version=2.6.0.0-203"
GREMLINHOME=/home/nkaufman/apache-tinkerpop-gremlin-console-3.2.4
export HADOOP_GREMLIN_LIBS=$GREMLINHOME/ext/hadoop-gremlin/plugin:$GREMLINHOME/ext/hadoop-gremlin/lib:$GREMLINHOME/ext/spark-gremlin/plugin:$GREMLINHOME/ext/spark-gremlin/lib:$GREMLINHOME/lib
#export HADOOP_GREMLIN_LIBS=/etc/spark/lib/hadoop-gremlin-libs
#export CLASSPATH=/etc/hadoop/conf:/etc/spark/conf:/usr/hdp/current/spark-client/lib:$GREMLINHOME/lib/*.jar
export CLASSPATH=$HADOOP_CONF_DIR:$HADOOP_HOME/client/*.jar:$YARN_HOME:$YARN_CONF_DIR:$GREMLINHOME/lib/*.jar:$SPARK_HOME/lib/*.jar:$SPARK_CONF_DIR:$CLASSPATH:$HADOOP_GREMLIN_LIBS

cd $GREMLINHOME
exec bin/gremlin.sh


In the console, I'm opening the following hadoop-test.properties file:

spark.yarn.appMasterEnv.JAVA_HOME=/usr/jdk64/jdk1.8.0_77/jre
spark.yarn.appMasterEnv.HADOOP_CONF_DIR=/usr/hdp/current/hadoop-client/conf
spark.yarn.appMasterEnv.SPARK_CONF_DIR=/usr/hdp/current/spark-client/conf
spark.yarn.appMasterEnv.CLASSPATH=$CLASSPATH:/usr/hdp/current/hadoop-mapreduce-client:/usr/hdp/current/hadoop-mapreduce-client/lib/*.jar
spark.yarn.am.extraJavaOptions=-Dhdp.version=2.6.0.0-203

# spark executors (on work nodes) ...?
spark.executorEnv.JAVA_HOME=/usr/lib/jvm/java-1.8.0_77/jre
spark.executorEnv.HADOOP_CONF_DIR=/usr/hdp/current/hadoop-client/conf
spark.executorEnv.SPARK_CONF_DIR=/usr/hdp/current/spark-client/conf
spark.executor.extraJavaOptions=-Dhdp.version=2.6.0.0-203
spark.executor.memory=1g

# copied from spark-defaults.conf
spark.driver.extraLibraryPath=/usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64
spark.eventLog.dir=hdfs:///spark-history
spark.eventLog.enabled=true
spark.executor.extraLibraryPath=/usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64
spark.history.fs.logDirectory=hdfs:///spark-history
spark.history.kerberos.keytab=none
spark.history.kerberos.principal=none
spark.history.provider=org.apache.spark.deploy.history.FsHistoryProvider
spark.history.ui.port=18080
spark.yarn.containerLauncherMaxThreads=25
spark.yarn.driver.memoryOverhead=384
spark.yarn.executor.memoryOverhead=384
spark.yarn.historyServer.address=<IP_ADDRESS_HERE>:18080
spark.yarn.max.executor.failures=3
spark.yarn.preserve.staging.files=false
spark.yarn.queue=default
spark.yarn.scheduler.heartbeat.interval-ms=5000
spark.yarn.submit.file.replication=3

# integrate with the yarn spark history server
spark.yarn.services=org.apache.spark.deploy.yarn.history.YarnHistoryService
spark.history.provider=org.apache.spark.deploy.yarn.history.YarnHistoryProvider


Console:


         \,,,/
         (o o)
-----oOOo-(3)-oOOo-----
plugin activated: tinkerpop.server
plugin activated: tinkerpop.utilities
plugin activated: tinkerpop.credentials
INFO  org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph  - HADOOP_GREMLIN_LIBS is set to: /home/nkaufman/apache-tinkerpop-gremlin-console-3.2.4/ext/hadoop-gremlin/plugin:/home/nkaufman/apache-tinkerpop-gremlin-console-3.2.4/ext/hadoop-gremlin/lib:/home/nkaufman/apache-tinkerpop-gremlin-console-3.2.4/ext/spark-gremlin/plugin:/home/nkaufman/apache-tinkerpop-gremlin-console-3.2.4/ext/spark-gremlin/lib:/home/nkaufman/apache-tinkerpop-gremlin-console-3.2.4/lib
plugin activated: tinkerpop.hadoop
plugin activated: tinkerpop.spark
plugin activated: tinkerpop.tinkergraph
gremlin> graph = GraphFactory.open('conf/hadoop/hadoop-test.properties')
==>hadoopgraph[gryoinputformat->gryooutputformat]
gremlin> g = graph.traversal().withComputer(SparkGraphComputer)
==>graphtraversalsource[hadoopgraph[gryoinputformat->gryooutputformat], sparkgraphcomputer]
gremlin> g.V().count()
WARN  org.apache.spark.SparkConf  -
SPARK_CLASSPATH was detected (set to ':').
This is deprecated in Spark 1.0+.

Please instead use:
 - ./spark-submit with --driver-class-path to augment the driver classpath
 - spark.executor.extraClassPath to augment the executor classpath

WARN  org.apache.spark.SparkConf  - Setting 'spark.executor.extraClassPath' to ':' as a work-around.
WARN  org.apache.spark.SparkConf  - Setting 'spark.driver.extraClassPath' to ':' as a work-around.

ERROR org.apache.spark.SparkContext  - Error initializing SparkContext.

org.apache.spark.SparkException: Unable to load YARN support

at org.apache.spark.deploy.SparkHadoopUtil$.liftedTree1$1(SparkHadoopUtil.scala:399)

at org.apache.spark.deploy.SparkHadoopUtil$.yarn$lzycompute(SparkHadoopUtil.scala:394)

at org.apache.spark.deploy.SparkHadoopUtil$.yarn(SparkHadoopUtil.scala:394)

at org.apache.spark.deploy.SparkHadoopUtil$.get(SparkHadoopUtil.scala:411)

at org.apache.spark.util.Utils$.getSparkOrYarnConfig(Utils.scala:2118)

at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:105)

at org.apache.spark.SparkEnv$.create(SparkEnv.scala:365)

at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:193)

at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:288)

at org.apache.spark.SparkContext.<init>(SparkContext.scala:457)

at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2281)

at org.apache.spark.SparkContext.getOrCreate(SparkContext.scala)

at org.apache.tinkerpop.gremlin.spark.structure.Spark.create(Spark.java:53)

at org.apache.tinkerpop.gremlin.spark.structure.io.SparkContextStorage.open(SparkContextStorage.java:60)

at org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.lambda$submitWithExecutor$1(SparkGraphComputer.java:143)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:748)

Caused by: java.lang.ClassNotFoundException: org.apache.spark.deploy.yarn.YarnSparkHadoopUtil

at java.net.URLClassLoader.findClass(URLClassLoader.java:381)

at java.lang.ClassLoader.loadClass(ClassLoader.java:424)

at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)

at java.lang.ClassLoader.loadClass(ClassLoader.java:357)

at java.lang.Class.forName0(Native Method)

at java.lang.Class.forName(Class.java:348)

at org.apache.spark.util.Utils$.classForName(Utils.scala:174)

at org.apache.spark.deploy.SparkHadoopUtil$.liftedTree1$1(SparkHadoopUtil.scala:395)

... 18 more

org.apache.spark.SparkException: Unable to load YARN support

Type ':help' or ':h' for help.

Display stack trace? [yN]Y

java.lang.IllegalStateException: org.apache.spark.SparkException: Unable to load YARN support

at org.apache.tinkerpop.gremlin.process.computer.traversal.step.map.VertexProgramStep.processNextStart(VertexProgramStep.java:88)

at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:143)

at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:50)

at org.apache.tinkerpop.gremlin.process.computer.traversal.step.map.ComputerResultStep.processNextStart(ComputerResultStep.java:68)

at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:143)

at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal.hasNext(DefaultTraversal.java:184)

at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:232)

at org.apache.tinkerpop.gremlin.console.Console$_closure3.doCall(Console.groovy:237)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:498)

at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)

at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)

at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:294)

at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1027)

at org.codehaus.groovy.tools.shell.Groovysh.setLastResult(Groovysh.groovy:447)

at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:232)

at org.codehaus.groovy.tools.shell.Groovysh.execute(Groovysh.groovy:191)

at org.apache.tinkerpop.gremlin.console.GremlinGroovysh.super$3$execute(GremlinGroovysh.groovy)

at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)

at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:498)

at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)

at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)

at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1218)

at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:132)

at org.apache.tinkerpop.gremlin.console.GremlinGroovysh.execute(GremlinGroovysh.groovy:72)

at org.codehaus.groovy.tools.shell.Shell.leftShift(Shell.groovy:122)

at org.codehaus.groovy.tools.shell.ShellRunner.work(ShellRunner.groovy:95)

at org.codehaus.groovy.tools.shell.InteractiveShellRunner.super$2$work(InteractiveShellRunner.groovy)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:498)

at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)

at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)

at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1218)

at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:132)

at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuper0(ScriptBytecodeAdapter.java:152)

at org.codehaus.groovy.tools.shell.InteractiveShellRunner.work(InteractiveShellRunner.groovy:124)

at org.codehaus.groovy.tools.shell.ShellRunner.run(ShellRunner.groovy:59)

at org.codehaus.groovy.tools.shell.InteractiveShellRunner.super$2$run(InteractiveShellRunner.groovy)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:498)

at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)

at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)

at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1218)

at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:132)

at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuper0(ScriptBytecodeAdapter.java:152)

at org.codehaus.groovy.tools.shell.InteractiveShellRunner.run(InteractiveShellRunner.groovy:83)

at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:232)

at org.apache.tinkerpop.gremlin.console.Console.<init>(Console.groovy:169)

at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:232)

at org.apache.tinkerpop.gremlin.console.Console.main(Console.groovy:478)

Caused by: java.util.concurrent.ExecutionException: org.apache.spark.SparkException: Unable to load YARN support

at java.util.concurrent.FutureTask.report(FutureTask.java:122)

at java.util.concurrent.FutureTask.get(FutureTask.java:192)

at org.apache.tinkerpop.gremlin.process.computer.traversal.step.map.VertexProgramStep.processNextStart(VertexProgramStep.java:68)

... 56 more

Caused by: org.apache.spark.SparkException: Unable to load YARN support

at org.apache.spark.deploy.SparkHadoopUtil$.liftedTree1$1(SparkHadoopUtil.scala:399)

at org.apache.spark.deploy.SparkHadoopUtil$.yarn$lzycompute(SparkHadoopUtil.scala:394)

at org.apache.spark.deploy.SparkHadoopUtil$.yarn(SparkHadoopUtil.scala:394)

at org.apache.spark.deploy.SparkHadoopUtil$.get(SparkHadoopUtil.scala:411)

at org.apache.spark.util.Utils$.getSparkOrYarnConfig(Utils.scala:2118)

at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:105)

at org.apache.spark.SparkEnv$.create(SparkEnv.scala:365)

at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:193)

at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:288)

at org.apache.spark.SparkContext.<init>(SparkContext.scala:457)

at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2281)

at org.apache.spark.SparkContext.getOrCreate(SparkContext.scala)

at org.apache.tinkerpop.gremlin.spark.structure.Spark.create(Spark.java:53)

at org.apache.tinkerpop.gremlin.spark.structure.io.SparkContextStorage.open(SparkContextStorage.java:60)

at org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.lambda$submitWithExecutor$1(SparkGraphComputer.java:143)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:748)

Caused by: java.lang.ClassNotFoundException: org.apache.spark.deploy.yarn.YarnSparkHadoopUtil

at java.net.URLClassLoader.findClass(URLClassLoader.java:381)

at java.lang.ClassLoader.loadClass(ClassLoader.java:424)

at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)

at java.lang.ClassLoader.loadClass(ClassLoader.java:357)

at java.lang.Class.forName0(Native Method)

at java.lang.Class.forName(Class.java:348)

at org.apache.spark.util.Utils$.classForName(Utils.scala:174)

at org.apache.spark.deploy.SparkHadoopUtil$.liftedTree1$1(SparkHadoopUtil.scala:395)

... 18 more

Jason Plurad

May 18, 2017, 2:53:38 PM
to Gremlin-users
I made some notes on Ambari with TinkerPop last year (admittedly, I haven't done any updates since then)
https://github.com/pluradj/ambari-vagrant/blob/tp3/ubuntu14.4/tp3/SparkGraphComputer.md

The error java.lang.ClassNotFoundException: org.apache.spark.deploy.yarn.YarnSparkHadoopUtil indicates that you are missing spark-assembly-*.jar on the classpath. You can get that jar from the Spark installation lib directory, copy it into a shared location in HDFS, then add a spark.yarn.jar property pointing at the HDFS location. Alternatively, you could copy that jar into ext/spark-gremlin/plugin/ and it would be included on the classpath as part of the HADOOP_GREMLIN_LIBS.
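
The two options described above could be scripted roughly as follows. This is only a sketch: the function names are hypothetical, the HDFS directory and install paths are whatever your cluster uses, and it assumes the `hdfs` CLI is on the PATH. Only the `spark.yarn.jar` property and the `ext/spark-gremlin/plugin/` directory come from the reply itself.

```shell
#!/usr/bin/env bash
# Sketch only: these helper names are hypothetical, not part of TinkerPop or Spark.

# Print the spark.yarn.jar property line for an assembly jar staged under an HDFS directory.
spark_yarn_jar_property() {
  local jar_path="$1" hdfs_dir="${2%/}"   # drop one trailing slash, if any
  printf 'spark.yarn.jar=%s/%s\n' "$hdfs_dir" "$(basename "$jar_path")"
}

# Option 1: copy the assembly jar to a shared HDFS location,
# then print the property line to add to the graph properties file.
stage_assembly_in_hdfs() {
  local jar_path="$1" hdfs_dir="${2%/}"
  hdfs dfs -mkdir -p "$hdfs_dir" &&
    hdfs dfs -put -f "$jar_path" "$hdfs_dir/" &&
    spark_yarn_jar_property "$jar_path" "$hdfs_dir"
}

# Option 2: copy the assembly jar into the console's spark-gremlin plugin directory,
# so it is picked up when that directory is part of HADOOP_GREMLIN_LIBS.
copy_assembly_to_plugin_dir() {
  local jar_path="$1" gremlin_home="${2%/}"
  cp "$jar_path" "$gremlin_home/ext/spark-gremlin/plugin/"
}
```

Either function expects the path of exactly one assembly jar, so resolve any `spark-assembly-*.jar` glob to a single file before calling it.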

I'm not sure if TinkerPop 3.2.4 will work with Spark 1.6.3 since TP was built against Spark 1.6.1. Give it a try and let us know.

-- Jason

Nick Kaufman

May 19, 2017, 10:47:38 AM
to Gremlin-users
Hey Jason,

Thanks for the reply. I was skeptical about TP 3.2.4 with Spark 1.6.3, so I installed 1.6.1 instead.
If you're interested in looking at another stack trace, I have a different error (I believe the distributed jar solved the other problem).
Also, just wanted to say thanks for your link - very detailed.

Maybe a Scala version mismatch? I assumed that being able to run spark-shell with yarn-client ruled that out, but I don't really know.

Thanks,
-Nick

java.lang.IllegalStateException: java.lang.NoSuchMethodError: org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.bindToYarn(Lorg/apache/hadoop/yarn/api/records/ApplicationId;Lscala/Option;)V

Caused by: java.util.concurrent.ExecutionException: java.lang.NoSuchMethodError: org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.bindToYarn(Lorg/apache/hadoop/yarn/api/records/ApplicationId;Lscala/Option;)V

Jason Plurad

May 19, 2017, 11:17:13 AM
to Gremlin-users
Very well could be a Scala version mismatch. What Scala version are you trying to use? TinkerPop 3.2.4 builds with Spark 1.6.1 and Scala 2.10.

Nick Kaufman

May 19, 2017, 11:48:20 AM
to Gremlin-users
According to spark-shell:

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.1
      /_/

Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_60)


So, 2.10.5. Rooting around, I can't find a standalone installation of Scala, though. Do you know if the Scala binaries packaged with Spark are sufficient?


HadoopMarc

May 19, 2017, 2:02:36 PM
to Gremlin-users
Hi Nick,

Getting this to work is also on my own wish list, but I have not tried yet. From previous trials with Titan, I also remember incompatibilities between the Groovy version in the spark-xxx.jar and the one packaged with TinkerPop. In this case, there might also be subtle differences between the HDP binaries and the Hadoop/Spark binaries that TinkerPop is built against. For the last issue, you could try adding the HDP version to the relevant dependencies in the TinkerPop pom.xml files. This is speculative, though; I do not understand what is going on.

As to your question: yes, Spark packages all of its dependencies, including the Scala binaries.

Cheers,    Marc

john.h...@redjack.com

May 31, 2017, 1:08:23 PM
to Gremlin-users
HadoopMarc,

Hello, I am working with Nick on this problem. Are there people who actually compile these TinkerPop binaries that we can contact? Do you know the appropriate mailing list for tracking this problem down, assuming we are reaching the limits of your knowledge? Who would you ask about this problem?

HadoopMarc

May 31, 2017, 4:42:16 PM
to Gremlin-users
Hi John,

Good to read that you are determined to solve this!

Did you check whether your spark-1.6.1 build has the right hadoop-2.7.3 dependency? You mentioned at the start that you work on hadoop-2.7.3 (so including YARN). Do not hesitate to post full stack traces; I believe a lot of people read this and might have additional insights (including the project devs you refer to above). See also my answer to your post on the JanusGraph list: you have to provide sufficient information so that these devs can "see" the root cause of the compatibility issues.

Cheers,    Marc
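
The question of which Hadoop version a Spark build targets can often be answered from the assembly jar's file name. A minimal sketch, assuming the `spark-assembly-<spark>-hadoop<hadoop>.jar` naming pattern seen in the jar names quoted in this thread; `assembly_versions` is a hypothetical helper, and the file name reflects the build profile, not necessarily every bundled dependency.

```shell
#!/usr/bin/env bash
# Sketch only: assembly_versions is a hypothetical helper.
# It splits an assembly jar name of the form
#   spark-assembly-<spark version>-hadoop<hadoop version>.jar
# into its two version strings using plain parameter expansion.
assembly_versions() {
  local name
  name="$(basename "$1" .jar)"         # strip directory and .jar suffix
  name="${name#spark-assembly-}"       # strip the fixed prefix
  local spark="${name%%-hadoop*}"      # everything before "-hadoop"
  local hadoop="${name#*-hadoop}"      # everything after "-hadoop"
  printf 'spark=%s hadoop=%s\n' "$spark" "$hadoop"
}

# Example call (path is an assumption about a stock Spark 1.6.1 download):
#   assembly_versions /opt/spark-1.6.1-bin-hadoop2.6/lib/spark-assembly-1.6.1-hadoop2.6.0.jar
```

For vendor builds such as HDP, the vendor's own version suffix ends up inside both fields, which makes it easy to see whether the jar came from the cluster distribution or from a stock Apache download.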

Jason Plurad

Jun 5, 2017, 10:06:09 AM
to Gremlin-users
Here's a properties file that I've used successfully with a vanilla Apache Hadoop 2.7.2 install and Apache TinkerPop 3.2.4. I didn't try out Ambari yet, but generally the main thing to watch out for there is making sure the hdp.version is set all over the place as a Java option.

# conf/hadoop/hadoop-gryo.properties

# TinkerPop Hadoop Graph for OLAP
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
# Set the default OLAP computer for graph.traversal().withComputer()
gremlin.hadoop.defaultGraphComputer=org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer

# Gryo I/O Formats
gremlin.hadoop.graphReader=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoInputFormat
gremlin.hadoop.graphWriter=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat

# Gryo file location in HDFS
gremlin.hadoop.inputLocation=tinkerpop-modern.kryo
gremlin.hadoop.outputLocation=output

# Gremlin Console acts as the Spark Driver (YARN client)
spark.master=yarn-client
spark.executor.memory=512m

# When true, jars from HADOOP_GREMLIN_LIBS become added jars available to executors via http
# In TP 3.2.4/Spark 1.6.1, jars are added but don't appear to be available on executor classpath?
gremlin.hadoop.jarsInDistributedCache=false

# Install TinkerPop on all worker nodes, then add jars with local fs path
spark.executor.extraClassPath=/opt/apache-tinkerpop-gremlin-console-3.2.4/lib/*:/opt/apache-tinkerpop-gremlin-console-3.2.4/ext/spark-gremlin/plugin/*


In other versions I tried (3.1.6 and master), setting `gremlin.hadoop.jarsInDistributedCache=true` and HADOOP_GREMLIN_LIBS was sufficient and I didn't need to set the `spark.executor.extraClassPath`. There appears to be a possible bug (same behavior with previous TP 3.2 versions), not sure at this point whether it is TP or Spark. A workaround was to copy the dependencies over to the executor nodes and make them available with the extra classpath.
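
The workaround above amounts to pointing `spark.executor.extraClassPath` at a TinkerPop install that exists at the same local path on every worker. A small sketch of building that value, assuming the standard console layout (`lib/` and `ext/spark-gremlin/plugin/`); `executor_extra_classpath` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: executor_extra_classpath is a hypothetical helper.
# Given the local install directory of the Gremlin Console on the worker nodes,
# print a value for spark.executor.extraClassPath that covers the console's
# own jars and the spark-gremlin plugin jars.
executor_extra_classpath() {
  local home="${1%/}"   # drop one trailing slash, if any
  # Single quotes keep the shell from expanding the '*' wildcards;
  # they must reach the JVM literally.
  printf '%s/lib/*:%s/ext/spark-gremlin/plugin/*\n' "$home" "$home"
}

# Example call, printing a line ready to paste into the properties file:
#   echo "spark.executor.extraClassPath=$(executor_extra_classpath /opt/apache-tinkerpop-gremlin-console-3.2.4)"
```

The trailing `/*` matters: the JVM expands `dir/*` to every jar in that directory, whereas a bare directory entry only picks up loose class files and a `dir/*.jar` entry is not expanded at all.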


export HADOOP_PREFIX=/opt/hadoop-2.7.2
export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
export YARN_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
# Put the Hadoop configuration on the classpath so HDFS doesn't resolve to the local filesystem
export CLASSPATH=$HADOOP_CONF_DIR
# Copy the Spark assembly jar so Gremlin Console can act as the Spark driver (YarnSparkHadoopUtil)
cp /opt/spark-1.6.1-bin-hadoop2.6/lib/spark-assembly-1.6.1-hadoop2.6.0.jar ext/spark-gremlin/plugin/
# Previously installed and activated tinkerpop.hadoop and tinkerpop.spark plugins
bin/gremlin.sh

gremlin> hdfs
==>storage[DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_351892498_1, ugi=pluradj (auth:SIMPLE)]]]
gremlin> hdfs.copyFromLocal('data/tinkerpop-modern.kryo', './')
==>null
gremlin> graph = GraphFactory.open('conf/hadoop/hadoop-gryo.properties'); g = graph.traversal().withComputer(); g.E().valueMap(true)
==>[label:created,weight:0.4,id:9]
==>[label:knows,weight:0.5,id:7]
==>[label:knows,weight:1.0,id:8]
==>[label:created,weight:1.0,id:10]
==>[label:created,weight:0.4,id:11]
==>[label:created,weight:0.2,id:12]


-- Jason
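
On the point about setting hdp.version everywhere on Ambari/HDP: the configurations posted in this thread pass `-Dhdp.version=...` in three places, namely `JAVA_OPTIONS` for the console (the driver), `spark.yarn.am.extraJavaOptions`, and `spark.executor.extraJavaOptions`. A rough sketch of a check for the two properties-file keys follows; `missing_hdp_version` is a hypothetical helper, and it does not inspect `JAVA_OPTIONS` in the launch script.

```shell
#!/usr/bin/env bash
# Sketch only: missing_hdp_version is a hypothetical helper.
# Prints each Spark Java-options key that, in the given properties file,
# either is absent or does not carry a -Dhdp.version=... option.
missing_hdp_version() {
  local file="$1" key
  for key in spark.yarn.am.extraJavaOptions spark.executor.extraJavaOptions; do
    # First grep selects lines for this key; second checks for the flag.
    grep -F -- "${key}=" "$file" | grep -Fq -- '-Dhdp.version=' || echo "$key"
  done
}

# Example call (file name taken from the first post in this thread):
#   missing_hdp_version conf/hadoop/hadoop-test.properties
```

No output means both keys carry the flag. The match is a plain substring test, so a commented-out line that still contains the key and the flag would also count as present.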

Nick Kaufman

Jun 8, 2017, 12:08:01 PM
to Gremlin-users
Thanks for all the replies, folks. Still trying to get this to work. I've changed the TP version to 3.2.3 to be consistent with JanusGraph v0.1.1. I'm using HDP stack 2.4.2.0-258, which is packaged with Hadoop 2.7.1.2.4 and Spark 1.6.1.

It seems like I've resolved the issue with connecting to YARN, but submitting traversals still crashes. To provide some context, I launch the Gremlin Console with this script:

export HADOOP_HOME=/usr/hdp/current/hadoop-client
export HADOOP_CONF_DIR=/usr/hdp/current/hadoop-client/conf
export YARN_HOME=/usr/hdp/current/hadoop-yarn-client
export YARN_CONF_DIR=$HADOOP_CONF_DIR
export SPARK_HOME=/usr/hdp/current/spark-client
export SPARK_CONF_DIR=/usr/hdp/current/spark-client/conf

source "$HADOOP_CONF_DIR"/hadoop-env.sh
source "$YARN_CONF_DIR"/yarn-env.sh
source "$SPARK_HOME"/bin/load-spark-env.sh
export JAVA_HOME=/usr/jdk64/jdk1.8.0_60/jre
export JAVA_OPTIONS="$JAVA_OPTIONS -Djava.library.path=/usr/hdp/2.4.2.0-258/hadoop/lib/native -Dtinkerpop.ext=ext -Dlog4j.configuration=conf/log4j-console.properties -Dhdp.version=2.4.2.0-258"

JANUS_HOME=/usr/local/janusgraph-0.1.1-hadoop2
export HADOOP_GREMLIN_LIBS=$JANUS_HOME/lib

export CLASSPATH=$CLASSPATH:$HADOOP_CONF_DIR:$HADOOP_HOME/client:$YARN_HOME:$YARN_CONF_DIR:$GREMLINHOME/lib:$SPARK_HOME/lib:$SPARK_CONF_DIR

cd $JANUS_HOME
exec bin/gremlin.sh


and using the following properties file:

gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphInputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoInputFormat
gremlin.hadoop.graphOutputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.jarsInDistributedCache=false

gremlin.hadoop.inputLocation=hdfs:///user/nkaufman/data/tinkerpop-modern.kryo
gremlin.hadoop.outputLocation=output

####################################
# SparkGraphComputer Configuration #
####################################

spark.master=yarn-client
spark.executor.memory=1g
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.executor.extraClassPath=/usr/local/janusgraph-0.1.1-hadoop2/lib
# Additional things...
spark.yarn.appMasterEnv.JAVA_HOME=/usr/jdk64/jdk1.8.0_60/jre
spark.yarn.appMasterEnv.HADOOP_CONF_DIR=/usr/hdp/current/hadoop-client/conf
spark.yarn.appMasterEnv.SPARK_CONF_DIR=/usr/hdp/current/spark-client/conf
spark.yarn.appMasterEnv.CLASSPATH=$CLASSPATH:/usr/hdp/current/hadoop-mapreduce-client:/usr/hdp/current/hadoop-mapreduce-client/lib
spark.yarn.jar=hdfs:///user/nkaufman/share/lib/spark-assembly-1.6.1.2.4.2.0-258-hadoop2.7.1.2.4.2.0-258.jar
spark.yarn.am.extraJavaOptions=-Dhdp.version=2.4.2.0-258 -Djava.library.path=/usr/hdp/2.4.2.0-258/hadoop/lib/native

spark.executorEnv.JAVA_HOME=/usr/jdk64/jdk1.8.0_60/jre
spark.executorEnv.HADOOP_CONF_DIR=/usr/hdp/current/hadoop-client/conf
spark.executorEnv.SPARK_CONF_DIR=/usr/hdp/current/spark-client/conf
spark.executor.extraJavaOptions=-Dhdp.version=2.4.2.0-258



Here is the stack trace for the error:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, uzzerg.nicc.noblis.org): java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.x$330 of type org.apache.spark.api.java.function.PairFunction in instance of org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1
at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2089)
at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1261)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2006)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1891)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:64)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
Type ':help' or ':h' for help.
Display stack trace? [yN]11:12:14 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator  - Remoting shut down.
y
java.lang.IllegalStateException: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, uzzerg.nicc.noblis.org): java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.x$330 of type org.apache.spark.api.java.function.PairFunction in instance of org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1
at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2089)
at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1261)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2006)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1891)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:64)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
at org.apache.tinkerpop.gremlin.process.computer.traversal.step.map.VertexProgramStep.processNextStart(VertexProgramStep.java:88)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:143)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:50)
at org.apache.tinkerpop.gremlin.process.computer.traversal.step.map.ComputerResultStep.processNextStart(ComputerResultStep.java:68)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:143)
at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal.hasNext(DefaultTraversal.java:179)
at org.apache.tinkerpop.gremlin.console.Console$_closure3.doCall(Console.groovy:220)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:294)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1024)
at org.codehaus.groovy.tools.shell.Groovysh.setLastResult(Groovysh.groovy:446)
at org.codehaus.groovy.tools.shell.Groovysh.execute(Groovysh.groovy:190)
at org.apache.tinkerpop.gremlin.console.GremlinGroovysh.super$3$execute(GremlinGroovysh.groovy)
at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1215)
at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:132)
at org.apache.tinkerpop.gremlin.console.GremlinGroovysh.execute(GremlinGroovysh.groovy:72)
at org.codehaus.groovy.tools.shell.Shell.leftShift(Shell.groovy:122)
at org.codehaus.groovy.tools.shell.InteractiveShellRunner.super$2$work(InteractiveShellRunner.groovy)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1215)
at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:132)
at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuper0(ScriptBytecodeAdapter.java:152)
at org.codehaus.groovy.tools.shell.ShellRunner.run(ShellRunner.groovy:59)
at org.codehaus.groovy.tools.shell.InteractiveShellRunner.super$2$run(InteractiveShellRunner.groovy)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1215)
at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:132)
at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuper0(ScriptBytecodeAdapter.java:152)
at org.codehaus.groovy.tools.shell.InteractiveShellRunner.run(InteractiveShellRunner.groovy:83)
at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:232)
at org.apache.tinkerpop.gremlin.console.Console.<init>(Console.groovy:152)
at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:232)
at org.apache.tinkerpop.gremlin.console.Console.main(Console.groovy:455)
Caused by: java.util.concurrent.ExecutionException: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, uzzerg.nicc.noblis.org): java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.x$330 of type org.apache.spark.api.java.function.PairFunction in instance of org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1
at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2089)
at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1261)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2006)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1891)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:64)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at org.apache.tinkerpop.gremlin.process.computer.traversal.step.map.VertexProgramStep.processNextStart(VertexProgramStep.java:68)
... 54 more
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, uzzerg.nicc.noblis.org): java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.x$330 of type org.apache.spark.api.java.function.PairFunction in instance of org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1
at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2089)
at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1261)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2006)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1891)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:64)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1855)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1868)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1881)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1952)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:927)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:925)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:323)
at org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:925)
at org.apache.spark.api.java.JavaRDDLike$class.foreachPartition(JavaRDDLike.scala:225)
at org.apache.spark.api.java.AbstractJavaRDDLike.foreachPartition(JavaRDDLike.scala:46)
at org.apache.tinkerpop.gremlin.spark.process.computer.SparkExecutor.executeVertexProgramIteration(SparkExecutor.java:173)
at org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.lambda$submitWithExecutor$1(SparkGraphComputer.java:280)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.x$330 of type org.apache.spark.api.java.function.PairFunction in instance of org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1
at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2089)
at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1261)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2006)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1891)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:64)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
... 3 more



Thanks in advance. This group has been very helpful.


Edi Bice

Jun 19, 2017, 5:02:49 PM
to Gremlin-users
Nick,

I get the same error using JanusGraph 0.1.1 on HDP 2.5 with Spark 1.6.2 and Hadoop 2.7.3.

How did you get past it? By downgrading Spark to 1.6.1?

Edi



Nick Kaufman

Jun 19, 2017, 6:56:37 PM
to Gremlin-users
Hey Edi,

Just today I managed to get this working. I plan on posting a more detailed solution, including my configuration files, at some point this week. There was never an "aha" moment - it was more a particular combination of what a few others (Jason, Marc) have mentioned.

But yes, I ended up having success with HDP 2.4.2, Spark 1.6.1, Hadoop 2.7.1.

Antriksh Shah

Jun 9, 2018, 10:40:36 AM
to Gremlin-users
Hey Nick,

Do you remember what settings you used to configure Gremlin to use YARN?