Spark Graph Computer with TinkerPop 3.3.0 + CosmosDB

Devang Patel

Nov 22, 2017, 3:24:24 PM
to Gremlin-users
Hi Guys,

I am trying to use SparkGraphComputer with TinkerPop 3.3.0 to run Gremlin queries over Azure CosmosDB.
When I run my application, I get the following error:

Exception in thread "main" java.lang.IllegalStateException: java.lang.IllegalStateException: Unable to load KryoShimService
at org.apache.tinkerpop.gremlin.process.computer.traversal.step.map.VertexProgramStep.processNextStart(VertexProgramStep.java:88)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:143)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:50)
at org.apache.tinkerpop.gremlin.process.computer.traversal.step.map.ComputerResultStep.processNextStart(ComputerResultStep.java:68)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.next(AbstractStep.java:128)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.next(AbstractStep.java:38)
at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal.next(DefaultTraversal.java:200)
at Main$.main(Main.scala:82)

I searched Google and Stack Overflow for references but couldn't find anything helpful.

Here is my SparkGraphComputer configuration:

//####################################
//# SparkGraphComputer Configuration #
//####################################
conf.setProperty("spark.master", "yarn")
conf.setProperty("spark.executor.memory", "1g")
conf.setProperty("spark.executor.instances", "1")
conf.setProperty("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
conf.setProperty("spark.kryo.registrator", "org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoRegistrator")
conf.setProperty("gremlin.spark.persistContext", "true")


Can any of you suggest how to debug this problem? 

Thanks,
Devang.

HadoopMarc

Nov 23, 2017, 1:31:52 AM
to Gremlin-users
Hi Devang,

You can start with these blogs to get some inspiration for improving your configs:

http://yaaics.blogspot.nl/2017/06/configuring-apache-tinkerpop-for-spark.html
http://yaaics.blogspot.nl/2017/07/configuring-janusgraph-for-spark-yarn.html

Cheers,   Marc

On Wednesday, 22 November 2017 21:24:24 UTC+1, Devang Patel wrote:

Devang Patel

Nov 24, 2017, 1:07:21 PM
to Gremlin-users
Thanks for the links, Marc, but they didn't help with my current problem.

HadoopMarc

Nov 24, 2017, 3:58:00 PM
to Gremlin-users
Hi Devang,

Yes, TinkerPop on CosmosDB is new to me; I had already wondered how you could use it with TinkerPop...

So, after some reading I found this: the following Microsoft page provides additional configuration for SparkGraphComputer on CosmosDB (in particular, classpath settings for the driver and executor):
https://docs.microsoft.com/en-us/azure/cosmos-db/spark-connector-graph
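The thread does not quote the Microsoft settings themselves. Purely as an illustration in the same conf.setProperty style used at the top of this thread: Spark's standard spark.executor.extraClassPath and spark.driver.extraClassPath properties prepend entries to the executor and driver classpaths. The jar directory below is a hypothetical placeholder and would have to exist on every node.

```scala
// Hypothetical directory holding the connector and TinkerPop jars on every node.
conf.setProperty("spark.executor.extraClassPath", "/opt/cosmos-graph/lib/*")
// In client mode the driver JVM is already running when this is read, so Spark's docs
// say to pass the driver classpath at launch instead (spark-submit --driver-class-path).
conf.setProperty("spark.driver.extraClassPath", "/opt/cosmos-graph/lib/*")
```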

Also note that the author of this documentation seems a bit confused about the graph technologies involved: TinkerPop's SparkGraphComputer does not use Apache Spark GraphX at all!

Cheers,     Marc


On Friday, 24 November 2017 19:07:21 UTC+1, Devang Patel wrote:

Devang Patel

Nov 27, 2017, 6:27:06 PM
to Gremlin-users
Hi Marc,

Yes, I am following the same document, but instead of using the Gremlin Console, I am packaging my application as a Java jar file and running it with spark-submit.
I am bundling all the necessary dependencies into a fat jar for my application.

When I run that jar with spark-submit, I get the "Unable to load KryoShimService" error.

I am not very familiar with what this service does or what this error message implies. I need some help to understand and debug this problem.

HadoopMarc

Nov 28, 2017, 2:57:35 PM
to Gremlin-users
Hi Devang,

KryoShimService is here:

https://github.com/apache/tinkerpop/blob/3.2.6/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/gryo/kryoshim/KryoShimService.java

Is gremlin-core, which contains KryoShimService, in your uber jar? If not, add it as a dependency. If it is, please attach your pom.xml so we can see what is in there and what might be getting in the way.
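One way to answer that question on the machine that runs spark-submit is a small stand-alone check. KryoShimCheck below is a hypothetical helper, not part of TinkerPop. It rests on an assumption drawn from TinkerPop's source rather than from this thread: KryoShimServiceLoader finds implementations through java.util.ServiceLoader, so both the KryoShimService interface and at least one META-INF/services registration file for it must be visible to the class loader. Run it with the same uber jar on the classpath.

```scala
import scala.collection.JavaConverters._
import scala.util.Try

object KryoShimCheck {
  // Fully qualified interface name, matching the source path in the GitHub link above.
  val ShimInterface =
    "org.apache.tinkerpop.gremlin.structure.io.gryo.kryoshim.KryoShimService"

  // ServiceLoader.load(Class) uses the thread context class loader, so default to it.
  private def contextLoader: ClassLoader = Thread.currentThread.getContextClassLoader

  // True if the class can be found (without running its static initializers).
  def classVisible(name: String, cl: ClassLoader = contextLoader): Boolean =
    Try(Class.forName(name, false, cl)).isSuccess

  // URLs of every META-INF/services file that registers implementations of `iface`.
  def serviceFiles(iface: String, cl: ClassLoader = contextLoader): List[String] =
    cl.getResources(s"META-INF/services/$iface").asScala.map(_.toString).toList

  def main(args: Array[String]): Unit = {
    println(s"$ShimInterface visible: ${classVisible(ShimInterface)}")
    val files = serviceFiles(ShimInterface)
    if (files.isEmpty) println("no META-INF/services registration found for it")
    else files.foreach(f => println(s"registered by: $f"))
  }
}
```

If the interface is visible but no registration file is listed, the jar contains the classes but lost its service files somewhere during assembly.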

You might also want to take a look at:
https://mvnrepository.com/artifact/org.apache.tinkerpop/gremlin-archetype
https://maven.apache.org/guides/introduction/introduction-to-archetypes.html

Cheers,     Marc

On Tuesday, 28 November 2017 00:27:06 UTC+1, Devang Patel wrote:

Devang Patel

Dec 5, 2017, 11:47:37 AM
to Gremlin-users
Hi Marc,

Yes, I do have "gremlin-core" version 3.3.0 in my uber jar. I am using sbt to build my uber jar.

Here is my build.sbt file.


name := "cosmos_graph_spark"

version := "1.0"

scalaVersion := "2.11.8"

libraryDependencies += "org.apache.tinkerpop" % "spark-gremlin" % "3.3.0"
libraryDependencies += "org.apache.tinkerpop" % "gremlin-core" % "3.3.0"
//libraryDependencies += "org.apache.tinkerpop" % "hadoop-gremlin" % "3.3.0"
libraryDependencies += "com.microsoft.azure" % "azure-documentdb" % "1.13.0"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.0" % "provided"

assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case x => MergeStrategy.first
}
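One more thing worth checking, offered as an assumption since the thread never confirms it: as far as I can tell from TinkerPop's source, KryoShimServiceLoader (the class that raises "Unable to load KryoShimService") discovers implementations through java.util.ServiceLoader, which reads registration files under META-INF/services. The merge strategy above discards everything under META-INF, which would also drop those files from the uber jar. A sketch for sbt-assembly that keeps service registrations (merging duplicate lines coming from different jars) while still discarding the rest of META-INF:

```scala
assemblyMergeStrategy in assembly := {
  // Keep ServiceLoader registration files; merge identical lines from different jars.
  case PathList("META-INF", "services", xs @ _*) => MergeStrategy.filterDistinctLines
  // Discard the remaining META-INF content (manifests, signature files, etc.).
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case x => MergeStrategy.first
}
```

Case order matters here: the more specific META-INF/services pattern must come before the catch-all META-INF pattern, otherwise it is never reached.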

prasad.d...@gmail.com

Jan 30, 2018, 6:12:48 PM
to Gremlin-users
Hi Devang Patel,

I am also facing the same issue.
If you fixed it, please let me know how you solved it.

Thanks,
Prasad Dokuparthi.

HadoopMarc

Jan 31, 2018, 4:38:16 AM
to Gremlin-users
Hi Prasad,

I would use something like the Maven Enforcer Plugin to find the version conflicts between azure-documentdb, the TinkerPop jars, and their transitive dependencies. If there are only a few conflicts, you can simply exclude the lower version in your project file. If there are many and you are lazy, pick the most suspicious merge conflicts and start excluding from there.
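The bookkeeping behind this advice (list every resolved module, flag those that arrive in more than one version, keep the highest) can be sketched in a few lines of Scala. This is a hypothetical stand-alone illustration, not the Maven Enforcer Plugin: in a real sbt build the resolved list would come from the build tool (sbt's evicted task reports modules that lost a version conflict), and the sample coordinates in main are made up.

```scala
import scala.util.Try

object VersionConflicts {
  // A resolved module coordinate: organization, artifact name, version.
  final case class Dep(org: String, name: String, version: String)

  // Compares dotted versions segment by segment, numerically where possible, so
  // "1.10.0" > "1.9.2" and "2.1" == "2.1.0". Non-numeric segments fall back to string
  // comparison, so pre-release suffixes such as "-RC1" are not ordered correctly.
  def compareVersions(a: String, b: String): Int = {
    val pairs = a.split("[.-]").toList.zipAll(b.split("[.-]").toList, "0", "0")
    pairs.iterator.map { case (x, y) =>
      (Try(x.toLong).toOption, Try(y.toLong).toOption) match {
        case (Some(i), Some(j)) => java.lang.Long.compare(i, j)
        case _                  => x.compareTo(y)
      }
    }.find(_ != 0).getOrElse(0)
  }

  // Modules (org, name) that were resolved with two or more distinct versions.
  def conflicts(deps: Seq[Dep]): Map[(String, String), Seq[String]] =
    deps
      .groupBy(d => (d.org, d.name))
      .map { case (module, ds) => module -> ds.map(_.version).distinct }
      .filter { case (_, versions) => versions.size > 1 }

  // The version to keep; the lower ones are the exclusion candidates.
  def highest(versions: Seq[String]): String =
    versions.reduce((a, b) => if (compareVersions(a, b) >= 0) a else b)

  def main(args: Array[String]): Unit = {
    // Made-up coordinates standing in for a real resolved dependency list.
    val resolved = Seq(
      Dep("com.example", "json-lib", "2.6.7"),
      Dep("com.example", "json-lib", "2.9.1"),
      Dep("com.example", "http-lib", "4.5.2")
    )
    for (((org, name), versions) <- conflicts(resolved))
      println(s"$org:$name versions=${versions.mkString(", ")} keep=${highest(versions)}")
  }
}
```

Once a conflicting module is identified, the lower version can be excluded from the dependency that drags it in; in sbt that is exclude(org, name) on that dependency's ModuleID.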

HTH,     Marc

On Wednesday, 31 January 2018 00:12:48 UTC+1, prasad.d...@gmail.com wrote:

prasad.d...@gmail.com

Jan 31, 2018, 3:37:48 PM
to Gremlin-users
Hi Marc,

Thanks a lot for coming to my rescue :-)

I added the following to my config file and it worked:

spark.serializer=org.apache.spark.serializer.KryoSerializer

spark.kryo.registrator=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoRegistrator
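A pre-flight check for exactly these two settings is easy to write. The sketch below (KryoSettingsCheck is a hypothetical helper, not a Spark or TinkerPop API) parses key=value text with java.util.Properties and reports anything missing or set to a different value. Note that Devang's configuration at the top of the thread already set both properties, so this check alone would not have explained his error.

```scala
import java.io.StringReader
import java.util.Properties
import scala.collection.JavaConverters._

object KryoSettingsCheck {
  // The two settings reported in this thread as the fix.
  val Required: Map[String, String] = Map(
    "spark.serializer" -> "org.apache.spark.serializer.KryoSerializer",
    "spark.kryo.registrator" ->
      "org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoRegistrator"
  )

  // Parses key=value text in the standard java.util.Properties format ('#' comments allowed).
  def parse(text: String): Map[String, String] = {
    val props = new Properties()
    props.load(new StringReader(text))
    props.asScala.toMap
  }

  // One message per required key that is missing or set to a different value.
  def problems(conf: Map[String, String]): List[String] =
    Required.toList.sortBy(_._1).flatMap { case (key, expected) =>
      conf.get(key).map(_.trim) match {
        case None                     => List(s"$key is not set")
        case Some(v) if v != expected => List(s"$key is '$v', expected '$expected'")
        case Some(_)                  => Nil
      }
    }

  def main(args: Array[String]): Unit = {
    // Same two lines as the config file above.
    val text =
      """spark.serializer=org.apache.spark.serializer.KryoSerializer
        |spark.kryo.registrator=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoRegistrator
        |""".stripMargin
    val issues = problems(parse(text))
    if (issues.isEmpty) println("both Kryo settings are present")
    else issues.foreach(println)
  }
}
```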


I followed your suggestion to look at https://docs.microsoft.com/en-us/azure/cosmos-db/spark-connector-graph

Thank you! :-)

Regards,
Prasad Dokuparthi.