Guava version issue when using a Spark job to write to JanusGraph with HBase configured as the backend


Yifeng Liu

Mar 11, 2019, 9:55:49 AM
to JanusGraph users
Hi folks,
    My team is building a knowledge graph pipeline with JanusGraph. Since the team already has a few years of experience with HBase, we'd love to use HBase as the JanusGraph backend.
    Here is the cluster setup:
- OS: Ubuntu 16.04.5 LTS/Xenial Xerus
- JanusGraph version: 0.3.1
- HBase cluster version: 2.1.1
- Spark version: 2.3.1
    When we submit the Spark job as an uber jar, the job fails with this error:
java.lang.NoSuchMethodError: com.google.common.base.Stopwatch.createStarted()Lcom/google/common/base/Stopwatch;
        at org.janusgraph.graphdb.database.idassigner.StandardIDPool$IDBlockGetter.<init>(StandardIDPool.java:269)
        at org.janusgraph.graphdb.database.idassigner.StandardIDPool.startIDBlockGetter(StandardIDPool.java:251)
        at org.janusgraph.graphdb.database.idassigner.StandardIDPool.nextBlock(StandardIDPool.java:178)
        at org.janusgraph.graphdb.database.idassigner.StandardIDPool.nextID(StandardIDPool.java:208)
        at org.janusgraph.graphdb.database.idassigner.VertexIDAssigner.assignID(VertexIDAssigner.java:333)
        at org.janusgraph.graphdb.database.idassigner.VertexIDAssigner.assignID(VertexIDAssigner.java:182)
        at org.janusgraph.graphdb.database.idassigner.VertexIDAssigner.assignID(VertexIDAssigner.java:153)
        at org.janusgraph.graphdb.database.StandardJanusGraph.assignID(StandardJanusGraph.java:460)
        at org.janusgraph.graphdb.transaction.StandardJanusGraphTx.addVertex(StandardJanusGraphTx.java:514)
        at org.janusgraph.graphdb.transaction.StandardJanusGraphTx.addVertex(StandardJanusGraphTx.java:532)
        at org.janusgraph.graphdb.transaction.StandardJanusGraphTx.addVertex(StandardJanusGraphTx.java:528)
        at com.snafu.BulkLoader.addVertex(BulkLoader.java:189)
        at com.snafu.BulkLoader.bulkLoad(BulkLoader.java:130)
        at com.snafu.SparkBulkLoader.lambda$main$1282d8df$1(SparkBulkLoader.java:33)
        at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:219)
        at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:219)
        at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:929)
        at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:929)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2074)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2074)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:109)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

     I've been googling around and found a few posts that appear to describe the same problem: both Spark and HBase depend on an older Guava version than JanusGraph does.
     A Stack Overflow post suggests a promising approach, but I don't really want to rebuild JanusGraph to shade Guava myself, with all the chain-reaction version tweaks that entails.
     Issue 488 seems close to the issue I am having right now, but I am not 100% sure.
     Any help would be much appreciated.
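
     For anyone debugging this kind of NoSuchMethodError: a quick way to find out which jar actually supplies a class at runtime is to ask the classloader for the class's resource URL. This is a generic diagnostic sketch, not JanusGraph-specific; the default class name is Guava's Stopwatch from the stack trace above:

```java
import java.net.URL;

// Asks the classloader where it resolves a class from; with duplicate
// libraries on the classpath this shows which jar's copy actually "wins".
public class ClasspathProbe {
    public static String locate(String className) {
        String resource = className.replace('.', '/') + ".class";
        URL url = ClasspathProbe.class.getClassLoader().getResource(resource);
        return url == null ? "not found" : url.toString();
    }

    public static void main(String[] args) {
        // Guava's Stopwatch is the class from the stack trace above.
        String className = args.length > 0 ? args[0] : "com.google.common.base.Stopwatch";
        System.out.println(className + " -> " + locate(className));
    }
}
```

Running this inside a Spark executor (or adding an equivalent log line to the job) tells you whether the Stopwatch being loaded comes from Spark's bundled Guava or from the one JanusGraph needs.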


Cheers,
~Yifeng

HadoopMarc

Mar 11, 2019, 2:19:54 PM
to JanusGraph users
Hi Yifeng,

The good news: the JanusGraph team has already done the shading for you, but they put it in the janusgraph-hbase jar.

If you want to benefit from it, stop using spark-submit with the uber jar and configure your Spark application as described in the links provided in another running thread on janusgraph-hbase with Spark:
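
Since the links did not survive the archive: the classpath-based setup this refers to amounts to putting the (already shaded) JanusGraph jars on the Spark driver and executor classpath instead of bundling them into an uber jar. A minimal sketch, assuming JanusGraph is unpacked at /opt/janusgraph-0.3.1 on every node (the path is hypothetical; adjust to your install):

```properties
# spark-defaults.conf sketch (illustrative paths)
spark.driver.extraClassPath=/opt/janusgraph-0.3.1/lib/*
spark.executor.extraClassPath=/opt/janusgraph-0.3.1/lib/*
```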


Let the error messages guide you and if they are hard to interpret, feel free to come back to this thread.

Cheers,    Marc


On Monday, March 11, 2019 at 2:55:49 PM UTC+1, Yifeng Liu wrote:

Yifeng Liu

Mar 21, 2019, 10:56:50 PM
to JanusGraph users
Hi Marc,
    Thanks for your reply.
    I went through the post and your blog thoroughly. Unfortunately, the infrastructure is maintained by another team, so some of the setup described in your blog is pretty tough to make happen, and even where it is possible, it takes forever via a ticketing system.
    For the bulk loading part, we are sticking with running embedded JanusGraph on top of a Spark job with HBase configured as the backend. We came up with an aggressive relocation of the com.google.* packages that resolves the Guava version issue.
    Such shading might backfire on us one day, but for now it unblocks us and we can continue the work.
    The following is the shade plugin configuration we are using:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>${maven.shade.plugin.version}</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <transformers>
          <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
            <mainClass>com.foo.bar.JanusGraphBulkLoadOnSpark</mainClass>
          </transformer>
        </transformers>
        <filters>
          <filter>
            <artifact>*:*</artifact>
            <excludes>
              <exclude>META-INF/maven/**</exclude>
              <exclude>META-INF/*.SF</exclude>
              <exclude>META-INF/*.DSA</exclude>
              <exclude>META-INF/*.RSA</exclude>
            </excludes>
          </filter>
        </filters>
        <relocations>
          <relocation>
            <pattern>com</pattern>
            <shadedPattern>repackaged.com</shadedPattern>
            <includes>
              <include>com.google.common.**</include>
            </includes>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
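
If you take this route, it is worth verifying that the relocation actually took effect in the built jar. A small sketch that lists the relocated entries (the prefix matches the shade config above; the class name is illustrative):

```java
import java.io.IOException;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

// Checks a built jar for classes under the relocated prefix, confirming that
// the maven-shade-plugin actually rewrote com.google.common references.
public class ShadeCheck {
    static final String RELOCATED_PREFIX = "repackaged/com/google/common/";

    public static List<String> relocatedEntries(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.stream()
                      .map(JarEntry::getName)
                      .filter(name -> name.startsWith(RELOCATED_PREFIX))
                      .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        List<String> entries = relocatedEntries(args[0]);
        entries.forEach(System.out::println);
        System.out.println(entries.size() + " relocated Guava classes found");
    }
}
```

An empty result for the uber jar would mean the relocation silently did nothing (for example, because the include pattern did not match), which is cheaper to catch here than on the Spark cluster.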

HadoopMarc

Mar 24, 2019, 10:34:54 AM
to JanusGraph users
Hi Yifeng,

Thanks for posting back your alternative approach. I will certainly keep it in mind for when I get stuck!

Cheers,    Marc

On Friday, March 22, 2019 at 3:56:50 AM UTC+1, Yifeng Liu wrote:

Ryan Stauffer

May 14, 2019, 2:47:36 PM
to JanusGraph users
Yifeng,
+1 on this alternative approach!

Here's an additional Google reference on managing Spark dependencies through shading:

Ryan

Yash Datta

Jun 14, 2020, 9:55:33 AM
to JanusGraph users

Hello Yifeng!

Thanks for this, I encountered the same problem and your post saved my day.

If you do not mind, could you take a look at my approach to loading data into JanusGraph using Spark and let me know if you see any issues with it?

https://github.com/astrolabsoftware/grafink/blob/master/docs/LoadAlgorithm.md

PS: This is an open-source project that I am working on as part of GSoC 2020.

Thanks and Best Regards
Yash