Spark + Druid Tranquility - library version conflict


Ashish Awasthi

Dec 23, 2015, 3:59:00 AM
to Druid User
I've posted an issue I'm facing using Spark with Tranquility here: http://stackoverflow.com/questions/34431329/spark-druid-tranquility-library-version-conflict

I'm writing here just to bring it to the attention of Druid users; I don't want to duplicate the post itself.

I wonder whether tranquility-spark works with a standard Spark build.

Any pointers to resolve the conflict?


Gian Merlino

Dec 28, 2015, 5:32:50 PM
to druid...@googlegroups.com
Hey Ashish,

I'm guessing this is due to mixing different versions of jackson-databind with jackson-datatype-joda (possibly 2.6.1 of joda with an older databind). Would it work for you to include the same version of both jackson jars on Spark's classpath? I *think* you shouldn't have to recompile Spark; just get it to load the newer jacksons.
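
For example, something along these lines might work, since extraClassPath entries are prepended to the classpath (the paths and versions here are just illustrative, and the jars need to exist at those paths on every node):

  spark-submit \
    --conf spark.driver.extraClassPath=/opt/libs/jackson-databind-2.6.1.jar:/opt/libs/jackson-datatype-joda-2.6.1.jar \
    --conf spark.executor.extraClassPath=/opt/libs/jackson-databind-2.6.1.jar:/opt/libs/jackson-datatype-joda-2.6.1.jar \
    ...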

If not then we can possibly bundle tranquility-spark specifically with an older version of jackson.

Either way, if you could update this thread or https://github.com/druid-io/tranquility/issues/76 to say whether you end up finding a workaround, that would be super helpful.

Thanks!

Gian


Ashish Awasthi

Dec 28, 2015, 11:42:50 PM
to druid...@googlegroups.com
Thanks Gian for your response. 

I have both versions of the jackson jars on the classpath, and Spark is picking up the old one (bundled with Spark) rather than the newer one (bundled with my application jar).
I received an answer on how to force the newer one at http://stackoverflow.com/questions/34431329/spark-druid-tranquility-library-version-conflict
I'm going to try that when I'm back from vacation next week, and I'll update the post once I have.
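
If I understood it correctly, it comes down to Spark's experimental user-classpath-first settings, something like:

  spark-submit \
    --conf spark.driver.userClassPathFirst=true \
    --conf spark.executor.userClassPathFirst=true \
    ...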

Thanks again
Ashish


charles.allen

Jan 4, 2016, 3:10:12 PM
to Druid User
Upgrading jackson tends to break things on the druid-hadoop side, which is why we haven't done it in the project. I'd like to know how Spark manages to ship an updated jackson version without breaking Hadoop.

I had very similar issues with https://github.com/metamx/druid-spark-batch



Charles Chao

Jan 6, 2016, 1:07:56 PM
to Druid User
Hi, All,

I had the same problem when submitting a Spark job with tranquility to CDH 5.4, but in my case the older version of databind.jar comes from CDH itself: /opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/lib/hadoop-mapreduce/jackson-databind-2.2.3.jar

The options listed in the stackoverflow link Ashish posted didn't help in my case. I plan to build a metamx scala-utils.jar against the older version of databind.jar (obviously I'll need to modify some code, hopefully not too much)...does anyone have any other suggestions?

Thanks,

Charles Chao


Gian Merlino

Jan 7, 2016, 1:49:18 AM
to druid...@googlegroups.com
Hey Charles,

Jackson is *usually* pretty good about not breaking backwards compatibility, so if you can figure out how to get the newer versions loaded on CDH, that would probably work best. Otherwise, yeah, try building tranquility with an older Jackson and see if it works. If you do find a solution, I would really appreciate you taking the time to post it here.

Gian

Gian Merlino

Jan 7, 2016, 1:49:40 AM
to druid...@googlegroups.com
IIRC CDH has an option that does something like user-classpath-goes-first; maybe that'd help?
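
If I'm remembering the right one, it's the Hadoop-side job setting, roughly:

  mapreduce.job.user.classpath.first=true

(Spark on YARN also had a spark.yarn.user.classpath.first flag along the same lines, though I believe it's deprecated in favor of the userClassPathFirst settings.)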

Gian

Charles Chao

Jan 7, 2016, 2:16:50 PM
to Druid User
Hi, Gian, 

Thanks for your suggestions. I actually tried Spark's experimental "user-classpath-first" option before, but it didn't help. As for CDH, since it's our prod environment with many jobs running, I'd prefer not to change its configuration yet.

I tried building the metamx scala-util against an older version of the Jackson datatype jar, and now I'm getting a new error. I've pasted the error message below...does anyone have any insight into this?

Thanks,

Charles Chao

Caused by: com.google.inject.CreationException: Guice creation errors:

1) An exception was caught and reported. Message: Unable to create a Configuration, because no Bean Validation provider could be found. Add a provider like Hibernate Validator (RI) to your classpath.
  at com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)

2) No implementation for javax.validation.Validator was bound.
  at io.druid.guice.ConfigModule.configure(ConfigModule.java:37)

2 errors
	at com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:435)
	at com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:154)
	at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:106)
	at com.google.inject.Guice.createInjector(Guice.java:95)
	at com.google.inject.Guice.createInjector(Guice.java:72)
	at io.druid.guice.GuiceInjectors.makeStartupInjector(GuiceInjectors.java:57)
	at com.metamx.tranquility.druid.DruidGuicer$.<init>(DruidGuicer.scala:39)
	at com.metamx.tranquility.druid.DruidGuicer$.<clinit>(DruidGuicer.scala)
	... 17 more
Caused by: javax.validation.ValidationException: Unable to create a Configuration, because no Bean Validation provider could be found. Add a provider like Hibernate Validator (RI) to your classpath.
	at javax.validation.Validation$GenericBootstrapImpl.configure(Validation.java:271)
	at javax.validation.Validation.buildDefaultValidatorFactory(Validation.java:110)
	at io.druid.guice.ConfigModule.configure(ConfigModule.java:37)
	at com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
	at com.google.inject.spi.Elements.getElements(Elements.java:101)
	at com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
	at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:103)

Charles Chao

Jan 7, 2016, 3:01:44 PM
to Druid User
This error happens on the driver, before it can even start any tasks. Switching back to datatype 2.6.1, the problem goes away and tasks start, but they then fail with the original "NoSuchFieldError" on each executor. So apparently something else depends on version 2.6.1; I can't simply use a lower version.

At this point I've run out of options. Maybe the only thing left is to upgrade the jar file in CDH, but that doesn't seem realistic to me.

I've searched around, and it appears that Tranquility is the only option for writing to Druid from Spark/Spark Streaming. It would be really disappointing to have to change the streaming pipeline just because of this issue. Any suggestion is welcome.

Thanks, 

Charles

Gian Merlino

Jan 7, 2016, 6:59:09 PM
to druid...@googlegroups.com
You might be able to work around the conflict by deploying your Spark job as a self-contained jar that uses relocated classes. That should allow both versions of jackson to exist: one used by Spark internals and one used by your code. You can do that with the maven shade plugin: https://maven.apache.org/plugins/maven-shade-plugin/examples/class-relocation.html
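
The maven example is at that link; if you're building with sbt instead, I believe sbt-assembly 0.14+ has an equivalent shading feature. A minimal sketch, with an arbitrary shaded prefix:

  // in build.sbt — rewrites jackson references inside the assembled jar
  // so they can't collide with the jackson that ships with Spark
  assemblyShadeRules in assembly := Seq(
    ShadeRule.rename("com.fasterxml.jackson.**" -> "shadedjackson.@1").inAll
  )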

Another option is rebuilding tranquility to use a different version of jackson. I'm not sure which one would work, but there might be one out there that will. If anyone ever figures that out, it would be super helpful to hear what works, since we could make that change in the official tranquility-spark.

Gian


Charles Chao

Jan 7, 2016, 7:33:16 PM
to Druid User
I tried rebuilding with the older version of the dependency jar yesterday, and also set user-classpath-first (only tested in local mode so far); both times I got the error message below...any insights? I'm not sure whether this error means I've actually moved a step forward.

I did try adding hibernate-validator to my dependencies, but that didn't help.
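
One guess about why adding the dependency didn't help: a fat-jar build with a catch-all META-INF discard rule will throw away the META-INF/services registration that javax.validation uses to discover providers, so hibernate-validator can be on the classpath and still be invisible. In sbt-assembly, keeping those entries looks roughly like this (illustrative; adapt to your own merge strategy):

  assemblyMergeStrategy in assembly := {
    case PathList("META-INF", "services", xs @ _*) =>
      // concatenate service-loader registrations instead of discarding them
      MergeStrategy.concat
    case x =>
      val oldStrategy = (assemblyMergeStrategy in assembly).value
      oldStrategy(x)
  }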

I have these druid and tranquility related dependencies in my project:

"io.druid" % "druid" % "0.7.3",
"io.druid" % "druid-processing" % "0.7.3",
"io.druid" % "tranquility-core_2.10" % "0.6.4",
"io.druid" % "tranquility-spark_2.10" % "0.6.4",
"org.hibernate" % "hibernate-validator" % "4.2.0.Final",
"org.hibernate" % "hibernate-validator-annotation-processor" % "4.1.0.Final"

Thanks, 

Charles Chao

============ error message ===================

16/01/07 16:04:54 INFO Guice: An exception was caught and reported. Message: javax.validation.ValidationException: Unable to create a Configuration, because no Bean Validation provider could be found. Add a provider like Hibernate Validator (RI) to your classpath.
javax.validation.ValidationException: Unable to create a Configuration, because no Bean Validation provider could be found. Add a provider like Hibernate Validator (RI) to your classpath.
at javax.validation.Validation$GenericBootstrapImpl.configure(Validation.java:271)
at javax.validation.Validation.buildDefaultValidatorFactory(Validation.java:110)
at io.druid.guice.ConfigModule.configure(ConfigModule.java:37)
at com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:230)
at com.google.inject.spi.Elements.getElements(Elements.java:103)
at com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:136)
at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:104)
at com.google.inject.Guice.createInjector(Guice.java:96)
at com.google.inject.Guice.createInjector(Guice.java:73)
at io.druid.guice.GuiceInjectors.makeStartupInjector(GuiceInjectors.java:57)
at com.metamx.tranquility.druid.DruidGuicer$.<init>(DruidGuicer.scala:39)
at com.metamx.tranquility.druid.DruidGuicer$.<clinit>(DruidGuicer.scala)
at com.metamx.tranquility.druid.DruidBeams$BuilderConfig$$anon$6.<init>(DruidBeams.scala:261)
at com.metamx.tranquility.druid.DruidBeams$BuilderConfig.buildAll(DruidBeams.scala:259)
at com.metamx.tranquility.druid.DruidBeams$Builder.buildBeam(DruidBeams.scala:182)
at com.hulu.metrics.streaming.MapBeamFactory.makeBeam$lzycompute(MapBeamFactory.scala:47)
at com.hulu.metrics.streaming.MapBeamFactory.makeBeam(MapBeamFactory.scala:22)
at com.metamx.tranquility.spark.BeamRDD$$anonfun$propagate$1.apply(BeamRDD.scala:37)
at com.metamx.tranquility.spark.BeamRDD$$anonfun$propagate$1.apply(BeamRDD.scala:36)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:903)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:903)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1935)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1935)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:67)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
16/01/07 16:04:54 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.NoClassDefFoundError: Could not initialize class com.metamx.tranquility.druid.DruidGuicer$
at com.metamx.tranquility.druid.DruidBeams$BuilderConfig$$anon$6.<init>(DruidBeams.scala:261)
at com.metamx.tranquility.druid.DruidBeams$BuilderConfig.buildAll(DruidBeams.scala:259)
at com.metamx.tranquility.druid.DruidBeams$Builder.buildBeam(DruidBeams.scala:182)
at com.hulu.metrics.streaming.MapBeamFactory.makeBeam$lzycompute(MapBeamFactory.scala:47)
at com.hulu.metrics.streaming.MapBeamFactory.makeBeam(MapBeamFactory.scala:22)
at com.metamx.tranquility.spark.BeamRDD$$anonfun$propagate$1.apply(BeamRDD.scala:37)
at com.metamx.tranquility.spark.BeamRDD$$anonfun$propagate$1.apply(BeamRDD.scala:36)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:903)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:903)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1935)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1935)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:67)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
16/01/07 16:04:54 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1)
java.lang.ExceptionInInitializerError
at com.metamx.tranquility.druid.DruidBeams$BuilderConfig$$anon$6.<init>(DruidBeams.scala:261)
at com.metamx.tranquility.druid.DruidBeams$BuilderConfig.buildAll(DruidBeams.scala:259)
at com.metamx.tranquility.druid.DruidBeams$Builder.buildBeam(DruidBeams.scala:182)
at com.hulu.metrics.streaming.MapBeamFactory.makeBeam$lzycompute(MapBeamFactory.scala:47)
at com.hulu.metrics.streaming.MapBeamFactory.makeBeam(MapBeamFactory.scala:22)
at com.metamx.tranquility.spark.BeamRDD$$anonfun$propagate$1.apply(BeamRDD.scala:37)
at com.metamx.tranquility.spark.BeamRDD$$anonfun$propagate$1.apply(BeamRDD.scala:36)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:903)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:903)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1935)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1935)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:67)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.google.inject.CreationException: Guice creation errors:

1) An exception was caught and reported. Message: Unable to create a Configuration, because no Bean Validation provider could be found. Add a provider like Hibernate Validator (RI) to your classpath.
  at com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:136)

2) No implementation for javax.validation.Validator was bound.
  at io.druid.guice.ConfigModule.configure(ConfigModule.java:37)

2 errors
at com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:448)
at com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:155)
at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:107)
at com.google.inject.Guice.createInjector(Guice.java:96)
at com.google.inject.Guice.createInjector(Guice.java:73)
at io.druid.guice.GuiceInjectors.makeStartupInjector(GuiceInjectors.java:57)
at com.metamx.tranquility.druid.DruidGuicer$.<init>(DruidGuicer.scala:39)
at com.metamx.tranquility.druid.DruidGuicer$.<clinit>(DruidGuicer.scala)
... 17 more
Caused by: javax.validation.ValidationException: Unable to create a Configuration, because no Bean Validation provider could be found. Add a provider like Hibernate Validator (RI) to your classpath.
at javax.validation.Validation$GenericBootstrapImpl.configure(Validation.java:271)
at javax.validation.Validation.buildDefaultValidatorFactory(Validation.java:110)
at io.druid.guice.ConfigModule.configure(ConfigModule.java:37)
at com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:230)
at com.google.inject.spi.Elements.getElements(Elements.java:103)
at com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:136)
at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:104)
... 22 more
16/01/07 16:04:54 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-0,5,main]
java.lang.NoClassDefFoundError: Could not initialize class com.metamx.tranquility.druid.DruidGuicer$
at com.metamx.tranquility.druid.DruidBeams$BuilderConfig$$anon$6.<init>(DruidBeams.scala:261)
at com.metamx.tranquility.druid.DruidBeams$BuilderConfig.buildAll(DruidBeams.scala:259)
at com.metamx.tranquility.druid.DruidBeams$Builder.buildBeam(DruidBeams.scala:182)
at com.hulu.metrics.streaming.MapBeamFactory.makeBeam$lzycompute(MapBeamFactory.scala:47)
at com.hulu.metrics.streaming.MapBeamFactory.makeBeam(MapBeamFactory.scala:22)
at com.metamx.tranquility.spark.BeamRDD$$anonfun$propagate$1.apply(BeamRDD.scala:37)
at com.metamx.tranquility.spark.BeamRDD$$anonfun$propagate$1.apply(BeamRDD.scala:36)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:903)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:903)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1935)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1935)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:67)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Gian Merlino

Jan 8, 2016, 2:28:42 PM
to druid...@googlegroups.com
Hey folks,

The next version of tranquility will have a downgraded jackson to match the version used by Druid: https://github.com/druid-io/tranquility/pull/81. Hopefully that fixes these problems.

If anyone could build from master and try that out in their environment, that would be incredibly helpful. The easiest way is probably to run "sbt +publishM2" to publish to your local maven repository, or "sbt +publish-local" to publish to your local ivy repository.
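
After publishing, point your build at the local artifact. In sbt that's roughly the following (the snapshot version is a guess — use whatever version the tranquility build actually reports):

  resolvers += Resolver.mavenLocal  // if you published with `sbt +publishM2`
  libraryDependencies += "io.druid" % "tranquility-spark_2.10" % "0.6.5-SNAPSHOT"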

Gian

Charles Chao

Jan 11, 2016, 10:46:30 AM
to Druid User
Thanks Gian. 

This pull request has only one change:

-val jacksonTwoVersion = "2.6.3"
+val jacksonTwoVersion = "2.4.6"

However, I think scala-util still has a dependency on jackson 2.6.

Charles

Gian Merlino

Jan 11, 2016, 5:59:21 PM
to druid...@googlegroups.com
I think it isn't using anything that's in 2.6.3 but not in 2.4.6, so it should be enough to override it here. At least that's the idea.

Gian

Eungsop Yoo

Dec 21, 2016, 3:40:56 AM
to Druid User
I had the same issue. The assembled jar that I made did not contain hibernate-validator, and I couldn't manage to add it to the jar.
Instead, I added a --jars option for hibernate-validator.jar to spark-submit, and that worked.
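
For reference, that looks something like this (the path is illustrative):

  spark-submit \
    --jars /path/to/hibernate-validator.jar \
    ...

Unlike the extraClassPath settings, --jars also ships the file to the executors, which is presumably why the provider became visible everywhere.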
