sbt assembly, cannot resolve duplication error


Thomas Brandl

May 20, 2016, 12:56:22 PM
to Time Series for Spark (the spark-ts package)
I am trying to build a fat jar with sbt assembly, where the spark-ts-with-dependencies.jar lives in the /lib folder. Here's my build.sbt:


name := "ConfidenceAnalysis"

version := "1.0"

scalaVersion := "2.10.5"

assemblyMergeStrategy in assembly := {
  case x if x.startsWith("META-INF") => MergeStrategy.discard
  case PathList("org", "scalanlp", xs @ _*) => MergeStrategy.first // not making a difference
  case PathList("org", "apache", xs @ _*) => MergeStrategy.last
  case PathList("org", "objectweb", xs @ _*) => MergeStrategy.last
  case PathList("com", "esotericsoftware", xs @ _*) => MergeStrategy.last
  case "plugin.properties" => MergeStrategy.last
  case "log4j.properties" => MergeStrategy.first
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}

excludedJars in assembly <<= (fullClasspath in assembly) map { cp =>
  cp filter { x => x.data.getName.matches("sbt.*") || x.data.getName.matches(".*macros.*") }
}

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.1" // % "provided"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.6.1" // % "provided"

libraryDependencies += "org.apache.spark" %% "spark-mllib" % "1.6.1" // % "provided"

libraryDependencies += "com.databricks" % "spark-csv_2.10" % "1.4.0"

libraryDependencies += "com.esotericsoftware" % "kryo" % "3.0.3"

libraryDependencies += "org.apache.commons" % "commons-math3" % "3.3"




Error is:
 

[error] (*:assembly) deduplicate: different file contents found in the following:
[error] /Users/thomasbrandl/Dev/spark-timeseries/target/sparkts-0.3.0-jar-with-dependencies.jar:breeze/features/FeatureVector.class
[error] /Users/thomasbrandl/.ivy2/cache/org.scalanlp/breeze_2.10/jars/breeze_2.10-0.11.2.jar:breeze/features/FeatureVector.class

I have tried removing the breeze dependency from spark-ts's pom.xml, which resolved this particular error, but is generally not a great idea, I guess. The list of clashes continues with other files, so this led nowhere, e.g.:


/Users/thomasbrandl/Dev/spark-timeseries/target/sparkts-0.3.0-jar-with-dependencies.jar:ch/epfl/lamp/compiler/msil/emit/AssemblyBuilder.class
/Users/thomasbrandl/.ivy2/cache/org.scala-lang/scala-compiler/jars/scala-compiler-2.10.0.jar:ch/epfl/lamp/compiler/msil/emit/AssemblyBuilder.class


But why would it clash with scala-compiler?


I have also tried including the no-dependencies spark-ts.jar instead, but this led to a terrifying list of errors on sbt assembly, so I refrained from going down that path. At 525k in size, it seems to be the better candidate, though.


Any suggestions how I should move on from here?


Thanks!

t



tho...@gmail.com

May 23, 2016, 2:45:08 PM
to Time Series for Spark (the spark-ts package)
I sorted this out. The Spark libraries need to be specified as "provided"; there is no need to create a fat jar that includes them. I was trying to run the assembly with $ java -cp [jar] instead of running it through spark-submit, and there is no reason not to use spark-submit.
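
For anyone landing here later, a minimal sketch of what that change might look like, assuming the same Spark 1.6.1 dependencies as in the build.sbt above. Marking Spark as "provided" keeps it and its transitive dependencies out of the assembly. Breeze presumably comes in through spark-mllib, which would explain why the duplicate with the spark-ts jar goes away. The main class name in the spark-submit comment is hypothetical, and the jar path assumes sbt-assembly's default naming.

```scala
// build.sbt (sketch): Spark marked as "provided" so sbt-assembly leaves it out
name := "ConfidenceAnalysis"

version := "1.0"

scalaVersion := "2.10.5"

// Supplied at runtime by spark-submit, so not bundled into the fat jar.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "1.6.1" % "provided",
  "org.apache.spark" %% "spark-sql"   % "1.6.1" % "provided",
  "org.apache.spark" %% "spark-mllib" % "1.6.1" % "provided"
)

// Not part of the Spark distribution, so still bundled.
libraryDependencies += "com.databricks" % "spark-csv_2.10" % "1.4.0"

// Run through spark-submit instead of java -cp.
// The class name below is a placeholder (assumption), not from the original post:
//   spark-submit --class com.example.ConfidenceAnalysis \
//     target/scala-2.10/ConfidenceAnalysis-assembly-1.0.jar
```

The remaining dependencies (kryo, commons-math3) and the merge strategy can stay as they were. Whether every merge rule is still needed once Spark is excluded would have to be checked by running sbt assembly again.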