SBT build tips for Scala Spark apps


Evan Chan

Apr 24, 2013, 8:31:03 PM
to spark...@googlegroups.com
Let's say you are building a Scala Spark app, and probably using SBT.  You want to do the following:
- Be able to run the Spark app in local mode, using "sbt run" for quick dev cycles
- Be able to run unit tests using "sbt test"
- Create fat jars for Spark using "sbt assembly".

It turns out it's very tricky to do all three, because "sbt run" requires Spark on the classpath, but building a fat jar with sbt-assembly that works with Spark really requires you to leave out the Scala library, and you probably want to leave out Spark as well (it pulls in a huge number of dependencies).

But, it is possible!   Thanks to some help from the SBT forums, I now have the following:

  lazy val sparkAssemblySettings = Assembly.settings ++ sbtRunSettings ++ Seq(
    libraryDependencies ++= sparkDeps,
    // Redefine just the run task to use Compile deps so spark will launch correctly
    run in Compile <<= Defaults.runTask(fullClasspath in Compile,
                                        mainClass in (Compile, run), runner in (Compile, run)),
    runMain in Compile <<= Defaults.runMainTask(fullClasspath in Compile,
                                                runner in (Compile, runMain)),
    assembleArtifact in packageScala := false   // scala-library causes problems for Spark
  )

where sparkDeps is like:

  lazy val sparkDeps = Seq(
    "org.spark-project" %% "spark-core" % "0.7.0" % "provided"
  )


The "provided" scope gives a clean way to build a fat jar without pulling in Spark, while the "run in Compile" and "runMain in Compile" settings redefine the run tasks so that they use the classpath that includes Spark. Voila!
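For reference, a minimal project/Build.scala that wires these settings into a project might look like this. This is a sketch, not the exact build from above: the object and project names are made up, and it assumes sbt 0.12 with the stock sbt-assembly plugin (whose settings live in sbtassembly.Plugin) rather than a custom Assembly object:

```scala
import sbt._
import Keys._
import sbtassembly.Plugin._   // sbt-assembly 0.8.x
import AssemblyKeys._

object MyBuild extends Build {
  // Spark marked "provided": on the compile classpath, out of the fat jar
  lazy val sparkDeps = Seq(
    "org.spark-project" %% "spark-core" % "0.7.0" % "provided"
  )

  lazy val sparkAssemblySettings = assemblySettings ++ Seq(
    libraryDependencies ++= sparkDeps,
    // Redefine just the run tasks to use the Compile classpath,
    // so "provided" deps are visible to "sbt run"
    run in Compile <<= Defaults.runTask(fullClasspath in Compile,
      mainClass in (Compile, run), runner in (Compile, run)),
    runMain in Compile <<= Defaults.runMainTask(fullClasspath in Compile,
      runner in (Compile, runMain)),
    assembleArtifact in packageScala := false  // keep scala-library out too
  )

  lazy val root = Project("my-spark-app", file("."))
    .settings(Defaults.defaultSettings ++ sparkAssemblySettings: _*)
}
```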

-Evan

Hai-Anh Trinh

Apr 25, 2013, 1:08:46 AM
to spark...@googlegroups.com
Evan,

Thanks for sharing. We've found that using the spark/run script, which builds the CLASSPATH with Spark's dependencies, means there's no need to build a fat jar.
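For anyone who hasn't seen it, that workflow looks roughly like this (a sketch; it has to be run from the root of a Spark 0.7 checkout, and SparkPi is one of the bundled examples):

```shell
# Build Spark once; afterwards the run script assembles the CLASSPATH
# (Spark, its dependencies, and your classes) for you.
sbt/sbt package

# Launch a class against a local master.
./run spark.examples.SparkPi local[2]
```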




--
You received this message because you are subscribed to the Google Groups "Spark Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to spark-users...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 



--
Hai-Anh Trinh | Senior Software Engineer | ADATAO Inc.

Evan Chan

Apr 25, 2013, 1:59:34 AM
to spark...@googlegroups.com
There are two disadvantages to the spark/run script:
1. Every developer has to download and build spark.
2. Much longer development cycle (package a jar, then use ./run script)

From SBT you can do ~run (triggered execution) for a nearly instant turnaround.
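For example (assuming sbt 0.12, where prefixing any task with ~ re-runs it whenever a source file changes):

```shell
# Inside an interactive sbt session, ~ turns a task into a watch loop:
$ sbt
> ~run     # recompile and relaunch the app on every save
> ~test    # or keep the unit tests running on every change
```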





--
Evan Chan
Senior Software Engineer | 
e...@ooyala.com | (650) 996-4600
www.ooyala.com | blog | @ooyala

Ian O'Connell

Apr 25, 2013, 6:50:39 PM
to spark...@googlegroups.com
Thanks for this. Dropping the code in didn't quite work in my setup, so I figured I'd just post a sample build.sbt for anyone else. This will let "sbt run" work and keep the "sbt assembly" output smaller.


import AssemblyKeys._ // put this at the top of the file

name := "Example Project"

version := "0.0.1"

scalaVersion := "2.9.2"

resolvers ++= Seq(
  "Akka Repository" at "http://repo.akka.io/releases/",
  "Spray Repository" at "http://repo.spray.cc/",
  "snapshots" at "http://oss.sonatype.org/content/repositories/snapshots",
  "releases" at "http://oss.sonatype.org/content/repositories/releases"
)

libraryDependencies ++= Seq(
  "org.scalatest" %% "scalatest" % "1.9.1" % "test",
  "org.spark-project" %% "spark-core" % "0.7.0" % "provided",
  "com.typesafe" % "config" % "0.3.1",
  "org.json4s" %% "json4s-native" % "3.2.2"
)

// Redefine the run tasks to use the Compile classpath, so the
// "provided" Spark dependency is visible to "sbt run"
runMain in Compile <<= Defaults.runMainTask(fullClasspath in Compile, runner in (Compile, run))

run in Compile <<= Defaults.runTask(fullClasspath in Compile, mainClass in (Compile, run), runner in (Compile, run))

assemblySettings

// scala-library causes problems for Spark, so leave it out of the fat jar
assembleArtifact in packageScala := false

// Resolve duplicate files when merging dependency jars
mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
  {
    case PathList("org", "w3c", xs @ _*) => MergeStrategy.first
    case "about.html"                    => MergeStrategy.discard
    case "log4j.properties"              => MergeStrategy.concat
    case x                               => old(x)
  }
}

// Skip running tests during assembly
test in assembly := {}



Where project/plugins.sbt contains:

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.8.8")

and this is using sbt 0.12.3.
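As an aside, the mergeStrategy block above is just a pattern match over paths inside the combined jars: each duplicate file is routed to the first case that matches. Here's a standalone plain-Scala sketch of that dispatch logic; the strategy names mirror sbt-assembly's, but this is an illustration, not the plugin's actual code:

```scala
// Sketch of how sbt-assembly's mergeStrategy match routes duplicate
// jar entries. Paths and strategy names are illustrative only.
object MergeDemo {
  sealed trait Strategy
  case object First extends Strategy       // keep the first copy seen
  case object Discard extends Strategy     // drop the file entirely
  case object Concat extends Strategy      // concatenate all copies
  case object Deduplicate extends Strategy // default: error unless identical

  // Mirrors the cases in the build file above.
  def strategyFor(path: String): Strategy = path.split('/').toList match {
    case "org" :: "w3c" :: _      => First
    case List("about.html")       => Discard
    case List("log4j.properties") => Concat
    case _                        => Deduplicate
  }

  def main(args: Array[String]): Unit = {
    println(strategyFor("org/w3c/dom/Document.class")) // First
    println(strategyFor("about.html"))                 // Discard
    println(strategyFor("log4j.properties"))           // Concat
    println(strategyFor("META-INF/MANIFEST.MF"))       // Deduplicate
  }
}
```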

Evan Chan

Apr 25, 2013, 7:41:12 PM
to spark...@googlegroups.com
Oh yes, sorry, the original was for .scala build files.

-Evan
Carry your candle, run to the darkness
Seek out the helpless, deceived and poor
Hold out your candle for all to see it
Take your candle, and go light your world
 


Evan Chan

Apr 25, 2013, 7:42:09 PM
to spark...@googlegroups.com
Btw, if you guys think this will help other developers, I'd be happy to put in a pull request for docs. Not sure how to do that.


-Evan


On Apr 25, 2013, at 3:50 PM, Ian O'Connell <ianoc...@gmail.com> wrote:
