Discussion: Consolidating Spark's build system


Matei Zaharia

Jul 15, 2013, 8:41:31 PM
to spark-de...@googlegroups.com, d...@spark.incubator.apache.org, Konstantin Boudnik, Jey Kottalam
Hi all,

I wanted to bring up a topic that there isn't a 100% perfect solution for, but that's been bothering the team at Berkeley for a while: consolidating Spark's build system. Right now we have two build systems, Maven and SBT, that need to be maintained together on each change. We added Maven a while back to try it as an alternative to SBT and to get some better publishing options, like Debian packages and classifiers, but we've found that 1) SBT has actually been fairly stable since then (unlike the rapid release cycle before) and 2) classifiers don't actually seem to work for publishing versions of Spark with different dependencies (you need to give them different artifact names). More importantly though, because maintaining two systems is confusing, it would be good to converge to just one soon, or to find a better way of maintaining the builds.

In terms of which system to go for, neither is perfect, but I think many of us are leaning toward SBT, because it's noticeably faster and it has less code to maintain. If we do this, however, I'd really like to understand the use cases for Maven, and make sure that either we can support them in SBT or we can do them externally. Can people say a bit about that? The ones I've thought of are the following:

- Debian packaging -- this is certainly nice, but there are some plugins for SBT too, so it may be possible to migrate.
- BigTop integration -- I'm not sure how much this relies on Maven, but Cos has been using it.
- Classifiers for hadoop1 and hadoop2 -- as far as I can tell, these don't really work if you want to publish to Maven Central; you still need two artifact names because the artifacts have different dependencies. However, more importantly, we'd like to make Spark work with all Hadoop versions by using hadoop-client and a bit of reflection, similar to how projects like Parquet handle this.

Are there other use cases I'm missing here, or other ways to handle this problem? For example, one possibility would be to keep the Maven build scripts in a separate repo managed by the people who want to use them, or to have some dedicated maintainers for them. But because this is often an issue, I do think it would be simpler for the project to have one build system in the long term. In either case, though, we will keep the project structure compatible with Maven, so people who want to use it internally should be fine; I think that we've done this well and, if anything, we've simplified the Maven build process lately by removing Twirl.

Anyway, as I said, I don't think any solution is perfect here, but I'm curious to hear your input.

Matei

Konstantin Boudnik

Jul 15, 2013, 9:23:17 PM
to d...@spark.incubator.apache.org, spark-de...@googlegroups.com, Jey Kottalam
Hi Matei.

The reason I am using Maven for Bigtop packaging and not SBT is that the
former's dependency management is clean and lets me build a proper assembly
with only the relevant dependencies: e.g. no Hadoop if I don't need it, etc.

I am not particularly attached to the packaging as it is done in the current
Maven build, because of its use of the Shade plugin: I believe flattening
project dependencies is a suboptimal way to go.

I am glad that you're calling for an end to the use of classifiers. Big +1 on
that! Using alternative names or versions to reflect dependency differences is
certainly a great idea!

I perhaps don't know much about SBT, but I think it is trying to solve
Maven's rigidity the way Gradle did. However, the latter introduces a
well-defined DSL and integrates with Maven/Ant more transparently than SBT
does.

That said, I would love to stick with the more mature build system, which is
also more widely accepted in the Java community. But if the people involved in
the project want to go with SBT as the build platform, that will work from
Bigtop's standpoint, as long as we're able to get a sensible set of libraries
for further packaging (a la https://github.com/mesos/spark/pull/675).

Hope it helps,
Cos

Matei Zaharia

Jul 16, 2013, 12:28:28 AM
to spark-de...@googlegroups.com, d...@spark.incubator.apache.org, Jey Kottalam
Cos, do you have any experience with Gradle by any chance? Is it something you'd recommend trying? I do agree that SBT's dependency management, being based on Ivy, is not ideal, but I'm not sure how common Gradle is and whether it will really work well with Scala.

Matei

Prashant Sharma

Jul 16, 2013, 2:58:56 AM
to spark-de...@googlegroups.com, d...@spark.incubator.apache.org, Jey Kottalam
Hello Cos,

I have a few questions inline!


On Tue, Jul 16, 2013 at 6:53 AM, Konstantin Boudnik <c...@apache.org> wrote:
Hi Matei.

The reason I am using Maven for Bigtop packaging and not SBT is that the
former's dependency management is clean and lets me build a proper assembly
with only the relevant dependencies: e.g. no Hadoop if I don't need it, etc.

Isn't this achievable using SBT? I think it should be possible to define task sets for that. Then we should be able to do something like what we do with Maven (mvn -Pwithout-Hadoop), e.g. sbt package-wo-hadoop. I am not an SBT ninja, but I have seen somewhere that it is possible to extend tasks. I guess https://github.com/harrah/xsbt/wiki/Getting-Started-Custom-Settings#extending-but-not-replacing-a-task !
 

Do you have any other concerns apart from the dependency management? IMHO two build systems are difficult to maintain with sophisticated build configurations.



--
Prashant

Shane Huang

Jul 16, 2013, 5:54:43 AM
to d...@spark.incubator.apache.org, spark-de...@googlegroups.com, Konstantin Boudnik, Jey Kottalam
Hi Matei.

I myself would prefer Maven over SBT.

On one hand, from my own experience Maven is better at resolving library dependencies. I have used both SBT and Maven to build Spark. When I use SBT I often have to add customized resolvers to the build script to make it pass, while I don't have to do that for Maven. (Maybe I missed something here... I'm not an SBT expert anyway :( )

On the other hand, it seems the open source community is much more familiar with Maven than with SBT. If we'd like more people to contribute to Spark, I think using Maven is a good idea.

Thanks,
Shane
--
Shane Huang 
Intel Asia-Pacific R&D Ltd.

Mridul Muralidharan

Jul 16, 2013, 7:04:20 AM
to spark-de...@googlegroups.com, d...@spark.incubator.apache.org, Jey Kottalam, Konstantin Boudnik


Without commenting on the relative benefits of either, practically (for me) from Spark's point of view:

1. Managing different profiles in SBT: different Hadoop profiles have incompatible interface definitions, so Spark's profile-specific code depends on the particular Hadoop version (MRv1 vs Hadoop 2 vs YARN + Hadoop 2). The current approach of changing code in the SBT build file to build different profiles just plain sucks.

2. The way SBT flattens jars for assembly is very order-sensitive (and Scala/SBT version-sensitive?). We just had a better experience with Maven, though that is only the better of two bad alternatives. I would prefer a good solution for building a consolidated jar that manages dependencies well.

3. Maven is very resource-hungry compared to SBT and much slower, and frankly it is a pain in the ass - it's just that too many other folks have gone through the same thing and thankfully documented it online! But that does not make it any better.

Regards
Mridul

Henry Saputra

Jul 16, 2013, 4:26:09 PM
to spark-de...@googlegroups.com, d...@spark.incubator.apache.org, Konstantin Boudnik, Jey Kottalam
Hi Matei,

Thanks for bringing up this build system discussion.

Some CI tools like Hudson can support multiple Maven profiles via different jobs, so we could deliver different release artifacts for the different Maven profiles.
I believe it should be fine to have Spark-hadoop1 and Spark-hadoop2 release modules.
Just curious, how does SBT actually avoid/resolve this problem? To support different Hadoop versions we need to change SparkBuild.scala to make it work.


And as far as maintaining just one build system goes, I am +1 for it. I prefer to use Maven because it has better dependency management than SBT.

Thanks,

Henry




Cody Koeninger

Jul 16, 2013, 4:33:20 PM
to spark-de...@googlegroups.com, d...@spark.incubator.apache.org, Konstantin Boudnik, Jey Kottalam
If you're looking at consolidating build systems, I'd ask you to consider ease of cross-publishing for different Scala versions. My instinct is that SBT will be less troublesome in that regard (although, as I understand it, the changes to the REPL may present a problem).

We need to use 2.10 for a project, so I'd be happy to put in some work on the issue.
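
For reference, cross-publishing in SBT is mostly a matter of listing the target Scala versions; a minimal sketch in SBT 0.12-style syntax (the versions here are illustrative only):

    // Build and publish against several Scala versions; prefixing a task with
    // "+" (e.g. `sbt +publish`) runs it once per listed version.
    crossScalaVersions := Seq("2.9.3", "2.10.2")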

Matei Zaharia

Jul 16, 2013, 4:35:37 PM
to spark-de...@googlegroups.com, d...@spark.incubator.apache.org, Konstantin Boudnik, Jey Kottalam
Henry, our hope is to avoid having to create two different Hadoop profiles altogether by using the hadoop-client package and reflection. This is what projects like Parquet (https://github.com/Parquet) are doing. If this works out, you get one artifact that can link to any Hadoop version that includes hadoop-client (which I believe means 1.2 onward).
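
As a rough illustration of the reflection approach (this is not Spark code; the helper name is made up), the idea is to look up version-specific classes at runtime instead of linking against one Hadoop release at compile time:

    // Hadoop 1.x has a concrete org.apache.hadoop.mapreduce.TaskAttemptContext
    // class, while Hadoop 2.x turns it into an interface with a separate
    // ...Impl class, so we pick the right constructor at runtime.
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.mapreduce.{TaskAttemptContext, TaskAttemptID}

    def newTaskAttemptContext(conf: Configuration, id: TaskAttemptID): TaskAttemptContext = {
      val clazz =
        try {
          Class.forName("org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl") // Hadoop 2.x
        } catch {
          case _: ClassNotFoundException =>
            Class.forName("org.apache.hadoop.mapreduce.TaskAttemptContext")        // Hadoop 1.x
        }
      clazz.getConstructor(classOf[Configuration], classOf[TaskAttemptID])
        .newInstance(conf, id)
        .asInstanceOf[TaskAttemptContext]
    }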

Matei

Matei Zaharia

Jul 16, 2013, 4:37:41 PM
to spark-de...@googlegroups.com, d...@spark.incubator.apache.org, Konstantin Boudnik, Jey Kottalam
Unfortunately, we'll probably have to have different branches of Spark for different Scala versions, because there are also other libraries we depend on (e.g. Akka) that have separate versions for Scala 2.10. You can actually find a Scala 2.10 port of Spark in the scala-2.10 branch on GitHub.

Matei

Evan Chan

Jul 16, 2013, 7:10:01 PM
to spark-de...@googlegroups.com, d...@spark.incubator.apache.org, Jey Kottalam


On Monday, July 15, 2013 11:58:56 PM UTC-7, Prashant Sharma wrote:
Isn't this achievable using SBT? I think it should be possible to define task sets for that. Then we should be able to do something like what we do with Maven (mvn -Pwithout-Hadoop), e.g. sbt package-wo-hadoop. I am not an SBT ninja, but I have seen somewhere that it is possible to extend tasks. I guess https://github.com/harrah/xsbt/wiki/Getting-Started-Custom-Settings#extending-but-not-replacing-a-task !

Yes, there are multiple ways of excluding deps such as Hadoop from sbt-assembly. It's not hard.

If you don't want to change the build script to do it, we can easily make the SBT build script depend on, say, an environment variable, and only exclude deps if such a variable is present.
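
A rough sketch of the environment-variable idea in SBT 0.12-style syntax (illustrative only: the SPARK_WITHOUT_HADOOP variable name and the Hadoop version are made up, and it relies on the common pattern of sbt-assembly leaving "provided" dependencies out of the fat jar):

    // Mark hadoop-client as "provided" when the variable is set,
    // so `sbt assembly` omits it from the assembled jar.
    libraryDependencies ++= {
      val hadoop = "org.apache.hadoop" % "hadoop-client" % "1.0.4"
      if (sys.env.contains("SPARK_WITHOUT_HADOOP")) Seq(hadoop % "provided")
      else Seq(hadoop)
    }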

Evan Chan

Jul 16, 2013, 7:19:55 PM
to spark-de...@googlegroups.com, d...@spark.incubator.apache.org, Konstantin Boudnik, Jey Kottalam
If the APIs for those libraries such as Akka stay the same, you don't need different branches. In SBT you can easily support two different sets of deps depending on which Scala version you are building (see the sketch after the list below); not sure if you could do that with Maven.

Here are the main differences that would require a different branch:
- if you are taking advantage of Akka Cluster, or features exclusive to newer releases
- The Akka futures API changed packages between Scala 2.9.1/2 and Scala 2.9.3/2.10.  However, since the project upgraded to Scala 2.9.3, it should migrate all use of futures to the scala.concurrent.* namespace to avoid more code changes down the line.
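
A sketch of the version-dependent deps mentioned above, in SBT 0.12-style syntax (the Akka versions are placeholders):

    // Pick an Akka artifact based on the Scala version being built; the Akka
    // 2.0.x line is not cross-versioned, while the 2.1.x line for 2.10 is.
    libraryDependencies <++= scalaVersion { sv =>
      if (sv.startsWith("2.10")) Seq("com.typesafe.akka" %% "akka-actor" % "2.1.4")
      else Seq("com.typesafe.akka" % "akka-actor" % "2.0.5")
    }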

As far as SBT vs Maven vs Gradle etc. go:
- I personally think SBT's "console" REPL with all dependencies on the classpath is super valuable for development, ad hoc testing, and trying new ideas out. Not sure if Maven has that.
- SBT also has triggered recompilation/testing, which is again very valuable for some of us.
- Gradle has an easier-to-understand DSL, but it seems that once you build a complex enough system, it is no easier to maintain than SBT.
- Maven is much better integrated into most IDEs, but SBT is far superior for dev workflow if you don't use one of the major IDEs or are a command-line person.

-Evan

Ryan LeCompte

Jul 16, 2013, 7:37:26 PM
to spark-de...@googlegroups.com, d...@spark.incubator.apache.org, Konstantin Boudnik, Jey Kottalam
+1 for SBT. It's pretty much the de facto standard for mainstream Scala projects.


Matei Zaharia

Jul 18, 2013, 1:48:34 PM
to spark-de...@googlegroups.com, d...@spark.incubator.apache.org, Konstantin Boudnik, Jey Kottalam
Thanks for the feedback. It looks like there are more advantages to Maven than I was originally thinking of -- specifically, the better dependency resolution and assembly construction. (SBT assembly just takes all the JARs in lib_managed and packages them together unfortunately, which means you sometimes get multiple versions of the same artifact if you aren't very careful with exclusion rules). I think what we'll do is to wait until we see whether we can have a single Spark artifact that works with any Hadoop version, and go back to the build system issue then.

Matei

Evan Chan

Jul 18, 2013, 1:56:18 PM
to spark-de...@googlegroups.com, d...@spark.incubator.apache.org, Konstantin Boudnik, Jey Kottalam
There is also an alternative called "sbt-onejar" we can look at.


--
Evan Chan
Staff Engineer
e...@ooyala.com


Koert Kuipers

Jul 19, 2013, 4:50:31 PM
to spark-de...@googlegroups.com
I always thought the issue with SBT and lib_managed had to do with multiple sub-projects doing their transitive dependency resolution independently and then just putting them all in a single lib_managed. But now I am observing the same issue on a single project (no sub-projects).

We use Ivy all the time and never have this issue. Ivy picks the best candidate for a conflicting dependency and leaves out the rest.

So I am not sure what is going on here with SBT, but it's not right.

Evan Chan

Jul 19, 2013, 5:33:58 PM
to spark-de...@googlegroups.com
Actually the problem is not with SBT, or Ivy, but with "sbt-assembly", the plugin used to create fat jars.

By default, sbt-assembly will complain and not build a jar if there are two versions of the same jar. However, the Spark build file has these lines, which instruct assembly to pick the "first" copy of a file if there are conflicts.

    mergeStrategy in assembly := {
      // drop per-jar manifests and signature files, which would otherwise clash
      case m if m.toLowerCase.endsWith("manifest.mf") => MergeStrategy.discard
      case m if m.toLowerCase.matches("meta-inf.*\\.sf$") => MergeStrategy.discard
      // Akka expects the reference.conf defaults from each jar to be concatenated
      case "reference.conf" => MergeStrategy.concat
      // for anything else that conflicts, keep whichever copy is seen first
      case _ => MergeStrategy.first
    }

If you don't care about assemblies, SBT by default just selects the highest version if there are conflicts, and that is the classpath it generates.

lib_managed has nothing to do with SBT or Ivy; it is manually managed, checked-in jars. Most SBT projects (at least with us) don't use lib_managed or lib/ at all; we mostly rely on the .ivy2 cache.

-Evan

Koert Kuipers

Jul 19, 2013, 5:43:43 PM
to spark-de...@googlegroups.com
Then I think I misunderstood SBT. I assumed that if I set retrieveManaged := true, SBT would put my transitive dependencies in lib_managed for me to inspect. It is in that case that I see multiple versions of jars in lib_managed, which I interpreted as meaning that SBT did not do a good job of picking the highest version.


Koert Kuipers

Jul 19, 2013, 5:51:00 PM
to spark-de...@googlegroups.com
That still leaves me with a question: how can I get my hands on the transitive dependencies that SBT (or Ivy) picked for me, i.e. without the version dupes?
In Ant/Ivy this is simple: you can go to lib/default and there they are... no dupes.

retrieveManaged := true doesn't seem to do that.

Evan Chan

Jul 19, 2013, 5:53:28 PM
to spark-de...@googlegroups.com
Hm, good question. I know in SBT you can do "show dependency-classpath" (or in a build file, maybe something like the line below, though as written that would just print the task key rather than its resolved value):

println("Current classpath: " + dependencyClasspath in Compile)


Koert Kuipers

Jul 19, 2013, 6:58:43 PM
to spark-de...@googlegroups.com
That's interesting. I did "show dependency-classpath" in SBT, and it showed a single slf4j-api jar in lib_managed; however, if you look inside lib_managed there are 2 versions of it.
So why is lib_managed populated with these dupes, yet SBT doesn't use them? I don't see the point of it.

Koert Kuipers

Jul 19, 2013, 7:02:18 PM
to spark-de...@googlegroups.com
And if I comment out the "retrieveManaged := true" then it indeed shows me a single slf4j-api in my Ivy cache.

So there is nothing wrong with Ivy's dependency management or SBT's usage of it.

But the way lib_managed is populated with "retrieveManaged := true" makes no sense.

Evan Chan

Jul 20, 2013, 2:52:20 AM
to spark-de...@googlegroups.com
Maybe what we need is to create an SBT task that just copies the jars from the dependency classpath to the lib_managed (or other) folder.

It wouldn't be that hard. Actually, you could probably just run a shell script that invokes "sbt 'show dependency-classpath'", munges the output, and then loops over the classpath entries and copies them.
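
Alternatively, a rough sketch of such a task inside the build itself, in SBT 0.12-style syntax (the "copy-deps" key name and the target/deps directory are invented for illustration); it would still need to be added to the project's settings:

    import sbt._
    import Keys._

    // Copies the resolved (already de-duplicated) compile classpath jars into
    // target/deps, as an alternative to relying on retrieveManaged.
    val copyDeps = TaskKey[Unit]("copy-deps", "Copy resolved dependency jars into target/deps")

    val copyDepsSetting = copyDeps <<= (dependencyClasspath in Compile, target, streams) map {
      (cp, out, s) =>
        val dir = out / "deps"
        IO.createDirectory(dir)
        cp.map(_.data).filter(_.getName.endsWith(".jar")).foreach { jar =>
          s.log.info("Copying " + jar.getName)
          IO.copyFile(jar, dir / jar.getName)
        }
    }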

-Evan

Koert Kuipers

Jul 20, 2013, 10:15:51 AM
to spark-de...@googlegroups.com
That seems sufficient for a single project. I am still unsure as to what happens with sub-projects: if SBT resolves the sub-projects independently and never does dependency conflict resolution at the level of the top project, then we might still end up with dupes. I plan to do some tests for that next.

Evan Chan

Jul 20, 2013, 11:02:43 AM
to spark-de...@googlegroups.com, spark-de...@googlegroups.com
If the top project aggregates or depends on the other sub-projects, then the classpath for the top project should include all sub-dependencies as well.

-Evan
To be free is not merely to cast off one's chains, but to live in a way that respects & enhances the freedom of others. (#NelsonMandela)

Koert Kuipers

Jul 20, 2013, 11:55:00 AM
to spark-de...@googlegroups.com
I don't think it works. The classpath for the top project simply seems to be the sum of the classpaths for the individual projects.
This means that if Ivy picked slf4j-api-1.6.1.jar for one sub-project and slf4j-api-1.6.6.jar for another sub-project, which is entirely feasible since Ivy runs independently for each sub-project (or so it seems), then the top project has both in its classpath.

Koert Kuipers

Jul 20, 2013, 12:10:55 PM
to spark-de...@googlegroups.com
So to summarize, I think there are 2 issues:
1. When a project has transitive dependencies that include multiple versions of a jar, SBT uses Ivy to pick one (by default the latest) and only puts that one on the classpath. This behavior is correct and desired. However, when using retrieveManaged := true, it somehow drops all (conflicting) versions of the jar in lib_managed.
2. When a project has sub-projects, Ivy is used for dependency resolution on a per-sub-project basis, but not for the top project. The classpath of the top project is simply all the classpaths of the sub-projects combined. This means a top project can have multiple versions of a jar on its classpath and in lib_managed.

dmha...@gmail.com

Aug 6, 2013, 11:01:30 AM
to spark-de...@googlegroups.com


On Saturday, July 20, 2013 12:10:55 PM UTC-4, Koert Kuipers wrote:
So to summarize, I think there are 2 issues:
1. When a project has transitive dependencies that include multiple versions of a jar, SBT uses Ivy to pick one (by default the latest) and only puts that one on the classpath. This behavior is correct and desired. However, when using retrieveManaged := true, it somehow drops all (conflicting) versions of the jar in lib_managed.

I think there is a misunderstanding of the purpose of retrieveManaged. It is just an artifact cache local to the build, to insulate the build from other projects on a machine. You can clean the normal cache, and having this local cache means you can continue to develop without needing an `update`. It isn't a directory that you can just throw on a classpath, for example.
 
2. When a project has sub-projects, Ivy is used for dependency resolution on a per-sub-project basis, but not for the top project. The classpath of the top project is simply all the classpaths of the sub-projects combined. This means a top project can have multiple versions of a jar on its classpath and in lib_managed.

This isn't the case, at least not by default.  sbt shouldn't just concatenate managed classpaths.  Perhaps there is some customization that does it or a plugin incorrectly does it.
 
-Mark

Matei Zaharia

Aug 8, 2013, 4:27:57 PM
to spark-de...@googlegroups.com
Thanks for the input, Mark. I guess one of the big issues we have is how you'd recommend packaging an application to run it locally or to ship to other machines with SBT. Initially, we used lib_managed as a way to get all the library JARs in one place, and we created a classpath from those plus the application JARs (built with sbt package). However, that has this problem with multiple versions. We have also tried using sbt assembly, but as far as I've seen (and I could be wrong), sbt assembly can still put in multiple versions of the same artifact. Is this the recommended way? We can go back to trying it. I also found it worrying that assembly is not a core function of SBT, but rather a plugin, and that it required jumping through hoops for Akka config files.

Matei

Mark Harrah

Aug 8, 2013, 4:36:27 PM
to spark-de...@googlegroups.com
It would be a bug in something - assembly, sbt, the build configuration, ... - if there are multiple managed jars on the classpath. (If jars are dumped in lib/, there is not much sbt can do about that.)

If comparing to Maven, assembly is a plugin in Maven as well. If not, what is the concern about it not being core?

Can you point to what you have to do in sbt vs. Maven for Akka config files? I'm not familiar with the problem.

-Mark

Matei Zaharia

Aug 9, 2013, 11:52:22 PM
to spark-de...@googlegroups.com
Here's what we've had to do for Akka: https://github.com/mesos/spark/blob/master/project/SparkBuild.scala#L272. Basically, Akka has this config file called "reference.conf" that it expects to hold the "default" values of each property, but different Akka JARs have different parts of it, and you're supposed to append them. This is probably more of an issue with Akka though since we've also had to do this in Maven.

Anyway, I'll look into the multiple versions of an artifact in assembly. I might be wrong about that because some of the problems we had in the past were with particular artifacts that changed artifact ID without changing package name (!), like Netty, so the build system rightfully couldn't figure them out.

Matei

Evan Chan

Aug 10, 2013, 2:12:52 AM
to spark-de...@googlegroups.com, spark-de...@googlegroups.com
Matei,

According to the Akka / Typesafe Config developers, one is supposed to append all of the reference.conf files together. So that's just how it works.

-Evan
To be free is not merely to cast off one's chains, but to live in a way that respects & enhances the freedom of others. (#NelsonMandela)
