Hi Matei.
The reason I am using Maven for Bigtop packaging and not SBT is that the
former's dependency management is clean and lets me build a proper assembly
with only the relevant dependencies: e.g. no Hadoop if I don't need it, etc.
I am not attached to the packaging as it is done in the current Maven build,
because of the use of the Shade plugin: I believe flattening project
dependencies is a suboptimal way to go.
I am glad that you're calling for an end to the use of classifiers. Big +1 on that!
Using alternative artifact names or versions to reflect dependency differences is
certainly a great idea!
I perhaps don't know much about SBT, but I think it is trying to solve Maven's
rigidity the way Gradle did. However, the latter introduces a
well-defined DSL and integrates with Maven/Ant more transparently than SBT
does.
That said, I would love to stick with the more mature build system, which is also
more widely accepted in the Java community. But if the people involved in the project
want to go with SBT as the build platform, that will work from Bigtop's
standpoint, as long as we're able to get a sensible set of libraries for
further packaging (a la https://github.com/mesos/spark/pull/675).
Hope it helps,
 Cos
Without commenting on the relative benefits of either, practically (for me) from the Spark point of view:
1. Managing different profiles in SBT: different Hadoop profiles have incompatible interface definitions, so Spark's profile code depends on the specific Hadoop version (MRv1 vs Hadoop 2 vs YARN + Hadoop v2).
The current approach of editing the SBT build file to build different profiles just plain sucks.
2. The way SBT flattens jars for the assembly is very order sensitive (and Scala/SBT version sensitive?).
We just had a better experience with Maven, though it is merely the better of two bad alternatives.
I would prefer a good solution for building a consolidated jar that manages dependencies well.
3. Maven is very resource hungry compared to SBT, is much slower, and frankly is a pain in the ass; too many other folks have gone through the same and thankfully documented it online, but that does not make it any better.
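For what it's worth, one way to avoid editing the build file for each profile (point 1 above) would be to have the SBT build read the Hadoop version from the environment. This is only a sketch under assumed names: the variables SPARK_HADOOP_VERSION and SPARK_WITH_YARN and the version strings are illustrative, not the actual Spark build.

```scala
// Sketch (sbt full-configuration style): pick the Hadoop dependency from
// environment variables instead of hand-editing the build definition.
// All names and version strings below are illustrative.
import sbt._
import Keys._

object SparkBuild extends Build {
  // e.g. SPARK_HADOOP_VERSION=2.0.5-alpha SPARK_WITH_YARN=true sbt assembly
  val hadoopVersion = sys.env.getOrElse("SPARK_HADOOP_VERSION", "1.0.4")
  val useYarn = sys.env.get("SPARK_WITH_YARN").exists(_ == "true")

  lazy val core = Project("core", file("core")).settings(
    libraryDependencies ++= Seq(
      "org.apache.hadoop" % "hadoop-client" % hadoopVersion
    ) ++ (if (useYarn)
      Seq("org.apache.hadoop" % "hadoop-yarn-client" % hadoopVersion)
    else Nil)
  )
}
```

The profile-specific source incompatibilities would still need separate source directories or shims, but at least the dependency graph would switch without touching the build file.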
Regards
Mridul
--
You received this message because you are subscribed to the Google Groups "Spark Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to spark-develope...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
Matei
Hello Cos, I have a few questions inline!

On Tue, Jul 16, 2013 at 6:53 AM, Konstantin Boudnik <c...@apache.org> wrote:

Hi Matei.
The reason I am using Maven for Bigtop packaging and not SBT is that the
former's dependency management is clean and lets me build a proper assembly
with only the relevant dependencies: e.g. no Hadoop if I don't need it, etc.

Isn't this achievable using SBT? I think it should be possible to define task sets for that. Then we should be able to do something like what we do with Maven (mvn -Pwithout-Hadoop), e.g. sbt package-wo-hadoop etc. I am not an SBT ninja, but I have seen somewhere that it is possible to extend tasks. I guess https://github.com/harrah/xsbt/wiki/Getting-Started-Custom-Settings#extending-but-not-replacing-a-task !
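To sketch what that wiki page suggests: a custom task could start from the resolved classpath and filter Hadoop out before packaging. Everything here is hypothetical (the task name, the filename-based filter, and the sbt 0.12-era `<<=` syntax), not taken from the actual Spark build.

```scala
// Sketch of a custom sbt task along the lines of the linked wiki page.
// Hypothetical names; syntax is sbt 0.12-era full configuration.
import sbt._
import Keys._

object BuildExtras {
  val packageWithoutHadoop =
    TaskKey[Classpath]("package-without-hadoop",
      "Classpath for packaging with Hadoop jars filtered out")

  val settings = Seq(
    // Start from the fully resolved runtime classpath and drop Hadoop jars,
    // so an assembly built from this classpath omits them.
    packageWithoutHadoop <<= (fullClasspath in Runtime) map { cp =>
      cp.filterNot(_.data.getName.startsWith("hadoop-"))
    }
  )
}
```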
I always thought the issue with sbt and lib_managed had to do with multiple sub-projects doing their transitive dependency resolution independently and then just putting them all in a single lib_managed. But now I am observing the same issue on a single project (no sub-projects).
We use Ivy all the time and never have this issue: Ivy picks the best candidate for a conflicting dependency and leaves out the rest.
So I am not sure what is going on here with sbt, but it's not right.
On Thu, Jul 18, 2013 at 1:48 PM, Matei Zaharia <matei....@gmail.com> wrote:
Thanks for the feedback. It looks like there are more advantages to Maven than I was originally thinking of -- specifically, the better dependency resolution and assembly construction. (SBT assembly just takes all the JARs in lib_managed and packages them together, unfortunately, which means you sometimes get multiple versions of the same artifact if you aren't very careful with exclusion rules.) I think what we'll do is wait until we see whether we can have a single Spark artifact that works with any Hadoop version, and go back to the build system issue then.

Matei
So to summarize, I think there are 2 issues:

1. When a project has transitive dependencies that include multiple versions of a jar, sbt uses Ivy to pick one (by default the latest) and only puts that one on the classpath. This behavior is correct and desired. However, when using retrieveManaged := true, it somehow drops all (conflicting) versions of the jar into lib_managed.
2. When a project has sub-projects, Ivy is used for dependency resolution on a per-sub-project basis, but not for the top project. The classpath of the top project is simply all the classpaths of the sub-projects combined. This means a top project can have multiple versions of a jar on its classpath and in lib_managed.
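Until that is fixed, the usual workarounds are the exclusion and version-forcing knobs in the build definition. A small hedged example for a build.sbt (the coordinates below are examples, not the real Spark dependency list, and exact setting names differ across sbt versions):

```scala
// Sketch: keeping duplicate artifact versions out of lib_managed / the assembly.
// Coordinates are illustrative.

// Exclude a transitive dependency that another library already pulls in
// at a different version:
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "1.0.4" excludeAll(
  ExclusionRule(organization = "org.codehaus.jackson")
)

// Or force a single version of a conflicting transitive artifact:
dependencyOverrides += "commons-io" % "commons-io" % "2.4"
```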