Dependency scopes - is there a better way?


James Livingston

Jun 19, 2013, 3:01:17 AM
to adep...@googlegroups.com
Hi,

I've just gotten around to watching the NEScala session video about this, and was quite interested, so am coming to this later than many other people. Is there anything documented about any design decisions made, other than the notes on https://github.com/sbt/adept/wiki/NEScala-Proposal, or is it all still up for debate?


One thing I saw briefly discussed the other month was dependencies and scopes. Having used Maven for quite a while, I think its compile/test/provided scopes are a fairly bad idea, since they conflate several orthogonal things, and I don't think Adept should copy them. In particular they mix:
* Packaging decisions - the provided scope keeps artifacts out of WARs, EARs, etc., but that makes no sense for plain jars
* Build phase usage.
* Build requirements versus runtime requirements


I believe that the dependency manager should only be concerned with the last of those, and the first two should be the responsibility of the build tool.

Packaging decisions are all about the target environment and not the artifacts themselves. Consider a library which uses JMS to send messages: whether that is "provided" or not is not a property of the library itself, but of whatever is using it. If the library is used inside a WAR deployed to a Java EE container it should not be packaged, but if it's inside a WAR deployed to Tomcat or a standalone Java app it should be.

Build phases are obviously not the concern of a dependency management tool; it is the build tool that specifies when it needs certain dependencies.


What I think the dependency management tool should deal with is two things: does anything using this need the given artifact to build, and does anything using this need the given artifact to run? The build-time requirement would be for when you use the given artifact's classes in your API, and the run-time requirement may be a hard or an optional one.

So maybe for each dependency we have two properties, although I'm not sure exactly what to call them. Something like:

  dependencies [
    {name: "org.slf4j/slf4j-api", version: ">= 1.6.6", runtime: "required", buildtime: "no"}
    {name: "scala-api", version: "2.10.*", runtime: "required", buildtime: "required"}
    {name: "example.better.performance", version: "1", runtime: "optional", buildtime: "no"}
  ]


The build tool is then free to decide which phases of a build this is used in, including that test-compilation and test-running may be different. It can also skip artifacts during packaging if appropriate. The build tool may ask the dependency tool for information when doing so, but the decision wouldn't be the dependency manager's job.
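
A minimal sketch of how a build tool might derive phase classpaths from the two proposed properties. All names here (`Dep`, the flag values, the classpath functions) are invented for illustration; Adept defines no such API.

```scala
// Hypothetical model of the proposed two-property dependency metadata.
case class Dep(name: String, version: String, runtime: String, buildtime: String)

val deps = Seq(
  Dep("org.slf4j/slf4j-api", ">= 1.6.6", runtime = "required", buildtime = "no"),
  Dep("scala-api", "2.10.*", runtime = "required", buildtime = "required"),
  Dep("example.better.performance", "1", runtime = "optional", buildtime = "no")
)

// The build tool, not the dependency manager, decides which phase uses what:
def buildClasspath(ds: Seq[Dep]): Seq[String] =
  ds.filter(_.buildtime == "required").map(_.name)

def runtimeClasspath(ds: Seq[Dep], includeOptional: Boolean): Seq[String] =
  ds.filter(d => d.runtime == "required" || (includeOptional && d.runtime == "optional"))
    .map(_.name)
```

The packaging decision then becomes a third, tool-side choice layered over `runtimeClasspath`, rather than metadata baked into the repository.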

--
James "Doc" Livingston

Josh Suereth

Jun 19, 2013, 9:17:11 AM
to adept-dev
James -

Not sure if you've seen how Ivy works, but by default "configurations" are just meaningless buckets to the dependency manager.  It's the build system itself that provides meaning.   sbt (and most ivy users) just happen to take the same conventions as maven to make users from those tools more familiar.  sbt itself is using the flexibility of configurations to do more advanced dependency management than is available in maven, such as attaching which compiler plugins are used to the module definition.

I agree with you that it's the BUILD TOOL's responsibility to figure out what to do with artifacts, and it's the DEP MGR's responsibility to resolve the correct artifacts and help figure out conflict resolution.

I *believe* we were thinking of configurations similar to Ivy, where they are just big buckets.  There's also the option to not support them, because they add some significant crazy to resolution.

SO, in any case, that avoids the issue you bring up:

* The deployer of a package/module specifies the artifacts
* The package manager resolves package/module artifacts and transitive dependencies
* The build system needs to know how to pull in transitive dependencies and use them appropriately

I think the "open bucket" approach of Ivy works quite well here for flexibility, with a set of default conventions.  However, the onus is on the build tool, and artifact deployer to agree on terms, and the user is stuck with annoying "excludes" if they disagree on the terms.    Do you have a solution that helps ease the pain of where to place the information, and how to do most work by default?
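
For reference, this is roughly how Ivy's bucket model surfaces in sbt today: the string after the version is an Ivy configuration mapping. The module coordinates below are made up.

```scala
// build.sbt sketch (hypothetical coordinates).
libraryDependencies ++= Seq(
  "com.example" % "lib-a" % "1.0",                     // defaults to the compile convention
  "com.example" % "lib-b" % "1.0" % "test",            // test bucket only
  "com.example" % "lib-c" % "1.0" % "compile->custom"  // map onto a non-default bucket
)
```

When producer and consumer disagree on bucket names, the consumer ends up writing mappings like the last line, or excludes.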

Mark Harrah

Jun 19, 2013, 9:33:14 AM
to adep...@googlegroups.com
On Wed, 19 Jun 2013 09:17:11 -0400
Josh Suereth <joshua....@gmail.com> wrote:

> James -
>
> Not sure if you've seen how Ivy works, but by default "configurations" are
> just meaningless buckets to the dependency manager. It's the build system
> itself that provides meaning. sbt (and most ivy users) just happen to
> take the same conventions as maven to make users from those tools more
> familiar. sbt itself is using the flexibility of configurations to do more
> advanced dependency management than is available in maven, such as
> attaching which compiler plugins are used to the module definition.
>
> I agree with you that it's the BUILD TOOL's responsibility to figure out
> what to do with artifacts, and it's the DEP MGR's responsibility to resolve
> the correct artifacts and help figure out conflict resolution.
>
> I *believe* we were thinking of configurations similar to Ivy, where they
> are just big buckets. There's also the option to not support them, because
> they add some significant crazy to resolution.

Explain?

Mark Harrah

Jun 19, 2013, 9:39:43 AM
to adep...@googlegroups.com
On Wed, 19 Jun 2013 00:01:17 -0700 (PDT)
James Livingston <li...@sunsetutopia.com> wrote:

> Hi,
>
> I've just gotten around to watching the NEScala session video about this,
> and was quite interested, so am coming to this later than many other
> people. Is there anything documented about any design decisions made, other
> than the notes on https://github.com/sbt/adept/wiki/NEScala-Proposal, or is
> it all still up for debate?

Fredrik has a prototype that includes the basic ideas, but debate is certainly still welcome. One open issue is the versioning/resolution scheme, for example. Josh has addressed the current thinking on scopes/configurations in his reply, so we can continue that discussion there.

-Mark

Josh Suereth

Jun 19, 2013, 10:01:51 AM
to adept-dev
On Wed, Jun 19, 2013 at 9:33 AM, Mark Harrah <dmha...@gmail.com> wrote:
On Wed, 19 Jun 2013 09:17:11 -0400
Josh Suereth <joshua....@gmail.com> wrote:

> James -
>
> Not sure if you've seen how Ivy works, but by default "configurations" are
> just meaningless buckets to the dependency manager.  It's the build system
> itself that provides meaning.   sbt (and most ivy users) just happen to
> take the same conventions as maven to make users from those tools more
> familiar.  sbt itself is using the flexibility of configurations to do more
> advanced dependency management than is available in maven, such as
> attaching which compiler plugins are used to the module definition.
>
> I agree with you that it's the BUILD TOOL's responsibility to figure out
> what to do with artifacts, and it's the DEP MGR's responsibility to resolve
> the correct artifacts and help figure out conflict resolution.
>
> I *believe* we were thinking of configurations similar to Ivy, where they
> are just big buckets.  There's also the option to not support them, because
> they add some significant crazy to resolution.

Explain?


Basically, the cons of the approach are:

* You have to declare all dependencies for all configurations up-front for modules
* You wind up with duplicated artifacts across configuration buckets (build tool would have to do set differencing)
* The deployer and build tool must agree on the convention. It's entirely possible to have useful artifacts with useless metadata, leaving tools unable to use them if one developer decides on a new standard.

In general, though, I think the flexibility of configuration buckets and the simplicity of the concept are worth it. The benefits I see from using Ivy's configurations are very high.

However, I was curious whether there is a counter-proposal that serves the same need in as simple a way while avoiding the cons.


Mark Harrah

Jun 19, 2013, 10:20:06 AM
to adep...@googlegroups.com
On Wed, 19 Jun 2013 10:01:51 -0400
Josh Suereth <joshua....@gmail.com> wrote:

> * You have to declare all dependencies for all configurations up-front for
> modules

Can you give an example? I'm not sure what you mean.

> * You wind up with duplicated artifacts across configuration buckets (build
> tool would have to do set differencing)

I don't understand. What is duplicated and why is classpath arithmetic necessary? sbt doesn't need to do this now and it would be wrong if it did.

> * The deployer and build tool must agree on the convention. It's entirely
> possible to have useful artifacts with useless metadata, so tools are not
> able to use them if one developer decides on a new standard.

A convention is important, but the metadata certainly doesn't become useless when stepping out of the convention. There is indeed a small increased burden on the consumer, who has to write "compile->custom" instead of just "compile".

These aren't "significant crazy" or really even much related to the resolution algorithm.

-Mark

Josh Suereth

Jun 19, 2013, 10:27:05 AM
to adept-dev
True, it isn't significant crazy.   It's a limited set of issues that usually aren't problems.
  
The duplication I'm thinking of is if you have two dependencies, C and B, where C is in your "compile" bucket and B is in your "test" bucket. Both C and B have transitive dependencies on A. So now, my "test" bucket and "compile" bucket both have A.jar on them.

It's probably no big deal in most cases. However, tools need to be aware that artifact duplication in buckets can occur.
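
A toy model of that duplication. The module names and the flattened transitive-closure map are invented for illustration:

```scala
// Invented sketch: each module maps to its flattened transitive closure.
val transitive = Map(
  "C" -> Set("C", "A"),
  "B" -> Set("B", "A")
)

// Resolving a bucket means taking the union of each member's closure.
def resolve(bucket: Set[String]): Set[String] = bucket.flatMap(transitive)

val compileBucket = resolve(Set("C")) // picks up A via C
val testBucket    = resolve(Set("B")) // picks up A via B
```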

Mark Harrah

Jun 19, 2013, 12:31:09 PM
to adep...@googlegroups.com
On Wed, 19 Jun 2013 10:27:05 -0400
Josh Suereth <joshua....@gmail.com> wrote:

> True, it isn't significant crazy. It's a limited set of issues that
> usually aren't problems.
>
> The duplication I'm thinking of is if you have two dependencies: C and B,
> where C is in your "compile" bucket and B is in your "test" bucket. Both C
> and B have transitive dependencies on A. So now, my "test' bucket and
> "compile" bucket both have A.jar on it.
>
> it's probably no big deal in most cases. However, tools need to be aware
> that artifact duplication in buckets could occur.

Tools should never do further processing on the set of resolved dependencies. It is the job of the dependency manager to do conflict resolution and another tool modifying the output invalidates that. sbt 0.7 and earlier modified the managed classpath and it was fundamentally broken as a result.

In particular, a tool should never think about combining configurations. It should set up a configuration according to what it needs it for and then use just that configuration for that role. If it needs to combine compile, provided, optional, test, and runtime for a test classpath, it should declare the 'test' configuration to extend all of those and let the dependency manager figure it out.
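
In sbt terms that looks roughly like the sketch below, using the real `config(...) extend (...)` pattern but a hypothetical configuration name:

```scala
import sbt._

// Declare a configuration that extends others; the dependency manager
// then computes the combined classpath itself. "FunTest" is hypothetical.
lazy val FunTest = config("fun") extend (Compile, Runtime, Test)
```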

-Mark

Alex Boisvert

Jun 19, 2013, 1:09:09 PM
to adep...@googlegroups.com
On Wed, Jun 19, 2013 at 9:31 AM, Mark Harrah <dmha...@gmail.com> wrote:
In particular, a tool should never think about combining configurations.  It should set up a configuration according to what it needs it for and then use just that configuration for that role.  If it needs to combine compile, provided, optional, test, and runtime for a test classpath, it should declare the 'test' configuration to extend all of those and let the dependency manager figure it out.

Agreed, with the caveat that few existing tools follow this principle today ;)  But I certainly see it as the way forward.

This also brings up the requirement that (development-time) configurations may point not only to .jars and other packaged artifacts but also to directories containing .class files and/or other resources. If the tool can't combine configurations, it needs a way to dynamically create configurations and attach classpath entries to them.

To illustrate this, assume the following project:

project P

  "resource" configuration
  provides static resources in src/main/resources

  "compile" configuration depends on:
     * com.example:artifact1:jar:1.0
     * org.example:artifact2:jar:2.1
  and provides compiled classes in target/classes

  "test" configuration depends on:
     * "compile" configuration (of project P)
     * org.testing:test-framework:jar:0.1
  and provides compiled classes in target/test/classes

  "jar" configuration depends on:
     * com.example:artifact1:jar:1.0
     * org.example:artifact2:jar:2.1
  and provides an artifact named "org.something:project_p:jar:SNAPSHOT"
  available locally under target/project_p-SNAPSHOT.jar

(I'm assuming the "resource", "compile" and "test" configurations are not publicly exported, but that the "jar" configuration is exported, as in published in remote repos)

I guess where I'm going with this is first asking whether it's the right way to look at things -- basically that configurations both depend on and provide stuff. It seems necessary that configurations provide stuff (e.g., classpath entries, be they packaged artifacts, working directories, ...) if the build tool can't compute dependencies by itself. Right?
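
One way to state the question concretely, modelling project P from above. All type and field names here are invented:

```scala
// Invented model: a configuration both depends on things and provides
// classpath entries, which may be jars or plain directories.
case class Configuration(
  name: String,
  dependsOn: Seq[String] = Nil, // module coordinates or other configuration names
  provides: Seq[String] = Nil   // classpath entries this configuration contributes
)

val compileConf = Configuration(
  "compile",
  dependsOn = Seq("com.example:artifact1:jar:1.0", "org.example:artifact2:jar:2.1"),
  provides  = Seq("target/classes")
)

val testConf = Configuration(
  "test",
  dependsOn = Seq(compileConf.name, "org.testing:test-framework:jar:0.1"),
  provides  = Seq("target/test/classes")
)
```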

James Livingston

Jun 19, 2013, 5:54:44 PM
to adep...@googlegroups.com
On Wednesday, June 19, 2013 11:17:11 PM UTC+10, Josh Suereth wrote:
Not sure if you've seen how Ivy works, but by default "configurations" are just meaningless buckets to the dependency manager.

I've used Ivy, but not as deeply as Maven.

 
I *believe* we were thinking of configurations similar to Ivy, where they are just big buckets.  There's also the option to not support them, because they add some significant crazy to resolution.

They may make resolution more crazy, but you also lose a reasonable amount of useful functionality by not having the ability to specify certain information about how the dependency is used.

It's fairly common for your artifact to require another both to build and at run time, while your users don't need it to build. Take logging frameworks as an example: the Spring framework requires the Log4J API to build and run. An EE application which uses Spring does not need Log4J to compile, but the resulting WAR/EAR needs Log4J packaged (unless the target environment provides it).


If we don't have some way to support encoding this kind of thing into the dependency metadata, then I think we're choosing to drop some fairly common use cases. Having extra artifacts in your compile classpath usually wouldn't be a fatal problem though. The reverse situation where you need something at build time but not runtime also happens.

 
SO, in any case, that avoids the issue you bring up:

* The deployer of a package/module specifies the artifacts
* The package manager resolves package/module artifacts and transitive dependencies
* The build system needs to know how to pull in transitive dependencies and use them appropriately

Sort of: the build system either needs to know in which situations transitive dependencies are needed, or just use them everywhere and hope the result is what should happen. In a "test" build phase there are multiple steps: you compile the test code and then you run the tests, but whether certain transitive dependencies are needed isn't necessarily the same for both.
 
 
I think the "open bucket" approach of Ivy works quite well here for flexibility, with a set of default conventions.  However, the onus is on the build tool, and artifact deployer to agree on terms, and the user is stuck with annoying "excludes" if they disagree on the terms.    Do you have a solution that helps ease the pain of where to place the information, and how to do most work by default?
 
The open bucket approach can work, but I'm not sure all the "scope" things need to be in the dependency metadata. The fact that library X needs JUnit to run its tests does not need to be in the repository's dependency metadata; it is purely a build concern of X. I can't think of any reason another artifact would care what X used to run its tests, so I would argue that information does not belong in the exported dependency metadata. Maven puts the entire POM, including information that is only relevant at build time, in the repo, which I think is wrong.


I've been trying to think about what dependency information needs to be available to dependent artifacts, and the only things I could come up with were "is it required to build dependent artifacts" and "is it required (or optional) to execute code in this artifact". Can anyone think of use cases that doesn't cover, or of other things dependent artifacts need to know and so should be in the exported metadata?

I'm not saying there aren't more, I just can't think of them.

-- 
James Livingston

Josh Suereth

Jun 19, 2013, 6:05:19 PM
to adept-dev
On Wed, Jun 19, 2013 at 5:54 PM, James Livingston <li...@sunsetutopia.com> wrote:
On Wednesday, June 19, 2013 11:17:11 PM UTC+10, Josh Suereth wrote:
Not sure if you've seen how Ivy works, but by default "configurations" are just meaningless buckets to the dependency manager.

I've used Ivy, but not as deeply as maven.

 
I *believe* we were thinking of configurations similar to Ivy, where they are just big buckets.  There's also the option to not support them, because they add some significant crazy to resolution.

They may make resolution more crazy, but you also lose a reasonable amount of useful functionality by not having the ability to specify certain information about how the dependency is used.

It's fairly common for your artifact to require another both to build and at run time, while your users don't need it to build. Take logging frameworks as an example: the Spring framework requires the Log4J API to build and run. An EE application which uses Spring does not need Log4J to compile, but the resulting WAR/EAR needs Log4J packaged (unless the target environment provides it).


If we don't have some way to support encoding this kind of thing into the dependency metadata, then I think we're choosing to drop some fairly common use cases. Having extra artifacts in your compile classpath usually wouldn't be a fatal problem though. The reverse situation where you need something at build time but not runtime also happens.

 
SO, in any case, that avoids the issue you bring up:

* The deployer of a package/module specifies the artifacts
* The package manager resolves package/module artifacts and transitive dependencies
* The build system needs to know how to pull in transitive dependencies and use them appropriately

Sort of: the build system either needs to know in which situations transitive dependencies are needed, or just use them everywhere and hope the result is what should happen. In a "test" build phase there are multiple steps: you compile the test code and then you run the tests, but whether certain transitive dependencies are needed isn't necessarily the same for both.
 
 
I think the "open bucket" approach of Ivy works quite well here for flexibility, with a set of default conventions.  However, the onus is on the build tool, and artifact deployer to agree on terms, and the user is stuck with annoying "excludes" if they disagree on the terms.    Do you have a solution that helps ease the pain of where to place the information, and how to do most work by default?
 
The open bucket approach can work, but I'm not sure all the "scope" things need to be in the dependency metadata. The fact that library X needs JUnit to run its tests does not need to be in the repository's dependency metadata; it is purely a build concern of X. I can't think of any reason another artifact would care what X used to run its tests, so I would argue that information does not belong in the exported dependency metadata. Maven puts the entire POM, including information that is only relevant at build time, in the repo, which I think is wrong.



Perhaps given the current approach to testing. However, one feature I'd love to add to sbt (once we can get major artifacts *off* of Maven repositories) is the ability to run a project's tests via a different project. I.e. I want to resolve its "test" configuration bucket and run those tests on some remote VM. That way, I can run my unit tests on the cross set of Windows 7, Windows 8, Windows Server 20XX, Ubuntu, Fedora, Mint, MacOSX and Java 6, Java 7 and Java 8-Preview. IDEALLY, I can build my artifacts on ONE machine and test them on this cross section, thereby saving time. The only way I can do this is if I can encode those dependencies in the dependency system, the way Ivy does now.

The only reason I haven't actively pursued this approach is the prevalence of Maven repositories, which prevents this data from propagating and "remote tests" from working. However, for Typesafe activator, I retain hope that we can accomplish this, which is why we're still deploying it Ivy-style for now.


So yes, I can half-understand your point.   IMHO, all dependencies used by the build should be encodable into the dependency manager.  Whether or not you use them is irrelevant to whether or not that can be useful.




 

I've been trying to think about what dependency information needs to be available to dependent artifacts, and the only things I could come up with were "is it required to build dependent artifacts" and "is it required (or optional) to execute code in this artifact". Can anyone think of use cases that doesn't cover, or of other things dependent artifacts need to know and so should be in the exported metadata?

I'm not saying there aren't more, I just can't think of them.


Right, there's a few more things to encode:

* Do we need these on the system lib path for native loading?
* Is your documentation available for this version?
* Is there source code for this version?
* What scalac compiler plugins are needed to compile code with your library (Note: We don't really denote this now)
* I'd like to select the shared-library variant of your code (.so) vs. static libraries (.a)


Again, let's think a bit outside the box. It's hard to anticipate needs, but I still feel that Ivy's configuration system is, hands down, the best mechanism to represent a lot of "artifact grouping" concerns. It makes no real commitments on the dependency resolution side, and allows build tools/ecosystems to define their conventions.

However, now is the time to think through other possibilities, and examine their fit.

Mark Harrah

Jun 19, 2013, 6:42:07 PM
to adep...@googlegroups.com
I haven't thought about having resources as a separate configuration before. It might be useful, but perhaps I should explain what I meant originally a bit more.

Yes, configurations both declare dependencies and provide artifacts. Published artifacts of a module are members of one or more configurations. Dependencies are associated with one or more configurations. When a consumer depends on a configuration of a module, it gets both the module's dependencies for that configuration as well as the published artifacts in that configuration.

When building the module itself, a tool usually wants to get both "dependencies only" as well as "dependencies+artifacts" for a configuration. When constructing the classpath for compilation, for example, the artifacts aren't available yet since they will be produced by compilation.
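
A small sketch of that distinction; all names here are invented:

```scala
// Invented sketch: for compilation you want dependencies only, because the
// configuration's own artifacts are what compilation is about to produce.
case class Conf(deps: Set[String], ownArtifacts: Set[String])

val compileView = Conf(
  deps = Set("scala-library.jar"),
  ownArtifacts = Set("myproject.jar")
)

def dependenciesOnly(c: Conf): Set[String] = c.deps
def depsAndArtifacts(c: Conf): Set[String] = c.deps ++ c.ownArtifacts
```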

When I said that tools should not manipulate dependency lists, I meant the dependencies and not necessarily the artifacts. If a tool swaps in a class/resource directory that is otherwise equivalent to the jar, there is no problem there (at least as far as the dependency manager is concerned). The key is knowing that they are equivalent of course, but a tool building a project will know what will go in that artifact. However, if the tool says "I'm going to replace a-1.0.jar with a-2.0.jar", it is now doing conflict resolution.

-Mark

James Livingston

Jun 20, 2013, 1:56:53 AM
to adep...@googlegroups.com
On Thursday, June 20, 2013 8:05:19 AM UTC+10, Josh Suereth wrote:
Perhaps given the current approach to testing. However, one feature I'd love to add to sbt (once we can get major artifacts *off* of Maven repositories) is the ability to run a project's tests via a different project. I.e. I want to resolve its "test" configuration bucket and run those tests on some remote VM. That way, I can run my unit tests on the cross set of Windows 7, Windows 8, Windows Server 20XX, Ubuntu, Fedora, Mint, MacOSX and Java 6, Java 7 and Java 8-Preview.

Another use for that (which would require publishing test artifacts into the repo) would be running the unit tests of dependent artifacts against what you just built, which would be quite cool. If I know my changes to libx have broken liby in the past, I could say "please run liby's unit tests, except with the libx I just built" to verify I haven't broken it this time.

 
 So yes, I can half-understand your point.   IMHO, all dependencies used by the build should be encodable into the dependency manager.  Whether or not you use them is irrelevant to whether or not that can be useful.
 
Having a generalised mechanism to encode all that information is great, but as you said there will have to be conventions about what the information is. I think some things are important enough (like the basic version information) that the conventions need to be thought out as the tool is being developed, since they will affect things like conflict resolution and transitive dependency handling.
 
 
* Do we need these on the system lib path for native loading?
* Is your documentation available for this version?
* Is there source code for this version?
* What scalac compiler plugins are needed to compile code with your library (Note: We don't really denote this now)
* I'd like to select the shared-library variant of your code (.so) vs. static libraries (.a)

 Those are all very good things to have, and there will be plenty more - I've got a few more ideas which I'll post about another day too.

I tried to keep this thread about a specific thing: the problem that Maven's "provided" scope tries to solve, albeit poorly. Often dependencies should only be transitive some of the time, such as needing Log4j to run a Spring app but not to compile it (unless you happen to use Log4j too). My point was that an artifact's repo metadata should be able to declare when it is transitive (build and run time are the two different situations I thought of), and that build-phase usage and packaging are separate concerns, so they should not be mashed together as Maven's <scope> does.

--
James

Evan Chan

Jun 21, 2013, 4:12:29 AM
to adep...@googlegroups.com


Again, let's think a bit outside the box. It's hard to anticipate needs, but I still feel that Ivy's configuration system is, hands down, the best mechanism to represent a lot of "artifact grouping" concerns. It makes no real commitments on the dependency resolution side, and allows build tools/ecosystems to define their conventions.

However, now is the time to think through other possibilities, and examine their fit.

Ivy's config system as used by sbt and its plugins today feels way too complicated, and I feel like there is, or should be, a much simpler solution... so I'd like to take you up on the challenge. Allow me to present a real use case.

Group A:
    project source files ->   target/classes

Group B:
    com.my-company.libA
    org.super-scala-project.blah

Group C:
    org.apache.hadoop  :  hadoop-core  :  0.20.2
    org.apache.hive        :  hive-core         ....

Group D:
    org.apache.cassandra : ....        intransitive()

Group E:
    org.scalatest......

Needed for compile:  A + B + C + D
Needed for test:   A + B + C + D + E
Needed for console and run:  A + B + C + D
fat-jar-package 1 :  A + B only
fat-jar package 2:   A + B  + D (but not D's transitive deps)

So in theory I can assign different scopes to these, but the defaults don't really work. Creating a fat jar with "sbt assembly" requires that the proper packages be put into the "run" scope, but if I take groups C and D out of the "run" scope, then I won't be able to do "sbt console" or "sbt run". I had to spend multiple hours browsing through docs and forums just to figure out a half-working solution to this.

My point is that
 - plugin authors tend to stick to predefined scopes and not give users an easy way to customize
 - if your requirements fit existing scopes that's fine, but if they don't then it's way too hard

Why can't I tell the build tool something like this, Make-style? (Actually the SBuild project does exactly this.)

Task("compile").dependsOn(A + B + C + D)
Task("assembly").dependsOn(A + B)

Having sensible defaults is a good idea, but so are transparency and simplicity. You may argue that for most users the above Task(...) lines should not be necessary; on the other hand, having such a line makes it extremely easy and obvious for users to change should they need to... and eventually they will.
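
The groups compose as plain set unions; a sketch with the coordinates abbreviated from the example above:

```scala
// Invented sketch of the "groups" idea: classpaths are just set unions.
val A = Set("target/classes")
val B = Set("com.my-company:libA", "org.super-scala-project:blah")
val C = Set("org.apache.hadoop:hadoop-core:0.20.2", "org.apache.hive:hive-core")
val D = Set("org.apache.cassandra:cassandra") // intransitive
val E = Set("org.scalatest:scalatest")

val compileCp = A ++ B ++ C ++ D
val testCp    = compileCp ++ E
val fatJar1   = A ++ B
val fatJar2   = A ++ B ++ D // but not D's transitive deps
```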

Sorry if this is a slight tangent. My point is that I don't think scopes really make things easier for users, as they lead to too much convention, which leads to inflexibility.
I think having a simple, easy way to compose different dependencies for different end tasks is what is really needed.