Hi,
Great spec! It was really clear and makes it easy to understand Adept plans. Or I guess you can decide based on this email what I understand ;-)
Here are some things I thought about while reading it, fwiw:
1.
The proposed sbt syntax seems a little too hard for the common case. I understand you're trying to illustrate the general mechanism, agreed that should exist, but could the common case be kind of like:
dependencyRepositories := Seq("git://blahblah/foo/3.0")
dependencies := Seq("mygroup/mylib", "othergroup/otherlib" constrainedByVersion "1.2.1")
- in the long term (if adept becomes the default way of doing things) will having the word "adept" in there seem odd?
- sugar for constrainedBy("version" -> "1.2.1") seems warranted since this has to be the most common constraint. Autocomplete doesn't work on strings, only types and methods, so "version" as a string isn't as discoverable (a sketch of what the sugar could look like follows below this list).
- could do things in sbt like automatically constraining by the scalaVersion that's set for the project
I know this seems sort of like a surface/refinement issue but I think it'd be worth looking at early on, to be sure the implementation supports the simplest surface.
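For illustration, the sugar could be as small as an implicit conversion plus one extra method - the names below (Dependency, Constraint, constrainedByVersion) are made up for this sketch and are not actual Adept or sbt API:

import scala.language.implicitConversions

// Rough sketch of the proposed sugar; Dependency, Constraint and
// constrainedByVersion are made-up names, not actual Adept/sbt API.
case class Constraint(name: String, value: String)
case class Dependency(group: String, name: String, constraints: Seq[Constraint] = Seq.empty) {
  def constrainedBy(kv: (String, String)): Dependency =
    copy(constraints = constraints :+ Constraint(kv._1, kv._2))
  // the sugar: identical to constrainedBy("version" -> v), but autocomplete-friendly
  def constrainedByVersion(v: String): Dependency =
    constrainedBy("version" -> v)
}

// lets a bare "group/name" string be used where a Dependency is expected
implicit def stringToDependency(s: String): Dependency = {
  val Array(group, name) = s.split('/')
  Dependency(group, name)
}

// usage, mirroring the example above:
// Seq[Dependency]("mygroup/mylib", "othergroup/otherlib" constrainedByVersion "1.2.1")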
2.
I'm not sure if this is already the plan or not based on the document, but to me *resolution* should be done explicitly only by people hacking on the module, and the results *checked into git*.
That is, split "sbt update" into two separate things: "resolve" which determines the artifacts to use; and "download-artifacts" which sucks them onto the local machine. *Most builds* only need to (only *should*) "download-artifacts".
If I just download some random project source from github and type "sbt compile" or even "publish", to me that should just yank down a bunch of artifacts identified by sha hash, and that's it. No constraint-solving.
Resolution can ONLY introduce bugs. If I'm a hacker on a module, and I resolve on my workstation, then I want to upload those resolution results, keep them in git, have Jenkins reproduce them *exactly*, and anyone hacking on the module who types "publish" should be using the exact same artifacts... if the results of resolving change, then I want it to be visible - it's probably some kind of problem! I want to be able to reproduce builds later, see changes to resolution in pull requests, watch when things changed in my git history.
If there are new versions available, then I should type "resolve" (or whatever) and it will go see if the resolution results have changed. If they have, then my local git-managed file listing artifacts will be modified, and then I check it in. This means that upgrading to the latest version is *visible* rather than silent.
As a nice side effect, this means SPEED - not waiting on the constraint solver ;-) All we have to do is 1 stat per entry in the artifacts list to see if that file exists already in cache.
The resolution file in git would be the full transitive resolution, I think (resolution for a module is global for that module and all its deps). So if module A depends on module B depends on module C, then A may not end up using the same C that B was published against. But that's fine as long as all constraints are met. What would be true is that the hashes used for both B and C would be checked in to the A git repo.
If you had an "artifacts.txt" kind of file with the artifacts list, it could include human-readable comments just for convenience:
0c857914ca893ce09378fd4ffa42aa13363ea466 # com.typesafe.play/play 2.1.1
8ae9a903ce90a6be0fa3a7dbfcbd02dca97357b0 # org.junit/junit 1.4
That makes git diff more useful.
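For illustration, that one-stat-per-entry check could be as small as this sketch; the "hash # comment" line format and the hash-keyed cache directory are assumptions for the example, not a settled Adept layout:

import java.io.File
import scala.io.Source

// Sketch only: assumes one "hash # comment" entry per line and a cache
// directory keyed by hash. Returns the hashes that still need downloading.
def missingArtifacts(artifactsFile: File, cacheDir: File): Seq[String] = {
  val hashes = Source.fromFile(artifactsFile).getLines()
    .map(_.takeWhile(_ != '#').trim) // drop the human-readable comment
    .filter(_.nonEmpty)
    .toList
  // one existence check ("stat") per entry; if everything is present there is no work to do
  hashes.filterNot(hash => new File(cacheDir, hash).exists())
}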
If I have any funky local changes that might affect resolution (proxies, global sbt config, local metadata server, whatever) then it would show up when I try to push my PR. Or even if Adept itself changes its resolution algorithm and two people use different adept versions, then that would show up.
Another thing this permits is that build tools only have to understand the already-resolved artifacts file potentially - resolution could be a separate command line tool if desired, not part of any build tool...
Havoc
On Fri, 20 Sep 2013 16:33:48 -0400
eugene yokota <eed3...@gmail.com> wrote:
> On Fri, Sep 20, 2013 at 4:02 PM, Havoc Pennington <h...@pobox.com> wrote:
>
> > Great spec! It was really clear and makes it easy to understand Adept
> > plans. Or I guess you can decide based on this email what I understand ;-)
> >
>
> ditto.
>
>
> > - sugar for constrainedBy("version" -> "1.2.1") seems warranted since
> > this has to be the most common constraint. Autocomplete doesn't work on
> > strings, only types and methods, so "version" as a string isn't as
> > discoverable.
> >
>
> If Adept is saying it's not going to auto-evict like Ivy does, should the
> version really be the default constraint?
> I think "binary-version" should be mandated as metadata or at least be the
> default.
Right. My thoughts here are that this is at a higher level than the core. It is probably a set of conventions followed by a build system for a particular domain.
For example, sbt might translate its normal syntax a % b % c to something like:
group=a, name=b, sourceVersion=majorMinorOnly(c), binaryVersion=majorMinorOnly(c)
Here, sourceVersion probably isn't gaining anything over binaryVersion, but I'm thinking it might in practice. The main point is that there is a version the user specifies that gets translated into the right constraints. This is only the default and the user can take more control if desired. Something similar might happen for version := Y when publishing.
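To make that concrete, a toy sketch of such a translation (the attribute names and majorMinorOnly are placeholders, not settled Adept metadata):

// Toy sketch: attribute names and majorMinorOnly are placeholders, not settled metadata.
def majorMinorOnly(version: String): String =
  version.split('.').take(2).mkString(".")

// roughly what an sbt shim might produce for a % b % c
def defaultConstraints(a: String, b: String, c: String): Map[String, String] = Map(
  "group"         -> a,
  "name"          -> b,
  "sourceVersion" -> majorMinorOnly(c),
  "binaryVersion" -> majorMinorOnly(c)
)

// e.g. defaultConstraints("org.scala-lang", "scala-library", "2.10.2")
// constrains sourceVersion and binaryVersion to "2.10"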
> Related to that, version algebra should be defined so one can reliably
> compare one version to the other. (The Ivy spec says something like: use PHP to
> compare [1])
I agree that if it is necessary to compare versions, this should be specified. I personally haven't seen use cases where this is necessary in the core resolution engine, although auxiliary tools, like "find me the most recent version of X", might need it. I'm not sure yet. Fredrik has done a good job collecting use cases and describing how Adept handles them. Something we'd like to see more of is use cases, such as ones that require versions to be compared.
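Purely as an illustration of what a specified version ordering could look like (this is not something Adept defines today):

// Illustration only: a simple segment-wise ordering, not anything Adept specifies.
// Numeric segments compare numerically, everything else lexicographically.
def compareVersions(a: String, b: String): Int = {
  def segments(v: String): Seq[String] = v.split("[.\\-]").toSeq
  segments(a).zipAll(segments(b), "", "").collectFirst {
    case (x, y) if x != y =>
      val bothNumeric = x.nonEmpty && y.nonEmpty && x.forall(_.isDigit) && y.forall(_.isDigit)
      if (bothNumeric) Integer.compare(x.toInt, y.toInt) else x.compareTo(y)
  }.getOrElse(0)
}

// compareVersions("1.2", "1.2.1") < 0, compareVersions("1.10", "1.9") > 0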
On Fri, Sep 20, 2013 at 5:47 PM, Mark Harrah <dmha...@gmail.com> wrote:
> I know Fredrik supports the idea of an artifacts.txt and splitting resolution (and I agree as well). There was some discussion on this earlier and the open questions are around the ideas you've mentioned. My opinion is that there is no fundamental obstacle to properly caching resolution automatically because all of the metadata is local and the artifacts are cached by hash. Therefore, you don't gain much of a speed advantage from caching except for the first time. I think you want to do this up-to-date checking and re-resolve automatically if there aren't problems. You should be able to configure it to fail if the resolved artifacts are different. I think this has been referred to as "locking" resolution. I think we agree at least that is desirable.
Cool. Yeah, if resolution is instant then it's harmless to do automatically. I guess the main point for me is that anytime an artifact changes in my entire stack, I'd like to manually approve and record it (my version control system being the natural way to do so). This gives 100% reproducible builds and avoids weird mystery situations. Exactly how it works is sort of up to whoever is coding this thing and working out the details.
The beauty of the model is really how simple it is and how easy it is to implement.
I have tried to prove this and hacked together a (naive) implementation of the resolution engine the way I see it now: https://github.com/adept-dm/adept/blob/master/src/main/scala/adept/core/resolution/Resolver.scala
The actual algorithm is only ~ 30 lines of code.
I also have a test DSL so that it is easy to create small and easy-to-read test cases. I have made some example unit tests in the link that follows, which further demonstrate how resolution works: https://github.com/adept-dm/adept/blob/master/src/test/scala/adept/core/resolution/ConstraintsTest.scala
I think it is important to have a testing framework where it is easy to test small, specific chunks of functionality - if you have any input on the way the DSL looks, it is most welcome.
Also, the test cases are a good place to start if you want to understand the model better.
You are also most welcome to add test cases and to try to break it, or to use the test DSL to find use cases that we cannot express with this model! :)
If we can solve all use cases and prove that this implementation works, I think it is just a matter of adding the tooling around it and, as we go further, making the implementation faster and safer (if you look at the impl you see why I mention this - currently it is optimised for readability only :).
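Roughly, the model can be paraphrased like this toy sketch (not the actual Resolver.scala; names and the handling of transitive requirements are simplified):

// Toy paraphrase of the model, not the actual Resolver.scala; following the
// requirements of the chosen variants transitively is omitted for brevity.
case class Constraint(id: String, attribute: String, value: String)
case class Variant(id: String, attributes: Map[String, String])

// a variant survives if it satisfies every constraint targeting its id
def survives(variant: Variant, constraints: Seq[Constraint]): Boolean =
  constraints.filter(_.id == variant.id).forall { c =>
    variant.attributes.get(c.attribute).contains(c.value)
  }

// group the surviving variants per id; an id is resolved when exactly one is left,
// under constrained when several are left, over constrained when none are left
def filterVariants(universe: Seq[Variant], constraints: Seq[Constraint]): Map[String, Seq[Variant]] =
  universe.filter(v => survives(v, constraints)).groupBy(_.id)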
On Sep 22, 2013 8:45 PM, "Josh Suereth" <joshua....@gmail.com> wrote:
>
>
>
>
> On Sat, Sep 21, 2013 at 6:44 AM, Fredrik Ekholdt <fre...@gmail.com> wrote:
>>
>> The beauty of the model is really how simple it is and how easy it is to implement.
>>
>> I have tried to prove this and hacked together a (naive) implementation of the resolution engine the way I see it now:
>> https://github.com/adept-dm/adept/blob/master/src/main/scala/adept/core/resolution/Resolver.scala
>> The actual algorithm is only ~ 30 lines of code.
>>
>
> Wow, that is quite small.
Yep, I think it might grow a bit, but if the core resolution algorithm is succinct it will make things that much easier (of course)
>
>>
>> I also have a test DSL so that it is easy to create small and easy-to-read test cases. I have made some example unit tests in the link that follows, which further demonstrate how resolution works: https://github.com/adept-dm/adept/blob/master/src/test/scala/adept/core/resolution/ConstraintsTest.scala
>>
>
>>
>> I think it is important to have a testing framework where it is easy to test small, specific chunks of functionality - if you have any input on the way the DSL looks, it is most welcome.
>> Also, the test cases are a good place to start if you want to understand the model better.
>> You are also most welcome to add test cases and to try to break it, or to use the test DSL to find use cases that we cannot express with this model! :)
>>
>
> Looks like a good start so far! I couldn't tell from the two locations you list, but what kind of information is reported upon resolution failure? I'd say trying to get a robust error message on "tricksy" failure would be a good next step. A resolution engine that works in the happy case is great. A resolution engine that is *informative* on the failure case is pretty much a promised land of goodness and unicorns. What do you think an elegant way would be to test error messages or error information?
Yeah, I agree that good error messages are very (* 10) important. What you can get after resolution now is:
- The graph containing module ids and their children
- The unresolved ids
- The resolved ids
- The constraints it found for each id
- The variants for each id it had when resolution ended.
If you have more than one variant per id, it is under constrained; 0 variants for an id means it is over constrained.
With this info you can basically create a nice graph and show which constraints it found. As we move along we should save where we found the constraints as well so that can be part of the graph.
Even with what we have now, though, it is possible to prompt the user to be more specific if it is under constrained (more than one variant for an id). I have the version-ordering algorithm that Ivy, Maven and PHP use, so it could be used to suggest a likely version if a version is the issue.
If it is over constrained, you can currently print out the constraints and the id and prompt the user to loosen the constraints. If there is a conflict - two constraints that want different things (e.g. two different versions of the same id) - you can also detect that. In the case where it is over constrained on a dependency that the user defined, it would be enough to ask the user to loosen the constraint. If it is on a module you did not define, you have to override that variant - overrides are not available yet, though. Telling the user how to fix the issue could also be part of the error message. As we implement search we could also do a fuzzy query and get a did-you-mean in this case. I think that would be very cool.
Currently you can also print the (partial) graph it found in case it is over/under constrained, which is nice for debugging. I know I hate it when Ivy fails deep down on a transitive dep and you cannot see why that dep is there before you fix it.
For error messages, we could create some ourselves, but the design is simple enough that we can communicate the issues and let the build tool create its own - at least if we get the API right. The ideal error message for me is one that shows you the problem very clearly *and* also gives you a tip on how to fix it. Moving forward we could also have a command that fixes the issue for you, or at least suggests exactly what you have to do to fix it. It could be up to the build tool to choose the best approach: automatically fix or explain the issue.
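As a sketch of what such reporting could look like on top of the state described above (ResolutionState and its fields are made-up names for illustration):

// Sketch only: turns the per-id resolution state into messages;
// ResolutionState and its fields are made-up names for illustration.
case class ResolutionState(
  variantsPerId: Map[String, Seq[String]],              // surviving variants per id
  constraintsPerId: Map[String, Seq[(String, String)]]  // constraints collected per id
)

def diagnose(state: ResolutionState): Seq[String] =
  state.variantsPerId.toSeq.collect {
    case (id, variants) if variants.isEmpty =>
      val cs = state.constraintsPerId.getOrElse(id, Seq.empty)
      s"$id is over constrained: no variant matches ${cs.mkString(", ")} - try loosening or overriding a constraint"
    case (id, variants) if variants.size > 1 =>
      s"$id is under constrained: ${variants.mkString(", ")} all match - add a constraint, e.g. a version"
  }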
>
>
>>
>> If we can solve all use cases and prove that this implementation works, I think it is just a matter of adding the tooling around it and, as we go further, making the implementation faster and safer (if you look at the impl you see why I mention this - currently it is optimised for readability only :).
>>
>>
> Yeah, it's amazing to see how fast this project is moving. Great work Fred!!
Thx:) it is really just because the design is so simple. Makes me think we might be on to something ;)
I have started on overrides and exclusions helpers now, but I will be busy on other things the first part of this week. If I am lucky I *hope* to finish those by the end of this week still. With those in place and validated I think importing data from maven or ivy would be the next step. I think it is important not only for the functionality but to validate that we can solve the same use cases.
I will also update the spec with some ideas on the artifacts files that we discussed earlier if nobody else feels like they want to do it first...
Nice to have a specification. :-)
Here are some thoughts.
==Artifacts==
From Definitions: "An artifact hash is linked to a set of location providers,
where the artifact can be found"
Will it be possible to download artifacts from their original source? This
is, for me, an important goal to reach and is essentially what I hear when you
talk about separated metadata.
Will artifacts inside other resources/artifacts be possible, e.g. a JAR in a
ZIP on the tool's home page? This is of course no major use case, but some
dependencies are sometimes only available inside a ZIP file from the original
tool/library provider (e.g. JUnit, though I haven't checked that in the last
12 months).
==Attributes==
When defining an attribute (in the global config), how do we deal with modules
that do not set this attribute? Will these be excluded from the tree or
included by default?
Also, I think it is essential to have some predefined common-sense attributes,
to avoid a cluttered, inhomogeneous attribute landscape where nobody knows
which attribute means what.
==Hashes==
As discussed earlier on this ML, I argue that hashes are not that readable by
humans and do not provide a natural order like, e.g., incremented revision
numbers. When thinking about package versions, you have to deal with three
kinds of metadata evolution:
1. new version of the package
2. new version of the metadata (e.g. because of newly known incompatibilities)
3. refactorings of the metadata itself (e.g. because of new features of adept)
which do not affect the effective artifacts for the end user
In my view, the third category should not bump the artifact revision/hash, as
there are no effects for the package consumer. But this is not possible if the
hashes are driven by the underlying storage mechanism (currently git). If we
used a more maintainer-friendly "hash", e.g. r0, r1, then this would be
possible and the package consumer could easily grasp which revision (aka
hash) is newer. Of course, some rules would have to be applied about the cases
in which it is allowed not to bump a revision.
Also, ordered revision numbers could be included in version ranges, whereas
unordered hashes cannot be (at least not without some additional knowledge).
But these are just some thoughts based on the use of Jackage and almost ten
years of Gentoo Portage, which both have such a revision mechanism. Besides
the inconvenience, this is no show stopper for me.
==Pre-Resolving aka having an "artifacts.txt"==
I believe a non-transient, explicit classpath is what most mature projects
need and want, whereas in new, quick-and-dirty, test, name-it-what-you-want
projects you want a fast start, and automatic transitive dependency resolution
is desirable.
I very much like the idea of resolving a dependency graph based on metadata
and committing the result to the project repo. Nobody else (who wants to build
the project) should be required to re-resolve it. But everybody should be
able to re-resolve to the same result, which is the key to reproducible
setups.
A typical workflow, utilizing a package manager in a "passive way", could be
like this:
- Dev adds some deps to project
- Dev asks package manager/build system to suggest some additional/missing
transitive deps
- Dev picks the ones he wants and makes them persistent
- Later the dev modifies, adds, removes, bumps some deps
- Package manager/build system re-analyzes the classpath and detects
missing/conflicting packages and makes suggestions
Therefore, in SBuild, we do not depend on a package manager or on the concept
of automatic managed dependencies at all. But of course, we support it.
So, the SBuild-Adept integration could be just an analysis, verification
and suggestion step. No automatism in the build chain, but lots of help in
assembling the build chain. Any hard decision between dependencies is
supervised by the developer. And I would like to not have an "artifacts.txt"
file but integrate it into SBuild's DSL - but that should be an implementation
detail, IMHO.
==Dependency resolver, conflict resolution, stable packages==
Just some pointers here to avoid NIH syndrome.
In OSGi land, there is the OSGi bundle repository specification RFC and some
implementations, most notably Apache Felix OBR
(http://felix.apache.org/site/apache-felix-osgi-bundle-repository.html), which
support dependency resolution based on a very generic capabilities model. This
feels like almost the same (in terms of expressive power) as the
constraint-based approach of Adept Mark II.
Also, Gentoo Linux's package manager, Portage, has conceptually very much in
common with the aims of Adept: separate metadata, keywords, use-flags...
And speaking about Portage, it brings the concept of stable vs. unstable
packages, so you can declare a package as unstable as long as you are testing
it. After some time without any changes or negative feedback, you can make it
stable. If somebody uses an unstable package, he knows that it might blow up
and ideally knows how to report issues.
Sorry for the longish post.
R("A")("v" -> "1.0")( //I want A v 1.0
X("B")(), //it depends on some variant of B (do not care which)
X("C")(), //and it depends on some variant of C (do not care which), etc etc
X("D")(),
X("E")()),
V("E")("v" -> "1.0")( //there is only one version of E v 1.0
X("D")("v" -> "1.0")), //and it requires D 1.0
//there are 2 versions of C:
V("C")("v" -> "2.0")(),
V("C")("v" -> "3.0")(), //since we want D 1.0, we and it depends on C 3.0, we must use C 3.0
V("D")("v" -> "2.0")(
X("C")("v" -> "2.0")),
V("D")("v" -> "1.0")(
X("C")("v" -> "3.0")), //<-- depends on C 3.0
//there are also 2 versions of B
V("B")("v" -> "1.0")(
X("C")("v" -> "2.0"),
X("F")()),
V("B")("v" -> "2.0")(
X("C")("v" -> "3.0"), //but we must use 2.0 because of our requirement on C 3.0
X("F")()),
//2 variants of F again
V("F")("v" -> "1.0")(
X("C")("v" -> "2.0")),
V("F")("v" -> "2.0")(
X("C")("v" -> "3.0")) //same thing as for B
You get this:
- A [v=(1.0)]
  - B [v=(2.0)]
    - C <defined>
    - F [v=(2.0)]
      - C <defined>
  - C [v=(3.0)]
  - D [v=(1.0)]
    - C <defined>
  - E [v=(1.0)]
    - D <defined>
On Tuesday, 24 September 2013, 21:13:50, Fredrik Ekholdt wrote:
| On Sep 24, 2013, at 1:02 PM, Tobias Roeser wrote:
| > Nice to have a specification. :-)
| >
| > Here are some thoughts.
|
| Cool - thanks! :) Lots of great comments I see! See inline for more
|
| > ==Artifacts==
| >
| > From Definitions: "An artifact hash is linked to a set of location
| > providers, where the artifact can be found"
| >
| > Will it be possible to download artifacts from their original source?
| > This is, for me, an important goal to reach and is essentially what I
| > hear when you talk about separated metadata.
|
| I am not very specific about what location providers are, but my current
| thinking is that it is just a URI. It can be any type of file as well. We
| can add our own protocols if we need something more complicated as we go.
| We could also have some properties (host(s)) that are used, so you would be
| able to switch out the hosts easily. Just ideas for now, though.
That sounds good. The properties idea is good and makes especially good sense
for mirrored resources like Maven repos, Eclipse, Sourceforge, etc. In Portage,
they have such a mechanism, e.g. for Sourceforge, so instead of
http://sourceforge.net/files/a/b/c you just use http://sourceforge/a/b/c.
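For example, the host-property idea could amount to substituting a placeholder in the location URI (illustrative only, not Adept's actual location format):

// Illustrative only, not Adept's actual location format: substitute host
// placeholders from user/mirror configuration before downloading.
def expandLocation(template: String, hosts: Map[String, String]): String =
  hosts.foldLeft(template) { case (uri, (key, host)) =>
    uri.replace("${" + key + "}", host)
  }

// expandLocation("http://${sourceforge}/files/a/b/c",
//                Map("sourceforge" -> "mirror.example.org/sourceforge"))
// => "http://mirror.example.org/sourceforge/files/a/b/c"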
Also, I had the feeling that the metadata format of Adept is rather
hard to read compared to an ebuild (Portage's metadata). But please take this
with a grain of salt, it's a very personal feeling! What I want to point out is:
their solutions might be good sources of knowledge, and, as an example,
getting error and conflict reporting right can be a long process, which we
could cut short by looking beyond our own corner.
| I wanted to say that being part of OSGi is not necessarily a bad thing, but
| I think when it comes to a dependency manager this is a great liability. Even
| if you can use it independently (though it seems to be quite integrated when
| you look at their use of Manifests, etc.), I think it is a problem for any
| project that doesn't want or need to be part of OSGi, even if it is not
| strictly a technical issue.
|
|
| I guess OBR is the closest "competitor" to Adept, and if there is a 100%
| overlap on features and capabilities I guess Adept would not be needed.
Keep in mind that an OBR is primarily operating on a versioned package level
(Java packages, the ones you can import) plus versioned bundles plus transitivity
in terms of uses constraints. These calculations might already be more than
what is needed to set up a build tool. E.g. a compiler only supports a flat
classpath, but the OSGi runtime provides real modules and isolation, as each
bundle has its own classpath. So the implementation as such might be no good
fit for our use case.
| That being said, it is alarming that this problem is so present, but
| nobody seems to be using OBR (in 2013) except Eclipse, though it has been
| under development for years (http://www.youtube.com/watch?v=hemY-6dfPnw).
| Are they too tied to OSGi? Did they have a poor migration model? For all I
| know it might be a community issue that holds them back? They seem to have
| a strict specification process. Did they solve a problem ahead of their
| time?
| I feel I sound very biased here (and I think I am:) just so I have said it.
|
| Would be interesting to hear others' opinions. Either way it definitely
| deserves more attention.
Sorry, I did not follow your links, but maybe you got a wrong impression.
Of course, the RFC never made it to final, which means that in OSGi land you
cannot claim your implementation is final and stable. But besides that, there
are a lot of tools using OBRs, e.g. Apache Karaf or Eclipse bndtools. AFAIK the
official Eclipse herd goes in another direction with p2, which is a complicated
beast (from what I hear) but besides that yet another dependency manager. ;-)