Dependencies refactoring

Ondřej Čertík

May 21, 2015, 7:40:29 PM
to hash...@googlegroups.com
Hi,

I talked about this with Francois Bissey as well as with Chris Kees
some time ago, and I would like to discuss it publicly here.
Currently, all dependencies must be unique, as they are built using
the same profile.

Use case 1: I would like to have two packages A and B in the same
profile, where A only builds with older swig, and B only builds with
newer swig. It's only a build-time dependency, so why not allow it?

Use case 2: library A is linked against older Lapack, library B is
linked against newer lapack. Program C is linked against A and B, but
doesn't explicitly depend on Lapack. These are all build-time
dependencies. The problem is that we should not allow this kind of
linking, as the symbols from the two versions of Lapack would get
mixed up.

Use case 3: You have the same A and B libraries, however this time the
package C has a binary c1 that is linked with A, and binary c2 that is
linked with B. Then there is no problem.


It needs to be the package maintainer who decides what the situation
is. One solution is to have these kinds of dependencies in Hashdist:

* build-nonlink (do not need to be unique)

* build-link (must be recursively unique for the given package, but do
not need to be unique for the profile)

* run (must be unique for the whole profile)

The profile is constructed (i.e. linked) only from the packages
specified in run, recursively. The build-link and build-nonlink
dependencies are used only during the build; they may be linked using
rpath (i.e. also used at runtime), but they will *not* be explicitly
linked into the profile.

The build-nonlink category is for use case 1: it would be used for the
swig package. The build-link category is for use case 2: it would be
used for lapack in A and B, as well as for packages A and B in C (and so
Hashdist would complain if you tried to build package C). Finally, in
use case 3, you would again use build-link in packages A and B, but
build-nonlink in package C.
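The three categories can be prototyped to check that they classify the use cases as intended. The Python sketch below is hypothetical: it is not Hashdist code, `Pkg`, `check_package` and `check_profile` are made-up names, and it reads "recursively unique for the given package" as "unique across everything reachable through build-link edges".

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Pkg:
    """Hypothetical package: a name, a version and the three dependency kinds."""
    name: str
    version: str
    build_nonlink: tuple = ()
    build_link: tuple = ()
    run: tuple = ()


def closure(roots, edges):
    """All packages reachable from `roots` by following the `edges` attribute."""
    seen, stack = set(), list(roots)
    while stack:
        pkg = stack.pop()
        if pkg not in seen:
            seen.add(pkg)
            stack.extend(getattr(pkg, edges))
    return seen


def conflicts(pkgs):
    """Names that occur with more than one version among `pkgs`."""
    versions = defaultdict(set)
    for pkg in pkgs:
        versions[pkg.name].add(pkg.version)
    return sorted(name for name, vs in versions.items() if len(vs) > 1)


def check_package(pkg):
    # build-link deps must be unique across this package's link closure;
    # build-nonlink deps (e.g. swig) are deliberately not checked.
    return conflicts(closure(pkg.build_link, "build_link"))


def check_profile(roots):
    # Only the run closure is linked into the profile, and it must be unique.
    return conflicts(closure(roots, "run"))


swig_old, swig_new = Pkg("swig", "1.3.40"), Pkg("swig", "3.0.5")
lapack_old, lapack_new = Pkg("lapack", "1"), Pkg("lapack", "2")

# Use case 1: swig is build-nonlink, so two versions may coexist.
A1 = Pkg("A", "1", build_nonlink=(swig_old,))
B1 = Pkg("B", "1", build_nonlink=(swig_new,))

# Use cases 2 and 3: A and B build-link different lapacks.
A = Pkg("A", "1", build_link=(lapack_old,))
B = Pkg("B", "1", build_link=(lapack_new,))
C2 = Pkg("C", "1", build_link=(A, B))     # use case 2: C links A and B
C3 = Pkg("C", "1", build_nonlink=(A, B))  # use case 3: c1, c2 link separately

for label, pkg in [("A1", A1), ("B1", B1), ("C2", C2), ("C3", C3)]:
    print(label, check_package(pkg))
```

With this reading only `C2` (use case 2) reports a conflict, on `lapack`; the two swigs of use case 1 and the split binaries of use case 3 pass.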


But perhaps there are other solutions, so I wanted to ask for opinions here.

Francois, how would the use cases 1, 2 and 3 be handled in Gentoo?

Ondrej

François Bissey

May 21, 2015, 10:55:37 PM
to hash...@googlegroups.com
On 05/22/15 11:40, Ondřej Čertík wrote:
> Francois, how would the use cases 1, 2 and 3 be handled in Gentoo?

Since you ask specifically.

Case 1:
Two possible roads:
* patching (A) to build with newer swig, unless undesirable for
other reasons.
* Use slots. You can have two versions of the same package installed
at the same time. One will usually be the default and the other may
have an altered name. In extreme cases you may have a mechanism to
switch between the two.

For swig in particular:
dev-lang/swig
Available versions:
(1) 1.3.40-r2^t
(0) 2.0.9^t ~2.0.12^t ~3.0.2^t ~3.0.4^t 3.0.5^t

Any version from 2 to 3 is in slot 0, and you can only have one version
from that slot installed at a time. You can install swig 1.3.40 at the
same time as 2.0.9 or 3.0.5, for example. swig 1.3.40 gets a "1.3"
suffix.
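To make the slot rule concrete, here is a small Python model (not Portage code). The slot assignment is copied from the listing above; installing a version into an occupied slot replaces whatever was there, while versions in different slots coexist. The function name `install` is made up for illustration.

```python
# Slot of each swig version, as in the listing above: 1.3.40 in slot 1,
# all 2.x and 3.x versions in slot 0.
SWIG_SLOTS = {
    "1.3.40": "1",
    "2.0.9": "0",
    "2.0.12": "0",
    "3.0.2": "0",
    "3.0.4": "0",
    "3.0.5": "0",
}


def install(installed, version, slots=SWIG_SLOTS):
    """Return the new {slot: version} state after installing `version`.

    A slot holds at most one version, so installing into an occupied slot
    replaces the version already there."""
    state = dict(installed)
    state[slots[version]] = version
    return state


state = install({}, "1.3.40")
state = install(state, "3.0.5")   # different slot: both are installed
print(state)
state = install(state, "2.0.9")   # same slot as 3.0.5: replaces it
print(state)
```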

Case 2:
There is only one slot for most lapack implementations you can install,
so we don't have exactly this situation. But "old version" has to be
qualified. In Linux parlance, if both versions have the same "soname"
then they have the same ABI, and the first one found in your runpath or
ld.so.conf cache will be the one used, without incident. If they don't
have the same soname, they are different libraries as far as the linker
is concerned. I am uncertain what behavior you get for duplicate symbols
when (C) runs.

While we cannot have an old and a new version of lapack installed at
once, we can have lapack from ATLAS, or lapack-reference compiled
against one of the available blas implementations, or a proprietary one
(MKL). You can end up in case 2 that way, and Gentoo currently does
nothing to stop you. This is possible because all those lapacks have
different sonames (see the previous paragraph).
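The soname argument can be sketched as a toy loader in Python. This is a simplification: a real `ld.so` consults more places (`LD_LIBRARY_PATH`, `DT_RPATH`, default directories) than the two used here, and all paths and the `liblapack_other.so.1` soname are made up for illustration.

```python
def resolve(needed, search_path):
    """Map each needed soname to the first file providing it in `search_path`.

    `search_path` is an ordered list of {soname: file} dicts, standing in for
    the runpath followed by the ld.so.conf cache."""
    loaded = {}
    for soname in needed:
        if soname in loaded:
            continue  # already loaded once; a second request reuses it
        for directory in search_path:
            if soname in directory:
                loaded[soname] = directory[soname]
                break
        else:
            raise LookupError(f"{soname}: not found")
    return loaded


runpath = {"liblapack.so.3": "/opt/new/lib/liblapack.so.3"}
ld_cache = {
    "liblapack.so.3": "/usr/lib/liblapack.so.3",
    "liblapack_other.so.1": "/usr/lib/liblapack_other.so.1",
}

# A and B both record the same soname: C ends up with one lapack,
# the first one found.
same = resolve(["liblapack.so.3", "liblapack.so.3"], [runpath, ld_cache])
# A and B record different sonames: both libraries are loaded into C.
different = resolve(["liblapack.so.3", "liblapack_other.so.1"], [runpath, ld_cache])
print(same)
print(different)
```

The model says nothing about which definition wins when the two loaded libraries export the same symbols, which is exactly the open question above.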

Things could get more interesting if (C) itself links against lapack.
There is one active lapack to compile against at any time, and a utility
to switch between them, so (C) could be linked against a third lapack...

Here we let users shoot themselves in the foot. In practice, very few
non-developers have more than one lapack installed. Although I try to
avoid such inconsistencies, I have had only one experience where things
went funny in that case; I would have to look for it in old log entries
and emails.

Case 3:
No issue there.

Francois Bissey

May 22, 2015, 1:20:31 AM
to hash...@googlegroups.com
Now that I am over the school rush I should add an addendum to
case 2.

1) I was assuming dynamic libraries. Static linking is also fun, and
different, but I assumed it was being avoided.

2) This kind of ties in with static libraries. It is perfectly legal for
shared objects to be underlinked; only executables have to be fully
resolved. A case in point is libgsl (and not "libels", as the
spellchecker keeps suggesting) out of the box. gsl needs a cblas but
isn't linked to one by default: all cblas symbols are unresolved and no
soname for cblas is recorded. You are expected to provide a cblas when
linking your application.
Distros frown on underlinking out of the box, so they will tie you to a
cblas.
Pros:
You only care about cblas when you link your application, so you care
about it only once. The versions of lapack that A and B were compiled
against are irrelevant; the only relevant one is the one you provide
when you link C.
Cons:
You need to know that you need an extra library. If A is linked against
libB and libB uses libC, then even if A doesn't use libC directly you
need to link against it explicitly.

This is exactly what you would deal with if you had static libraries.
You probably want to use this approach only with standardised
libraries such as {c}blas and lapack.
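The underlinking trade-off can be modelled in a few lines of Python. This is a toy, not a linker: libraries are plain records of defined and undefined symbols plus a recorded dependency list (standing in for `DT_NEEDED`), and only the executable link step insists that everything resolves. `gsl_blas_dgemm` and `cblas_dgemm` are real symbol names, but the records are heavily simplified; GSL's own documentation shows the link line `-lgsl -lgslcblas -lm`, with the cblas named explicitly.

```python
from dataclasses import dataclass


@dataclass
class SharedLib:
    """Toy shared object: what it defines, what it uses, what it records."""
    name: str
    defines: frozenset
    undefined: frozenset = frozenset()
    needed: tuple = ()  # recorded dependencies, pulled in automatically


def unresolved(exe_undefined, link_line):
    """Symbols left undefined when linking an executable against `link_line`.

    Recorded dependencies are followed transitively; an underlinked library
    records nothing, so its symbols must come from the link line."""
    libs, stack = [], list(link_line)
    while stack:
        lib = stack.pop()
        if lib not in libs:
            libs.append(lib)
            stack.extend(lib.needed)
    defined = set().union(*(lib.defines for lib in libs))
    wanted = set(exe_undefined).union(*(lib.undefined for lib in libs))
    return wanted - defined


cblas = SharedLib("libgslcblas", frozenset({"cblas_dgemm"}))
# Out of the box: libgsl uses cblas_dgemm but records no cblas dependency.
gsl = SharedLib("libgsl", frozenset({"gsl_blas_dgemm"}), frozenset({"cblas_dgemm"}))
# Distro style: the same library, but tied to a cblas.
gsl_distro = SharedLib("libgsl", frozenset({"gsl_blas_dgemm"}),
                       frozenset({"cblas_dgemm"}), needed=(cblas,))

app = {"gsl_blas_dgemm"}
print(unresolved(app, [gsl]))          # the cblas symbol is missing
print(unresolved(app, [gsl, cblas]))   # fixed by naming a cblas explicitly
print(unresolved(app, [gsl_distro]))   # nothing missing, cblas is implied
```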

François
> --
> You received this message because you are subscribed to the Google Groups "hashdist" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to hashdist+u...@googlegroups.com.
> To post to this group, send email to hash...@googlegroups.com.
> Visit this group at http://groups.google.com/group/hashdist.
> To view this discussion on the web visit https://groups.google.com/d/msgid/hashdist/555E9AA5.4040901%40canterbury.ac.nz.
> For more options, visit https://groups.google.com/d/optout.

Dag Sverre Seljebotn

May 22, 2015, 4:04:17 AM
to hash...@googlegroups.com
Hi Ondrej,

this hits pretty close to much of what Mark and I talked about too. For
every case the question is whether we have constraint solving or not. I
don't expect to be able to do constraint solving this spring, but I'll
discuss both cases.

On 05/22/2015 01:40 AM, Ondřej Čertík wrote:
> Hi,
>
> I talked about this with Francois Bissey as well as with Chris Kees
> some time ago, and I would like to discuss it publicly here.
> Currently, all dependencies must be unique, as they are built using
> the same profile.
>
> Use case 1: I would like to have two packages A and B in the same
> profile, where A only builds with older swig, and B only builds with
> newer swig. It's only a build-time dependency, so why not allow it?

Once my current parameter refactoring is done, this should be easy to
add with some syntax for using a package spec twice.

WITHOUT CONSTRAINT SOLVING:

packages:
  swig as swig_for_A:
    version: 1
  swig:
    version: 2
  A:
    swig: swig_for_A # use an older swig here
  B:
    # gets swig version 2 by default

WITH CONSTRAINT SOLVING:

If the A package has

constraints:
  - swig.version <= 1.3

and package B has

constraints:
  - swig.version >= 1.5.2

then we could have the constraint solver decide to instantiate two
copies of swig. The question is whether this is wanted behavior in all
cases; we may well need some sort of annotation to declare that this is
safe (like the ones you suggest). But until we have constraint solving
you would need to handle this manually in the profile anyway.
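A naive Python sketch of the behaviour described above, with hypothetical names throughout: constraints are plain predicates, the available swig versions are made up, and the "solver" only tries one shared version before falling back to one copy per package. The `allow_copies` flag stands in for the kind of annotation that would declare a second copy safe. A real constraint solver would be considerably more involved.

```python
def parse_version(text):
    """'1.5.2' -> (1, 5, 2), so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def solve(dependents, available, constraints, allow_copies=True):
    """Return {version: [packages]} telling which swig each package gets.

    First look for one version that satisfies every package; otherwise, if
    allowed, give each package the newest version satisfying it alone."""
    def newest(candidates):
        return max(candidates, key=parse_version)

    shared = [v for v in available if all(constraints[p](v) for p in dependents)]
    if shared:
        return {newest(shared): list(dependents)}
    if not allow_copies:
        raise ValueError("no single swig satisfies all packages")
    copies = {}
    for pkg in dependents:
        ok = [v for v in available if constraints[pkg](v)]
        if not ok:
            raise ValueError(f"no swig version satisfies {pkg}")
        copies.setdefault(newest(ok), []).append(pkg)
    return copies


AVAILABLE = ["1.3", "2.0.9", "3.0.5"]  # hypothetical versions
CONSTRAINTS = {
    "A": lambda v: parse_version(v) <= parse_version("1.3"),
    "B": lambda v: parse_version(v) >= parse_version("1.5.2"),
}

print(solve(["A", "B"], AVAILABLE, CONSTRAINTS))  # two copies of swig
print(solve(["B"], AVAILABLE, CONSTRAINTS))       # one swig is enough
```

Note that with plain tuple comparison `1.3.40` would not satisfy `<= 1.3`; how bounds treat such point releases is one more detail a real design would have to pin down.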

> Use case 2: library A is linked against older Lapack, library B is
> linked against newer lapack. Program C is linked against A and B, but
> doesn't explicitly depend on Lapack. These are all build time
> dependencies. The problem is that we should not allow this kind of
> linking, as the symbols will be mixed up with the two versions of
> Lapack.


Well, unless you do something special, the lapack will be the same for A
and B, right? But assume again that we had this:

packages:
  lapack as lapack_old:
    version: 1
  lapack:
    version: 2
  A:
    lapack: lapack_old
  B: # deps [lapack]
  C: # deps [A, B]

In this case, what we should do is add a constraint to C.yaml:

constraints:
  - A.lapack == B.lapack # forces these to be the same build artifact

So now the profile above will give an error, because the user tried to
do something that is explicitly forbidden in C.yaml. Remove the C
package, and the profile builds fine with different lapacks for A and B.
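Checking such a constraint is mechanical once parameters are resolved. Below is a Python sketch under stated assumptions: the profile is a plain dict in which defaults (B's lapack, C's A and B) have already been filled in, a dotted path such as `A.lapack` is followed from the package that declares the constraint, and only equality constraints are supported. `lookup` and `violations` are hypothetical names.

```python
# Resolved profile for the example above; every parameter names another
# package entry, and defaults have already been filled in (hypothetical).
PROFILE = {
    "lapack_old": {"version": "1"},
    "lapack": {"version": "2"},
    "A": {"lapack": "lapack_old"},
    "B": {"lapack": "lapack"},
    "C": {"A": "A", "B": "B"},
}

# Constraints declared in each package's yaml file, as (left, right) pairs.
CONSTRAINTS = {"C": [("A.lapack", "B.lapack")]}


def lookup(profile, pkg, path):
    """Follow a dotted parameter path like 'A.lapack' starting at pkg."""
    current = pkg
    for part in path.split("."):
        current = profile[current][part]
    return current


def violations(profile, constraints):
    """List the equality constraints broken by packages present in the profile."""
    errors = []
    for pkg, pairs in constraints.items():
        if pkg not in profile:
            continue  # a constraint only applies when its package is built
        for left, right in pairs:
            a, b = lookup(profile, pkg, left), lookup(profile, pkg, right)
            if a != b:
                errors.append(f"{pkg}: {left} == {right} fails ({a} != {b})")
    return errors


print(violations(PROFILE, CONSTRAINTS))
without_c = {k: v for k, v in PROFILE.items() if k != "C"}
print(violations(without_c, CONSTRAINTS))
```

The first call reports C's constraint; dropping C from the profile leaves the two lapacks side by side with no error.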

This is fundamentally what must happen; the question is whether we want
that constraint to be auto-generated through some syntax candy or not.

On this topic I sort of want to wait until this kind of constraint is
commonly in use, and then look at what annotation on the dependency
declaration would most naturally deduce such a constraint.

> Use case 3: You have the same A and B libraries, however this time the
> package C has a binary c1 that is linked with A, and binary c2 that is
> linked with B. Then there is no problem.

So this is the example above but you just drop the constraints section.


> It needs to be the package maintainer who decides what the situation
> is. One solution is to have these kind of dependencies in Hashdist:
>
> * build-nonlink (do not need to be unique)
>
> * build-link (must be recursively unique for the given package, but do
> not need to be unique for the profile)
>
> * run (must be unique for the whole profile)

I do like these categories. But my proposal is that we first get the
parameter-and-constraints system up and running well, use constraints
as I describe above when we need them, and then view what you state here
as syntax candy, put on top, that auto-generates or inhibits
constraints. That should increase the odds of making the right
categories.

Dag Sverre

Dag Sverre Seljebotn

May 22, 2015, 4:10:04 AM
to hash...@googlegroups.com
BTW, sorry if this is hijacking the thread, but it's a nice time to
solicit syntax proposals for the copy-package syntax.

swig as swig_for_A:
  # I like this best, except that when I scan through, the name
  # swig_for_A is not first

swig_for_A copy of swig:
  # name is first, but I don't like "copy of"

swig_for_A(swig):
  # not sure if I like it; it looks like subclassing and it's not really
  # that, it's more like copying...

swig_for_A = swig:
  # this one I kinda like, to be honest

swig_for_A < swig:
  # now nobody will guess what it does

swig_for_A:
  copy: swig
  # or
  spec: swig
  # most boring one; perhaps it makes something that's out of the
  # ordinary too easy to miss?


Dag Sverre

Francois Bissey

May 22, 2015, 4:17:20 AM
to hash...@googlegroups.com

> On 22/05/2015, at 20:10, Dag Sverre Seljebotn <d.s.se...@astro.uio.no> wrote:
>
> swig_for_A:
>   copy: swig
>   #or
>   spec: swig
>   # most boring one, perhaps makes something that's out-of-the-ordinary
>   # too easy to miss?

I like boring. Boring is absolutely fine; no, it is a requirement
for that kind of stuff.

Opinionated François

Dag Sverre Seljebotn

May 22, 2015, 4:21:15 AM
to hash...@googlegroups.com
I think I described it wrong. The reason I don't like it is that it's
magic. In general, the dict nested under the package name holds
parameters to the package; here we would be making a magic parameter
that means something completely different.

The completely boring option would be

packages:
  - name: swig_for_A
    spec: swig # could default to name, or name could default to spec
    parameters:
      <...>

but that breaks backwards compatibility a lot, and we have already gone
the route of brevity and extra-YAML syntax in other places.
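Whatever spelling wins, the brief and the explicit forms can desugar to the same records. A Python sketch, assuming the YAML has already been loaded into plain dicts and lists and using only the `swig as swig_for_A` spelling for the brief form; the record layout and the functions `from_brief` and `from_explicit` are hypothetical.

```python
import re

# "<spec> as <name>" in the brief form; any other key is both name and spec.
COPY_KEY = re.compile(r"^(?P<spec>\S+)\s+as\s+(?P<name>\S+)$")


def from_brief(packages):
    """Brief form: {key: parameters-or-None} -> list of records."""
    records = []
    for key, params in packages.items():
        match = COPY_KEY.match(key)
        name, spec = (match["name"], match["spec"]) if match else (key, key)
        records.append({"name": name, "spec": spec, "parameters": dict(params or {})})
    return records


def from_explicit(packages):
    """Explicit form: list of dicts; name and spec default to each other."""
    records = []
    for entry in packages:
        name = entry.get("name", entry.get("spec"))
        spec = entry.get("spec", name)
        records.append({"name": name, "spec": spec,
                        "parameters": dict(entry.get("parameters") or {})})
    return records


brief = {"swig as swig_for_A": {"version": 1}, "swig": {"version": 2}, "B": None}
explicit = [
    {"name": "swig_for_A", "spec": "swig", "parameters": {"version": 1}},
    {"name": "swig", "parameters": {"version": 2}},
    {"name": "B"},
]
print(from_brief(brief) == from_explicit(explicit))
```

Desugaring this way would let the brief syntax stay as candy over the explicit one, which keeps the copy semantics in a single place.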

Dag Sverre

Chris Kees

May 22, 2015, 7:24:40 AM
to hash...@googlegroups.com
Great! I was hoping the new parameter stuff would support case 1 (at least), and am glad Ondrej brought this up.

On the syntax, if we're going in the direction of brevity, I also like

swig as swig_for_A:

The reason I like it is that swig is really a sort of equivalence class of packages, and this syntax makes it clear that swig_for_A is a representative of the swig equivalence class, which we're going to use for some purpose.

Chris
