Brainstorming about Sage dependencies from system packages


Erik Bray

May 26, 2017, 9:01:36 AM
to sage-devel, sage-pa...@googlegroups.com
Hi folks interested in Sage packaging,

Almost every time the topic comes up, I complain that it isn't easier
to use more system packages as both build- and run-time dependencies
of Sage. I'd like to make some progress on actually doing something
about that, and I have some ideas, but I'd like to bounce them off
anyone who's interested first before just going off and doing it.

There is enough work involved in this that I believe it can and should
be broken up into a number of smaller tasks. I would also like to
approach this in a way that works well and integrates with the
existing "sage-the-distribution" infrastructure. I believe there are
advantages to being able to develop on Sage in the "normal" way we're
already used to, while also being able to take advantage of existing
system packages wherever possible.

So I'm just going to try to organize my existing thoughts on this and
see what anyone thinks. Sorry if it's TL;DR, but I'm hoping that
having a detailed discussion about this will make it more likely that
something will actually be accomplished on it soon (because I think
the actual implementation, once decided on, is not terribly
difficult).

Note: In this message I'm using "package" loosely to refer to any
program, library, database, or other collection of files that is
distributed and installed as a self-contained unit. It doesn't
necessarily relate to any particular "packaging system".


1. Why?
=======

The extent and scope to which Sage "vendors" its dependencies, in the
form of what some call "sage-the-distribution", is *not* particularly
normal in the open source world. Vendoring *some* dependencies is not
unusual, but Sage does nearly all (even down to gcc, in certain
cases). I've learned a lot of the history to this over the past year,
and agree that most of the time this has been done with good reasons.

For example, I can't think of any other software that forces me to
build its own copy of ncurses just to build/install it. This was
added for good reasons [1], but not reasons that can't also be resolved
in part by installing the appropriate system packages, or that might
not be resolved by now in system packages that depend on ncurses (i.e.
that should be built with ncurses support). Point being, this issue
does not necessarily impact everyone, and building Sage's own ncurses
is overkill in that case. It would be one thing if we were just
talking one or two packages (I didn't pick on ncurses for any deep
reason), but now multiply that by around 250 (give or take, depending
on how many dependencies are even available as system packages) and it
becomes real overhead to getting started *and* making progress with
Sage development.

I wouldn't propose *removing* any existing spkgs that are still
relevant. I think it's really useful that Sage has a list of
known-good pinned versions of its dependencies. Further,
"sage-the-distribution" makes it very easy to install those
dependencies in such a way that they can be used as build/runtime
dependencies by Sage without having to hunt the 'net for the right
source packages of the right versions of those dependencies, and
figure out how to configure and build them in a piecemeal fashion. In
other words, even if we do expand the ability to use system packages
for Sage's dependencies, it's still very nice that it's easy with a
few commands to use the spkg if something goes wrong with the system
package. It's also, of course, important for power users who wish to
compile some dependencies on their own--especially highly tuned
numerical libraries (but even those users usually only care about
being able to hand-configure a few dependencies, not most).

To summarize: being able to more aggressively rely on system packages
can save a lot of time and frustration during normal development of
Sage, and is also less jarring especially to new developers, of whom
we would like to attract more. It should also decrease the time
required to regularly build binary distributions of Sage (e.g. for
Docker, Windows, and Linux distros).


2. Overview of how Sage manages dependencies now (and what won't change)
========================================================================

For many of you this will be unnecessary review, but I want to discuss
a little about how dependencies are currently checked and installed in
Sage-the-distribution. Doing so is helpful for me too, to make sure I
understand it clearly (and correct me if I have any
misunderstandings).

Sage-the-distribution uses *Make* itself (cleverly, IMO) to manage
dependencies insofar as making sure all dependencies are installed,
and that when a package changes all packages that depend (directly or
indirectly) on that package are rebuilt. Make works on files and
timestamps, which does not translate directly to entire software
packages, so to track whether or not an spkg is up to date, Sage uses
the common "stamp pattern" for Make [2]--that is, when an spkg is
installed it writes a file that effectively "represents" completion of
the installation of that spkg for Make's purposes. These stamp files
are the files typically stored under
$SAGE_LOCAL/var/lib/sage/installed/<spkg>-<version>. This directory
is also known in some places as SAGE_SPKG_INST. By including the
version number in the name we can also force rebuilds when an spkg's
version changes.
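
A minimal sketch of the stamp-file idea described above, written in
Python purely for illustration (the real logic lives in Make and shell).
The install prefix, the package name and the version numbers are
hypothetical; only the var/lib/sage/installed/<spkg>-<version> layout
comes from the text.

```python
from pathlib import PurePosixPath

# Relative location of the stamp directory, as given in the text.
SPKG_INST_SUBDIR = PurePosixPath("var/lib/sage/installed")


def stamp_path(sage_local, name, version):
    """Path of the stamp file marking <name> at <version> as installed."""
    return PurePosixPath(sage_local) / SPKG_INST_SUBDIR / f"{name}-{version}"


def needs_build(existing_stamps, sage_local, name, version):
    """True if no stamp exists for this exact name-version pair."""
    return stamp_path(sage_local, name, version) not in existing_stamps


# Hypothetical example: zlib 1.2.8 was installed, then the pinned version changes.
local = "/opt/sage/local"
stamps = {stamp_path(local, "zlib", "1.2.8")}
print(needs_build(stamps, local, "zlib", "1.2.8"))   # False
print(needs_build(stamps, local, "zlib", "1.2.11"))  # True
```

Because the version is part of the filename, bumping the version makes
the old stamp irrelevant and the new target missing, which is what
forces the rebuild.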

When one runs `make <spkg>` with just the spkg name, this is actually
a phony target with the path to the stamp file for that package (at
its current version) as its sole prerequisite. So `make <spkg>` translates
to `make $SAGE_SPKG_INST/<spkg>-<version>` for the current version of
that spkg. The associated rule is to run the sage-spkg command for
that package, which also takes care of writing the stamp file.
sage-spkg also writes some information into each stamp file in a
somewhat loose format that I don't believe is parsed anywhere.
However the *existence* of these files is used by the (somewhat
controversial, for downstream packagers) `is_package_installed()`
function.* I'm actually going to propose later that we write and use
these stamp files (with some slight changes) even when installing
dependencies from a system package, so these files might be present
even in binary packages for Sage (though that might be up to
downstream packagers).

When Sage's `./configure` script generates the main Makefile for all
of Sage's dependencies, it loops over all the spkgs in build/pkgs/ and
creates two make targets for each spkg: the aforementioned phony
target consisting of just the package name, and the *real* target for
the stamp file. It also creates a make variable named like
`$(inst_<spkg>)` (where <spkg> is just the package name, without the
version) referring to the full path of the stamp file for that
package. Each spkg may list its build dependencies in its
build/pkgs/<spkg>/dependencies file, in the format that it will appear
in the Makefile as dependencies for the make target of that package.
For convenience's sake, the `dependencies` file just contains the
package names, but the `./configure` script converts this to the
appropriate `$(inst_<spkg>)` variables, so that the stamp files become
the real dependencies (part of how the "stamp pattern" normally
works).
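
To make the translation concrete, here is a rough Python sketch of the
kind of Makefile text that could be generated for one spkg. It is not
the actual output of Sage's `./configure`: the `$(SAGE_SPKG_INST)` make
variable, the simplified one-line recipe, and the example package and
version are assumptions for illustration, and the real `dependencies`
files can contain more than a plain list of names.

```python
def makefile_rules(name, version, dependencies):
    """Render simplified Makefile rules for one spkg.

    `dependencies` is the text of a (simplified) dependencies file:
    whitespace-separated package names.
    """
    inst_var = f"$(inst_{name})"
    # Each bare package name becomes a reference to that package's stamp file.
    prereqs = " ".join(f"$(inst_{dep})" for dep in dependencies.split())
    lines = [
        f"inst_{name} = $(SAGE_SPKG_INST)/{name}-{version}",
        f"{name}: {inst_var}",           # phony convenience target
        f".PHONY: {name}",
        f"{inst_var}: {prereqs}".rstrip(),  # real target: the stamp file
        f"\tsage-spkg {name}-{version}",    # placeholder recipe
    ]
    return "\n".join(lines) + "\n"


# Hypothetical package depending on zlib.
print(makefile_rules("libpng", "1.6.28", "zlib"))
```

The point of the sketch is only that the stamp files, not the package
names, end up as the real prerequisites.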

When a package is upgraded (i.e. its version number changes) then the
Makefile is regenerated, but with the `$(inst_<spkg>)` for that
package pointing to a new stamp file, containing the new version
number. Thus any dependents of that package will see this as an
outdated dependency, and get rebuilt after the upgraded package is
built. When packages are rebuilt (even if their version didn't
change) their stamp files are touched, forcing further rebuilds of any
of their dependents and so on, in normal Make behavior.

As far as I can tell this has worked quite well for Sage--especially
as it also allows leveraging Make's parallel build features. So I'm
proposing to keep this all pretty much as-is, with possibly only minor
tweaks in the details. Instead, many more of the changes will be at
configure time.


* There is proposed work already mostly done to replace use of
is_package_installed() within the Sage library with a way to do
runtime feature checks: https://trac.sagemath.org/ticket/20382 Some
of this work *might* be redundant with what I want to propose, but can
also coexist with it, as it is currently designed for runtime use by
the Python code itself, and not during builds.


3. Case study--examples already in Sage
=======================================

Sage-the-distribution already has a few examples of "spkgs" in the
system that *may* use a system package, rather than building from
source. As it is this is done in an ad-hoc manner that can be
surprising and/or misleading. But I think it's useful to look at them
to see how this is done currently and if there's anything we can learn
from it.

a) BLAS
-------

There are two different BLAS implementation packages to choose from
currently in Sage: OpenBLAS and ATLAS.* The selection can be made
currently at configure time with a --with-blas= flag which can take
either 'openblas' or 'atlas'. The selection is used to write a
variable called `$(BLAS)` in the makefile that points to the stamp
file path for the actual BLAS implementation spkg selected. Other
spkgs that have BLAS as a dependency list the `$(BLAS)` variable in
their dependencies, rather than writing "openblas" or "atlas"
explicitly.

When openblas is selected (now the default) the openblas spkg is
installed unconditionally.

However, when *atlas* is selected, there happens to be a mechanism for
using a system BLAS (why just with ATLAS I don't know--historical
reasons I guess). In this case it still runs the spkg-install for
ATLAS like for any other spkg, but its spkg-install checks for a
special environment variable, `SAGE_ATLAS_LIB` (the only way to
control this behavior). This invokes a search in standard locations
first for a "libatlas.so" (or equivalent) explicitly. If that's not
found, it will happily take whatever it does find as long as there's
*some* "libblas.so" and "liblapack.so" found on the system. It
doesn't do any feature checks or anything--it just takes what it
finds.

If it does find something resembling either ATLAS specifically, or a
generic BLAS/LAPACK, then it skips installing the actual spkg, but
still writes a stamp file indicating that "ATLAS" was installed, with
whatever version is in the package-version.txt for the spkg, which can
of course be misleading. (It also writes pkgconfig .pc files in
$SAGE_LOCAL/lib for blas/cblas/lapack indicating which libs it found,
along with a "fake" version of "1.0".)

Thus, Sage will use these system libraries for all build and runtime
requirements of BLAS, and in my experience this has generally worked.

* There is another issue I would like to address--slightly orthogonal
to supporting system packages--of having a regular way to support
"abstract" packages that can have multiple alternative implementations
(another example being GMP/MPIR). This has been talked about before,
such as in this recent thread [3]. I have some ideas about this that
integrate well with my ideas for system packages, but I will try to
save that for a separate message.


b) GCC
------

The GCC spkg is a bit of a different beast, since it is normally not
installed by default, and was only added to support cases where the
platform's GCC is broken or too old and has bugs that affect building
Sage or its dependencies.

Although Sage's `configure` script is responsible for determining
whether or not GCC should be installed (in contrast to hacks in
spkg-install like for ATLAS), there is no *flag* for `configure` (e.g.
--with-gcc or something like that) for controlling this. Instead the
behavior is controlled solely by an environment variable
"SAGE_INSTALL_GCC" (this should probably be fixed, but we'll come to
that). If the environment variable is set to "yes"/"no" then that
forces the gcc installation behavior one way or the other. However,
if the environment variable is not set, then the configure script goes
through the necessary checks to see if the installed gcc is new
enough, and also if gfortran is installed, among others. If GCC
installation is deemed necessary then it sets a flag indicating as
much, called `need_to_install_gcc=yes`.

This is used later (see next section) to set the `$(inst_gcc)` variable.
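
The decision just described can be summarized in a few lines. This is
a Python sketch rather than the actual shell code in configure, and it
collapses all of the system checks (gcc version, gfortran presence,
and so on) into a single hypothetical boolean `system_gcc_ok`; treating
any value other than "yes"/"no" as unset is also an assumption.

```python
def need_to_install_gcc(env, system_gcc_ok):
    """Decide whether the GCC spkg should be built.

    env           -- mapping of environment variables
    system_gcc_ok -- result of the configure-time checks (hypothetical
                     stand-in for the real series of tests)
    """
    setting = env.get("SAGE_INSTALL_GCC")
    if setting == "yes":
        return True       # forced on, regardless of the system compiler
    if setting == "no":
        return False      # forced off, even if the system compiler is bad
    return not system_gcc_ok  # unset: fall back to the checks


print(need_to_install_gcc({"SAGE_INSTALL_GCC": "no"}, system_gcc_ok=False))  # False
print(need_to_install_gcc({}, system_gcc_ok=False))                          # True
```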

c) git
------

Sage actually includes an spkg for git, and installs it
unconditionally (there is currently no way to control this) if a
working 'git' is not found on the system. This is one of the few
packages that just has a straightforward check for the system version
at configure time. If a working git is not found (where 'working'
here just means `git --version` works) the script sets a variable
(similar to the gcc case) called `need_to_install_git=yes`.

(It also sets a similar variable for `need_to_install_yasm` on
x86-based systems.)

Later, while writing the main Makefile, the configure script loops
over all spkgs that *might* be installed and checks for a
`need_to_install_<spkg>` variable. If not found, or not set to "no",
the script sets the `$(inst_<spkg>)` variable to point to the standard
stamp file for that package. Otherwise it sets `$(inst_<spkg>)` to a
dummy file that always exists (this way any dependencies for that
package are still satisfied, but the spkg is never actually
built/installed).
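
As a sketch of that last step (in Python for readability; the real
code is shell inside configure), the choice between the real stamp
file and the dummy file might look like the following. The location of
the dummy file under `$(SAGE_SPKG_INST)` and the example versions are
assumptions.

```python
def inst_variable(name, version, need_to_install,
                  spkg_inst="$(SAGE_SPKG_INST)"):
    """Value of $(inst_<name>) given the need_to_install_* settings.

    need_to_install maps package names to "yes"/"no"; a missing entry
    means configure made no decision for that package.
    """
    if need_to_install.get(name) == "no":
        # System copy is good enough: point at a file that always exists,
        # so dependents are satisfied and the spkg is never built.
        return f"{spkg_inst}/.dummy"
    # Unset, or anything other than "no": use the normal stamp file.
    return f"{spkg_inst}/{name}-{version}"


print(inst_variable("git", "2.11.0", {"git": "no"}))  # $(SAGE_SPKG_INST)/.dummy
print(inst_variable("zlib", "1.2.11", {}))            # $(SAGE_SPKG_INST)/zlib-1.2.11
```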


4. Package sources
==================

One of the main changes I'm proposing is that stamp files for packages
will always be written to SAGE_SPKG_INST even for cases where the
system package is used, and the Sage spkg is not actually installed.

That is, I want to change the meaning of "spkg" to more broadly
represent "a dependency of Sage that *may* be included in
Sage-the-distribution".

To this end I want to define a concept of spkg "sources" (not to be
confused with source code). Instead, these are sources from which the
spkg dependency can be satisfied. Three possible sources I have in
mind (and I'm not sure that there would be any others):

a) sage-dist: This is the current notion of an "spkg", where the
source tarball is downloaded from one of the Sage mirrors, unpacked
and installed to $SAGE_LOCAL using sage-spkg + the spkg's spkg-install
script. The resulting stamp file, with the version taken from
package-version.txt is written to $SAGE_SPKG_INST.

b) system: In this case a check is made to see if the dependency is
already satisfied by the system. How exactly this check is performed
depends heavily on the package. *If possible* the version of the
system package is also determined (will discuss the nuts-and-bolts of
this later). In this case a stamp file is still written to
$SAGE_SPKG_INST, but indicating somehow that the system package was
used, not the sage-dist package.

c) source: This case is not necessary for supporting system packages,
but I think would be useful for testing new versions of a package. In
this case it would be possible to install an spkg from an existing
source tree for that package, which would be installed using the
spkg-install script. If possible the version number would be
determined from the package source code, and not assumed. I think
this would be useful, but won't discuss this case any further for now.
I just point it out as another possibility within this framework of
allowing different spkg "sources".

To summarize, no matter how an spkg dependency is satisfied, a stamp
file for that spkg is written to $SAGE_SPKG_INST, possibly
indicating the *actual* version of the package being used by Sage, and
indicating how the dependency was satisfied.


5. Nuts and bolts
=================

a) New stamp file format
------------------------

As suggested in the previous section, no matter how an spkg dependency
was satisfied, a stamp file is written to the $SAGE_SPKG_INST
directory. In order to support multiple possible package "sources",
the source that was used should be included in the stamp file. This
way, it will also be possible to re-run `./configure` and specify a
different source for a package, thus forcing a rebuild. So I think
the stamp filename format should be something like:

$SAGE_SPKG_INST/<name>-<source>-<version>

where <name> would be the base package name, <source> would be
something like "sagedist" or "system", and <version> the *actual*
version of the package being used. I'll discuss in the next section
how this might be determined for system packages. There's plenty of
room for bikeshedding in this, but I think this makes sense. We could
also support the old filename format, if such files are found, for
backwards compatibility.
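
To illustrate the proposed naming scheme, here is a small Python
sketch that formats and parses such filenames, including the old
<name>-<version> form. It assumes that package names contain no
hyphens and that the set of source labels is fixed; both are
assumptions of this sketch, not decisions of the proposal, and the
label spellings are just the examples given above.

```python
import re

# Source labels; the spellings are illustrative.
KNOWN_SOURCES = ("sagedist", "system", "source")

_STAMP_RE = re.compile(
    r"(?P<name>[A-Za-z0-9_]+)-"
    r"(?:(?P<source>" + "|".join(KNOWN_SOURCES) + r")-)?"
    r"(?P<version>.+)"
)


def format_stamp(name, source, version="unknown"):
    """Build a new-style stamp filename: <name>-<source>-<version>."""
    if source not in KNOWN_SOURCES:
        raise ValueError(f"unknown source: {source!r}")
    return f"{name}-{source}-{version}"


def parse_stamp(filename):
    """Split a stamp filename into (name, source, version).

    Old-style names without a source are treated as sage-dist installs.
    """
    match = _STAMP_RE.fullmatch(filename)
    if match is None:
        raise ValueError(f"not a stamp filename: {filename!r}")
    return match["name"], match["source"] or "sagedist", match["version"]


print(parse_stamp("zlib-system-1.2.8"))  # ('zlib', 'system', '1.2.8')
print(parse_stamp("zlib-1.2.11.p0"))     # ('zlib', 'sagedist', '1.2.11.p0')
```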


b) Checking packages
--------------------

For any dependency that may be satisfied by system packages, there
needs to be a way to specify the minimum requirement for Sage (be it a
version number, or the presence of certain features), and a way for
each package to check that the requirement is satisfied.

I've gone back and forth on exactly how this should be done, but I
think that the best way to do this is to allow per-package m4 files,
containing an m4 macro that checks that the dependency on that package is
satisfied (again, be it version number or some other check). Each
macro could be named something like

SAGE_SPKG_CHECK_<name>

Optionally the macro should set a variable indicating the package
*version* if the package dependency is satisfied. This is the version
string that can be used in the stamp file, for example. If there is
no clear way to determine the version (though in most cases there will
be), a string like "unknown" could still be allowed for the version.
The macro would be defined in a file like sage_spkg_check.m4 under
each build/pkgs/<spkg> directory, and loaded on an as-needed basis
using the m4_include command in configure.ac.

Writing an m4 macro for autoconf is not a common skill, which is why
I've hesitated on this. But I think it has a few justifications: It
allows one to take advantage of the many existing macros that come
with autoconf to perform common checks, such as whether a program is
installed, or a function is available in a library. For many packages
the SAGE_SPKG_CHECK_ macro would probably just wrap one or two
existing autoconf macros. Another justification is that for some
packages there may be existing macros to check for them that we can
borrow from other projects.

We can also provide, in the documentation, a simple template macro
demonstrating how to wrap a few shell commands.

*NOTE*: To be clear, I'm not proposing that, to implement this
proposal, we go through and write 250+ m4 macros for every Sage spkg.
This check will be optional, and we can write them one at a time on an
as-needed basis, starting with some of the most important ones. I'll
discuss more about how missing checks are handled in the next section.

Obviously the packages that already have checks in configure.ac (gcc,
git, yasm) would have those checks moved out to their package-specific
macros.


c) Driving the system
---------------------

As previously noted, selecting the source for a package would be done
at ./configure time. My proposal would be to change very little about
the current default behavior.

By default, all packages would be installed from the sage-dist source
as is the case now. We could still make exceptions for build
dependencies like gcc and git. I don't care whether these exceptions
are hard-coded in configure.ac, or specified in some generic way.

However, the configure script would support, for all spkgs, a
`--with-system-<spkg>` argument (e.g. `--with-system-zlib`).

For each spkg to be installed (all standard packages, optional
packages if selected), if the `--with-system-<spkg>` argument is
given, it will attempt to load and run the SAGE_SPKG_CHECK_<spkg>
macro for that package. If the macro is not defined, there would be a
*warning* that the system package was selected for that package, but there
is no way to check if it was installed. The warning would make clear
that if the build fails it may be due to this dependency being
missing. Otherwise it runs the check, and if the check succeeds the
configure script would continue, while if the check fails the
configure would stop with an error.
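
The control flow in the previous paragraph, sketched in Python (the
real thing would be m4/shell in configure). The `checks` mapping stands
in for the per-package SAGE_SPKG_CHECK_<spkg> macros, and every name
here (`resolve_source`, `ConfigureError`, the return convention) is
hypothetical.

```python
import warnings


class ConfigureError(Exception):
    """Raised where configure would stop with an error."""


def resolve_source(name, with_system, checks):
    """Return (source, version) for one spkg.

    with_system -- set of package names given as --with-system-<spkg>
    checks      -- maps a package name to a callable returning
                   (satisfied, version_or_None)
    """
    if name not in with_system:
        return ("sagedist", None)  # default: build the Sage spkg as today
    check = checks.get(name)
    if check is None:
        # No check defined: trust the user, but say so loudly.
        warnings.warn(
            f"system package selected for {name}, but there is no way to "
            "check that it is installed; a later build failure may be due "
            "to this dependency being missing"
        )
        return ("system", "unknown")
    satisfied, version = check()
    if not satisfied:
        raise ConfigureError(f"system {name} does not satisfy Sage's requirements")
    return ("system", version or "unknown")


print(resolve_source("zlib", {"zlib"}, {"zlib": lambda: (True, "1.2.8")}))
# ('system', '1.2.8')
```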

Optionally, we could add arguments to control all of this behavior.
For example, it might be useful to have an option to install the
sage-dist spkg if a check is not defined. This might even be better
as the default--a possible bikeshed issue.

Another possible option is one that enables system packages, but
disables any checks. This might be useful for system packagers who
already have external guarantees that the dependencies have been met.

Finally, there should be an option like `--with-system-all` to
automatically use system packages for all dependencies, so that
downstream packagers don't have to supply hundreds of `--with-system-`
flags.

Otherwise, generation of the build/make/Makefile by the configure
script would proceed more or less as it does currently. It would just
take into account information gained through any `--with-system-`
flags to generate the new format stamp filenames. The .dummy stamp
file would not be used anymore. Also, the rule for building system
packages would be to simply write the stamp file.


6. Q&A
======

Q: What if I install with --with-system-<spkg> but later want to
install the sage-dist version of that package?

A: We should also support some way to deselect system packages.
Perhaps --without-system-<spkg> / --with-system-<spkg>=no (these are
two ways of saying the same thing in standard configure scripts).

Q: The reverse: What if I install the sage-dist package, but want to
switch to the system package?

A: Same thing, but this is a little trickier because we would need to
*uninstall* the package from $SAGE_LOCAL. I have a proposal for
improving spkg uninstallation written up at
https://trac.sagemath.org/ticket/22510

Q: What if I use a system package when building Sage, but that package
is later upgraded, or worse, removed?

A: There's no great solution to this. Certainly, I think the
./configure time checks should be cached (since updates are not
usually *that* frequent). So there needs to be good documentation on
invalidating the cache when re-running ./configure. Still, that only
helps with configure-time detection. Sage can still break at runtime
if a system package it depends on changes. This is a generic problem
for *any* software development, however, and something developers
should be aware of if they're updating their system. Granted, most
people don't always closely examine what's changing when they install,
for example, OS updates. I certainly don't always check this with a
fine-toothed comb. But it's a general issue. Keeping the ability to
install the "standard", known-working sage-dist spkgs if needed is
also a big advantage of this proposal.

Any other questions?


7. Future concepts
==================

a) Platform hooks
-----------------

It might be nice, when using system packages, for the underlying
OS/distribution system to hook into the SAGE_SPKG_CHECK_ system, both
to check if a package is installed, and to provide its version number.
For example, when building Sage on Debian, it might just hook into the
dpkg system to provide this information in a manner consistent with
the system.
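
As an illustration of what a Debian hook could do, here is a Python
sketch that asks `dpkg-query` for the installed version of a package.
The function name and the injectable `run` parameter are hypothetical,
and the mapping from a Sage spkg name to a Debian package name (which
is not one-to-one) is deliberately left out.

```python
import subprocess


def dpkg_version(deb_package, run=subprocess.run):
    """Return the installed version of a Debian package, or None.

    `run` is injectable so the function can be exercised without dpkg.
    """
    try:
        result = run(
            ["dpkg-query", "-W", "-f", "${Version}", deb_package],
            capture_output=True,
            text=True,
        )
    except FileNotFoundError:
        return None  # not a dpkg-based system
    if result.returncode != 0:
        return None  # package unknown to dpkg
    # An empty version is treated as "not installed" in this sketch.
    return result.stdout.strip() or None
```

A check for a given spkg could then compare the returned string
against Sage's minimum version, or report "unknown" when nothing is
returned.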

b) Abstract packages
--------------------

Returning to the question of dependencies that can be satisfied by
more than one package (e.g. BLAS, GMP), I think it would be nice to
have a generic way of handling such cases that's a little cleaner than
the current ad-hoc system. I would like a way of specifying an
"abstract" package (which might be named "blas", for example).
Installing an abstract package would mean installing the concrete
package selected to satisfy it, but it would also include a system for
switching between concrete implementations. So for example it would
be possible to have multiple BLAS implementations installed
simultaneously, and installing "blas" with the current selection might
just be a matter of updating some symlinks.

I think this concept fits in well with the proposal for handling
system packages, but doesn't necessarily need to be handled
simultaneously with it. For now we can just maintain the special
cases I think...


8. Conclusion (for now)
=======================

I've heard many valid concerns with going beyond sage-the-distribution
for building/running Sage. Sage's huge collection of dependencies can
lead to many fragilities: Version X of package Y might work with
dependency A, but completely break dependency B. And supporting
versions V, W, and X of package Y simultaneously is a lot of overhead
compared to always just using version X of that package for Sage.

I do personally have a preference, when it comes to writing software,
to supporting as wide a range of versions for my dependencies as is
feasible. For some dependencies the versions supported may,
necessarily, be very narrow. But for other cases there can be a lot
more room for flexibility.

Regardless, I think this proposal maintains the current stability of
Sage by keeping the current preference for sage-the-distribution in
all cases by default. It also maintains the ability to use
custom-built versions of some of Sage's dependencies. But I think this
will also provide more flexibility in experimenting with using
existing system packages in cases where that's sufficient, and avoid
Sage duplicating system packages unnecessarily.

Best,
Erik


[1] https://trac.sagemath.org/ticket/14405
[2] https://www.technovelty.org/tips/the-stamp-idiom-with-make.html
[3] https://groups.google.com/d/msg/sage-devel/8MJBe_qxWJ0/fTzOPVzDAAAJ

Kwankyu Lee

May 26, 2017, 10:34:18 AM
to sage-devel, sage-pa...@googlegroups.com
I wonder if the recent discussion of possible schemes for building Sage for Python 2 and 3 (one may propose, or object to, building Sage so that it runs on both Python 2 and Python 3) could be related to your proposal.

Is it simply orthogonal to your proposal, or could it somehow be affected by it?

William Stein

May 26, 2017, 5:49:03 PM
to sage-devel, sage-pa...@googlegroups.com
On Fri, May 26, 2017 at 6:01 AM, Erik Bray <erik....@gmail.com> wrote:
[...]
> The extent and scope to which Sage "vendors" its dependencies, in the
> form of what some call "sage-the-distribution", is *not* particularly
> normal in the open source world. Vendoring *some* dependencies is not
> unusual, but Sage does nearly all (even down the gcc, in certain
> cases). I've learned a lot of the history to this over the past year,
> and agree that most of the time this has been done with good reasons.
>
> For example, I can't think of any other software that forces me to
> build its own copy of ncurses just to build/install it. This was

Maybe Anaconda?

https://anaconda.org/anaconda/ncurses

The approach in Sage is indeed very rare, but it's interesting that
another similar situation is with another big Python computing stack
(Anaconda), which was developed independently. In any case, it's
worth mentioning Anaconda in the proposal.

-- William

Dima Pasechnik

May 27, 2017, 2:06:59 AM
to sage-devel
Another system that takes the Sage-like "distribution" approach, and that is worth a look to see how they package things, is Macaulay2.

However, they have a serious technical reason for such an approach, as they need compatibility with Boehm GC, and many libraries they need typically have to be rebuilt to ensure this.

Dima

Erik Bray

May 27, 2017, 3:49:31 AM
to sage-pa...@googlegroups.com, sage-devel
I didn't really think of Anaconda since it's not a single software application like Sage. As a self-contained packaging system, most packages included in Anaconda are naturally of this nature. But Anaconda doesn't necessarily distribute *everything* it needs to work either. ncurses sort of makes sense to include since a buggy ncurses can be a real deal-breaker for UX.

My favorite bugbear in Sage is its inclusion of patch. I wince every time I start a fresh build of Sage and it starts compiling patch. I'm sure there was a good reason for it, and it's very minor in the grand scheme of things, but I think it points quite clearly to a need for better control over what we can use from the system.

Best, 
Erik 

Volker Braun

May 27, 2017, 5:13:04 AM
to sage-devel, sage-pa...@googlegroups.com
In fact, if we were to do some major changes to the build system we should consider building on top of conda. In particular, we shouldn't just crap arbitrary files into $SAGE_LOCAL during build, but turn each package into a separate binary archive that then gets installed.

* Going back in the git history then involves no recompilation, only re-extracting the cached binaries.
* You can decide on a per-package level if you want to (re-)compile it or use a binary package
* The whole thing can just be published as a conda channel, just run "conda install --channel https://sagemath.org sage"
* Incremental binary updates for free
* We could build on conda (-forge) instead of maintaining our own patch, python, ... packages.
* A conda build recipe is just a better version of how we currently define packages (spkg-install -> build.sh, metadata in meta.yaml instead of scattered into multiple files)

Isuru Fernando

May 27, 2017, 5:14:24 AM
to sage-pa...@googlegroups.com, sage-devel
Have a look at spack as well, which is a package-manager. Although it's not a single software application, it uses system packages when specified to build a package.


Isuru Fernando


Jeroen Demeyer

May 27, 2017, 5:33:00 AM
to sage-...@googlegroups.com, sage-packaging
> Otherwise it sets `$(inst_<spkg>)` to a
> dummy file that always exists (this way any dependencies for that
> package are still satisfied, but the spkg is never actually
> built/installed).

Let me mention *why* I came up with this dummy file: even if configure
detects that a Sage package is not needed, it can still be explicitly
installed by

sage -i PKGNAME # This is essentially the same as "make PKGNAME"

If I understand your proposal, if a system package is used, sage -i
PKGNAME will *not* install the Sage package since the "spkg" is
satisfied by the system package.

Personally, I find it more intuitive if "sage -i PKGNAME" would
unconditionally install the Sage package PKGNAME, even if PKGNAME was
detected as system package.

> By default, all packages would be installed from the sage-dist source
> as is the case now.

I wonder why you propose this. The reason why we check for gcc for
example is because we want to avoid building the Sage package if we can.
If you go to the trouble of adding a check for system packages, the
default should be to *not* install the Sage package if the system
package works.

Apart from these two points, I totally agree with your post. Now to find
a volunteer to implement all that :-)


Jeroen.

Erik Bray

May 27, 2017, 7:16:42 AM
to sage-pa...@googlegroups.com, sage-devel
On May 27, 2017 11:13 AM, "Volker Braun" <vbrau...@gmail.com> wrote:
> In fact, if we were to do some major changes to the build system we should consider building on top of conda. In particular, we shouldn't just crap arbitrary files into $SAGE_LOCAL during build, but turn each package into a separate binary archive that then gets installed.

Can I just rein things back in here for a sec? I shouldn't have posted this on Friday--I don't want to get into any detailed discussions until I'm back at work on Monday :) 

I just wanted to say, that the whole point of what I'm proposing is that it's *not* a major change. I agree in principle with everything you're saying and would be happy to talk about bigger changes in a separate context. 

All I'm proposing are some very *minor* changes that change little about how Sage is currently worked on, while still being a quality of life improvement, in a way. 

In other words, it's something I can do now with maybe a few days of work instead of a major overhaul of everything. So I'd rather this thread focus on the details of those minor changes than any big ideas that may or may not go anywhere. 



> * Going back in the git history then involves no recompilation, only re-extracting the cached binaries.
> * You can decide on a per-package level if you want to (re-)compile it or use a binary package
> * The whole thing can just be published as a conda channel, just run "conda install --channel https://sagemath.org sage"
> * Incremental binary updates for free
> * We could build on conda (-forge) instead of maintaining our own patch, python, ... packages.
> * A conda build recipe is just a better version of how we currently define packages (spkg-install -> build.sh, metadata in meta.yaml instead of scattered into multiple files)
>
> On Friday, May 26, 2017 at 11:49:03 PM UTC+2, William wrote:
> > On Fri, May 26, 2017 at 6:01 AM, Erik Bray <erik....@gmail.com> wrote:
> > [...]
> > > The extent and scope to which Sage "vendors" its dependencies, in the
> > > form of what some call "sage-the-distribution", is *not* particularly
> > > normal in the open source world.  Vendoring *some* dependencies is not
> > > unusual, but Sage does nearly all (even down to gcc, in certain
> > > cases).  I've learned a lot of the history to this over the past year,
> > > and agree that most of the time this has been done with good reasons.
> > >
> > > For example, I can't think of any other software that forces me to
> > > build its own copy of ncurses just to build/install it.  This was
> >
> > Maybe Anaconda?
> >
> > https://anaconda.org/anaconda/ncurses
> >
> > The approach in Sage is indeed very rare, but it's interesting that
> > another similar situation is with another big Python computing stack
> > (Anaconda), which was developed independently.  In any case, it's
> > worth mentioning Anaconda in the proposal.
> >
> >  -- William


Erik Bray

May 27, 2017, 7:26:41 AM5/27/17
to sage-devel, sage-packaging
On May 27, 2017 11:32 AM, "Jeroen Demeyer" <jdem...@cage.ugent.be> wrote:
> > Otherwise it sets `$(inst_<spkg>)` to a
> > dummy file that always exists (this way any dependencies for that
> > package are still satisfied, but the spkg is never actually
> > built/installed).
>
> Let me mention *why* I came up with this dummy file: even if configure detects that a Sage package is not needed, it can still be explicitly installed by
>
> sage -i PKGNAME    # This is essentially the same as "make PKGNAME"
>
> If I understand your proposal, if a system package is used, sage -i PKGNAME will *not* install the Sage package since the "spkg" is satisfied by the system package.
>
> Personally, I find it more intuitive if "sage -i PKGNAME" would unconditionally install the Sage package PKGNAME, even if PKGNAME was detected as system package.

I'll respond in more detail later but I agree with you completely here.

> > By default, all packages would be installed from the sage-dist source
> > as is the case now.
>
> I wonder why you propose this. The reason why we check for gcc for example is because we want to avoid building the Sage package if we can. If you go to the trouble of adding a check for system packages, the default should be to *not* install the Sage package if the system package works.

This is the kind of detail that I think is arguable and why I wanted to write a long message explaining it :) 

The reason I proposed this was just to change as little as possible about the current behavior (something like GCC would be a special case). Another reason is that checking for every package would make configure take a lot longer, though I do think the results of those checks should be cached, so maybe most of the time it would not be too bad. 

I'd be fine with going either way. The --with-system-all option would basically be the behavior of checking for every package. The question is whether it should error if a package isn't found, or just build the sage-dist package. (Or there can be a flag for that behavior in which case the question is what the default should be). 
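
The choice discussed above (which origin to use, whether to cache the check, and whether a missing system package is an error) can be sketched roughly as follows. This is only an illustration, not anything that exists in Sage's build system: the names `Origin`, `choose_origin`, `check_system`, `with_system` and `strict` are all hypothetical, and the real logic would presumably live in configure rather than in Python.

```python
from enum import Enum


class Origin(Enum):
    SAGE_DIST = "sage-dist"  # build the spkg shipped with Sage
    SYSTEM = "system"        # rely on the copy already on the system


def choose_origin(pkg, check_system, with_system=False, strict=False, cache=None):
    """Decide where `pkg` should come from (hypothetical sketch).

    check_system: callable returning True if a usable system copy exists
    with_system:  True if the user asked for system packages
    strict:       True if a missing system package should be an error
    cache:        dict of earlier check results, so reruns skip the check
    """
    if not with_system:
        # Behaviour as it is today: nothing is checked at all.
        return Origin.SAGE_DIST
    if cache is None:
        cache = {}
    if pkg not in cache:
        cache[pkg] = bool(check_system(pkg))
    if cache[pkg]:
        return Origin.SYSTEM
    if strict:
        raise RuntimeError(f"no usable system package found for {pkg}")
    # Non-strict mode: quietly fall back to the sage-dist package.
    return Origin.SAGE_DIST
```

With `strict=False` a failed check falls back to the sage-dist package; with `strict=True` it raises instead. Which of the two should be the default is exactly the open question.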


> Apart from these two points, I totally agree with your post. Now to find a volunteer to implement all that :-)

I'm volunteering! I just wanted to run the idea by people in detail first. 


Erik

Isuru Fernando

May 27, 2017, 7:52:00 AM5/27/17
to sage-pa...@googlegroups.com, sage-devel
On Sat, May 27, 2017 at 4:46 PM, Erik Bray <erik....@gmail.com> wrote:
> On May 27, 2017 11:13 AM, "Volker Braun" <vbrau...@gmail.com> wrote:
> > In fact, if we were to do some major changes to the build system we should consider building on top of conda. In particular, we shouldn't just crap arbitrary files into $SAGE_LOCAL during build, but turn each package into a separate binary archive that then gets installed.
>
> Can I just rein things back in here for a sec? I shouldn't have posted this on Friday--I don't want to get into any detailed discussions until I'm back at work on Monday :)
>
> I just wanted to say, that the whole point of what I'm proposing is that it's *not* a major change. I agree in principle with everything you're saying and would be happy to talk about bigger changes in a separate context.
>
> All I'm proposing are some very *minor* changes that change little about how Sage is currently worked on, while still being a quality of life improvement, in a way.

If you can do this, then it'll be simple to use conda. All the dependencies are in `conda-forge` for linux (osx needs only a cython upgrade), so it's a matter of telling the build system to use the conda packages as system ones.

Isuru Fernando
 

Simon King

May 27, 2017, 8:55:13 AM5/27/17
to sage-...@googlegroups.com
Hi,

On 2017-05-27, Jeroen Demeyer <jdem...@cage.ugent.be> wrote:
> Personally, I find it more intuitive if "sage -i PKGNAME" would
> unconditionally install the Sage package PKGNAME, even if PKGNAME was
> detected as system package.

I find it more intuitive if "sage -f PKGNAME" would install the Sage package
even when either the spkg or a system package is present, whereas I think
"sage -i PKGNAME" should only install the package if it isn't available
yet ("available" currently means "available as spkg", but should in future
mean "available either as spkg or system-wide").
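
As a small sketch of the semantics proposed here (the function and parameter names are made up for illustration and do not correspond to anything in the `sage` script):

```python
def should_build_spkg(force, spkg_installed, system_available):
    """Return True if the Sage package should be built and installed.

    force=True models "sage -f PKGNAME"; force=False models "sage -i PKGNAME".
    """
    if force:
        # "sage -f": always (re)install the spkg, whatever is present.
        return True
    # "sage -i": install only if the package is not available in any form.
    return not (spkg_installed or system_available)
```

Under this reading, "sage -i" becomes a no-op when a system package was detected, and "sage -f" is the explicit way to get the spkg anyway.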

Cheers,
Simon

Felix Salfelder

May 27, 2017, 9:53:48 AM5/27/17
to sage-...@googlegroups.com
On Sat, May 27, 2017 at 01:16:39PM +0200, Erik Bray wrote:
> All I'm proposing are some very *minor* changes that change little about
> how Sage is currently worked on, while still being a quality of life
> improvement, in a way.

Hi Erik.

that's great news.

and it sounds like the way to go, particularly in contrast to what i
tried four years ago. (TL;DR; that was a demo of a modified sagelib
that worked on a modified sage-the-distribution, as well as on debian --
i made choices, likely too pragmatic, and it went all bikeshed).

> In other words, it's something I can do now with maybe a few days of work
> instead of a major overhaul of everything. So I'd rather this thread focus
> on the details of those minor changes than any big ideas that may or may
> not go anywhere.

the place where I would start today is just the blacklists. i.e. have
toplevel configure flags that allow telling sage-the-distribution not to
build spkgs. rather pretend they are "installed" (into $SAGE_LOCAL,
as usual) to all other parts.

something similar to
$ ./configure --disable-patch --disable-ncurses

will then effectively fall back to system packages without much more
work. note how 4 years back, even the use of PATH was highly
controversial. i reckon the situation has improved.
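
a rough sketch of the blacklist idea, under the assumption that the flags look like the `--disable-<name>` example above. the helper names `parse_disabled` and `build_plan` are hypothetical, and sage's real configure is an autoconf script, not python:

```python
def parse_disabled(argv):
    """Collect package names from hypothetical --disable-<name> flags."""
    prefix = "--disable-"
    return {
        arg[len(prefix):]
        for arg in argv
        if arg.startswith(prefix) and len(arg) > len(prefix)
    }


def build_plan(all_spkgs, disabled):
    """Split spkgs into those to build and those only pretended installed."""
    unknown = disabled - set(all_spkgs)
    if unknown:
        raise ValueError(f"unknown packages: {sorted(unknown)}")
    to_build = [p for p in all_spkgs if p not in disabled]
    # Blacklisted spkgs are never built; everything else treats them as
    # already installed, so the system copy is what actually gets used.
    pretend_installed = [p for p in all_spkgs if p in disabled]
    return to_build, pretend_installed
```

no functionality check happens here at all: whoever passes the flag takes responsibility for the system package being good enough.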

the blacklist method will enable anyone (i am thinking of power-users
and distributors, conda fans, or myself), to try, and send patches...
some of them will be needed.

i see the point of having all sorts of magic to determine whether or not
a system package substitutes an spkg. (and i partly did some of that kind
myself). i do now consider that pointless, way too much work. imo,
nowadays, functionality checks should be on package level, and a
transition path close to

- provide blacklists
- switch to system packages, one by one, fix the remaining
- eventually reach "./configure --disable-all --enable-sagelib"
- use standalone (vanilla) sagelib on gentoo, conda, debian, etcpp
- ditch sage-the-distribution for (something similar to) conda

seems feasible. at the end, it does not matter much, what sage -i
<name_of_disabled_package> might have done during the transition. keep
it simple, print a warning "not supported", and exit 1.

regards
felix

Erik Bray

May 29, 2017, 8:37:07 AM5/29/17
to sage-pa...@googlegroups.com, sage-devel
On Sat, May 27, 2017 at 1:51 PM, Isuru Fernando <isu...@gmail.com> wrote:
> On Sat, May 27, 2017 at 4:46 PM, Erik Bray <erik....@gmail.com> wrote:
>>
>> On May 27, 2017 11:13 AM, "Volker Braun" <vbrau...@gmail.com> wrote:
>>
>> In fact, if we were to do some major changes to the build system we should
>> consider building on top of conda. In particular, we shouldn't just crap
>> arbitrary files into $SAGE_LOCAL during build, but turn each package into
>> separate binary archive that then gets installed.
>>
>>
>> Can I just rein things back in here for a sec? I shouldn't have posted
>> this on Friday--I don't want to get into any detailed discussions until I'm
>> back at work on Monday :)
>>
>> I just wanted to say, that the whole point of what I'm proposing is that
>> it's *not* a major change. I agree in principle with everything you're
>> saying and would be happy to talk about bigger changes in a separate
>> context.
>>
>> All I'm proposing are some very *minor* changes that change little about
>> how Sage is currently worked on, while still being a quality of life
>> improvement, in a way.
>
>
> If you can do this, then it'll be simple to use conda. All the dependencies
> are in `conda-forge` for linux (osx needs only a cython upgrade), so it's a
> matter of telling the build system to use the conda packages as system ones.

Hi Isuru,

Exactly! With these changes, building Sage in any environment is the
same as before. But if building in a conda environment (just as an
example) this is a way to get it to use the conda
packages--selectively if needed. It can also be useful for finding
out what packages are missing, or don't quite meet Sage's needs.

If we wanted to go further one day and replace Sage's old build system
entirely with something based on Conda that's a good thing to talk
about (as we already have before), but this is a small stepping stone
to that.

Erik

Erik Bray

May 29, 2017, 8:48:33 AM5/29/17
to sage-packaging, sage-devel
On Sat, May 27, 2017 at 11:07 PM, Ximin Luo <infi...@debian.org> wrote:
> (Sorry, I am not on sage-devel@ due to volume so my posts won't get delivered there. Is there a way to whitelist cross-posters to both sage-devel@ and sage-packaging@ to be able to show up in the former?)
>
> Erik Bray:
>> Hi folks interested in Sage packaging,
>>
>> Almost every time the topic comes up, I complain that it isn't easier
>> to use more system packages as both build- and run-time dependencies
>> of Sage. [..]
>
> Nobody has mentioned patches yet. Sage patches both 3rd-party mathematics packages as well as system / generic packages in ways that are not compatible with other users of these packages.
>
> It even patches Python, and we had to patch the dochtml builder in Debian [1] otherwise it segfaults with an unpatched Python. Most recently, I had to patch Sage to hack ipywidgets at runtime [2] because I can't distribute Sage's patched ipywidgets, since upstream has not yet accepted the patch. [3] There are quite some more cases, you can look through the full set of Debian patches for details. [4]

Right. And as I understand it, there are Debian packages
*specifically* for some of the Sage-specific versions of some of these
packages (such as Pari). Or at least that was the approach taken at
one point--I don't know if that's still true, or if the necessary
patches have made it into the main Debian packages.

The point with my approach right now is not to tackle replacing *all*
of Sage's dependencies with system packages. This is why it wouldn't
even be enabled by default, for now, without explicitly asking for it.
It's more of a convenience for developers so that we can more easily
test against the system packages (and maybe, before long, prefer them
by default where possible). It should also be helpful for system
packagers in this regard. My primary motivation, for example, is that
building on Cygwin is somewhat slower than on other platforms and it
would be great to use more system packages where possible.

> In Debian we use some autoconf/m4 scripts [5] to hack the Sage spkg Makefiles to use Debian's system packages instead. Feel free to take ideas from this. However, if you are serious about supporting this mode of installation, you need to *actually test it* in your continuous integration infrastructure, to keep it working. This is true especially if you are going to keep patching third-party packages in the incompatible way that you're doing now.

That's the plan.

> [1] makes the build completely fail, it took me a non-trivial amount of time to figure this out. [2] breaks 68 tests, you *could have* done it differently in Sage, in a way that doesn't mess with the system-level ipywidgets - e.g. what I did in the Debian version of that patch. Yes it's less elegant but that's the price you pay if you want to work with 3rd-party software in a nice well-integrated distribution.
>
> OTOH, if you don't want to bother testing this, then I'd ask what the point is of making it easier to "potentially" use system packages if in practice this breaks all the time.

See above.

> You could also just say "we will never patch packages like gcc, A, B, C, D" and make this a static list, and keep patching other packages. We haven't seen many breakages along these lines, with the exception of the Python patch I just mentioned.

I don't want to make any such proclamations right now. In fact, I'm
trying to *get rid of* any special cases as much as possible (though
there are still a small number). All this is about is just the
technical mechanism for avoiding installing Sage-dist's packages where
desired.

Best,
Erik

kcrisman

May 30, 2017, 9:00:35 AM5/30/17
to sage-devel
I don't think this proposal would affect things too drastically, but just remember that ideally a Mac (not just Linux) user can just download source (or use git to do so), type make, wait a while, and still have a usable sage-the-distribution (once they've installed command line tools, which should be the only prereq for them). Certainly this is one reason many of the packages were historically included (including gcc, as you are no doubt aware).  I'm impressed if you can make this happen, though, as many Linux fans will be very happy.

Erik Bray

May 31, 2017, 6:40:35 AM5/31/17
to sage-devel
Do you know if there's ever been much luck building Sage with packages
from MacPorts, Homebrew, or the like? In theory this would help with
that too, but I don't know much about it.

Isuru Fernando

May 31, 2017, 7:52:25 AM5/31/17
to sage-devel
Without switching to clang, this will require the dependencies, especially the C++ dependencies, to be rebuilt with gcc. So you'll need to make sure that sage packages do not select the system copy of such a C++ library and use only the sage-built copy of it. For example, assume you use gmp from homebrew (assuming libgmpxx is not used by any sage package) and you also have fplll installed. You'll be compiling fpylll with gmp from homebrew and fplll from SAGE_LOCAL, and you'll also have to ignore fplll from homebrew.

If you decide to go with clang, then it needs an unmerged ticket from sage. (You'll have to build cython as well because there is a patch used by sage that is not in a release.)

For homebrew, there are some packages with missing dependencies, e.g. ntl is not compiled with gf2x and flint is not compiled with ntl. The build system will have to check for these.


Isuru Fernando



David Leach

May 31, 2017, 12:53:55 PM5/31/17
to sage-devel, sage-pa...@googlegroups.com
Hi,
A lot of issues here, but can I suggest you take a look at the Guix functional package manager as a way of keeping track of your dependency tree and, for example, of using different versions of Python concurrently. It might be a great help in development with a lot of the issues discussed here. It is developed by Ludovic Courtès, who works at INRIA, where I notice some of your contributors also work.
Cheers.

vdelecroix

Jul 31, 2017, 12:29:38 PM7/31/17
to sage-devel, sage-pa...@googlegroups.com
Hi Erik,

Currently at a workshop in Leiden [1], we figured out another possible use case for your proposal. Some people develop PARI/GP in parallel with Sage. One simple way to have a testing environment would be to have:
 * a git repo for PARI/GP
 * a git repo for SAGE
 * a way of telling SAGE to use the development version of PARI/GP (wherever it is installed)

Though, it raises one question: how would one relaunch the compilation chain after a PARI/GP update? Would it be handled automatically by the Makefile? (The same question holds for system packages, of course.)

Best
Vincent

 [1] https://www.universiteitleiden.nl/en/events/2017/07/workshop-on-algorithms-in-number-theory-and-arithmetic-geometry/

Erik Bray

Aug 1, 2017, 9:16:04 AM8/1/17
to sage-packaging, sage-devel
On Mon, Jul 31, 2017 at 6:29 PM, vdelecroix <20100.d...@gmail.com> wrote:
> Hi Erik,
>
> Currently at a workshop in Leiden [1], we figured out another possible
> use case for your proposal. Some people develop PARI/GP in parallel with
> Sage. One simple way to have a testing environment would be to have:
> * a git repo for PARI/GP
> * a git repo for SAGE
> * a way of telling SAGE to use the development version of PARI/GP
> (wherever it is installed)

Yes, this is exactly the kind of use case I had in mind for a "source"
origin for Sage packages (for myself, I wanted to do something
similar, but with Singular, and found it to be currently a bit more
trouble than it should be).

> Though, it raises one question: how would one relaunch the compilation
> chain after a PARI/GP update? Would it be handled automatically by
> the Makefile? (The same question holds for system packages, of course.)

It sort of depends on what you mean by "a PARI/GP update". The
general idea here is that you would install pari in Sage more or less
the same way one does now (although currently one almost never does so
manually, since it's a required standard package).

The basic idea is that you would configure (either via the configure
script or some other means) the origin of the "pari" spkg to be a
source tree, and provide the full path to where the pari source code
is. When pari is installed (or reinstalled) it would install it more
or less the way it installs the standard spkg. The main difference in
this case being that instead of downloading and extracting a source
tarball, it would build/install from the source tree already provided.
This would be using the same spkg-install script that it would use
normally, so it's *possible* that if you're trying to develop Sage
against a development version of PARI that the existing spkg-install
script won't work. In this case one needs to make a branch in Sage in
order to make the necessary edits to the pari spkg. But this is good,
because one would want to save those edits anyways in anticipation of
eventually updating the spkg.

Once pari is installed into Sage everything else works the same, so it
would still trigger possible rebuilds of any spkgs that depend on
pari, via Sage's Makefile.
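
To illustrate that last point, here is a minimal sketch of working out which packages need rebuilding after pari is reinstalled. In Sage this is done by make from the generated Makefile rules, not by Python code; the function name and the dependency table in the usage note are purely illustrative assumptions.

```python
from collections import deque


def dependents_to_rebuild(deps, changed):
    """Return every package that directly or indirectly depends on `changed`.

    deps maps each package name to the list of packages it depends on.
    """
    # Invert the table: for each package, who depends on it?
    reverse = {}
    for pkg, requires in deps.items():
        for req in requires:
            reverse.setdefault(req, set()).add(pkg)

    # Breadth-first walk over the reverse edges, starting at `changed`.
    seen = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in reverse.get(current, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen
```

For a made-up table such as `{"pari": [], "cypari2": ["pari"], "sagelib": ["cypari2"], "ncurses": []}`, reinstalling "pari" marks "cypari2" and "sagelib" for rebuilding and leaves "ncurses" alone.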

Does that make sense / seem useful?

Erik

(P.S. I am still working on the plan I described in this thread, or
something close to it, but there's been some preliminary work I've
needed to do first on making installation/uninstallation of Sage
packages more idempotent; see https://trac.sagemath.org/ticket/22510
and its dependencies)

Vincent Delecroix

Aug 2, 2017, 3:07:12 AM8/2/17
to sage-...@googlegroups.com
I see. What annoyed me about using "sage -i pari" is precisely that,
with a development version, it is very likely to be broken because of
patches that no longer apply. I was thinking of a "manual" installation
of pari in Sage followed by a "rebuild everything that depends on PARI".
Fixing the install script (and patches) might be the best thing to do.

I am specifically talking about PARI/GP because I am updating the Python
interface cypari2 [1] to be compatible with any version of PARI/GP. With
this feature it should be possible for people to use PARI/GP development
versions.

I am really looking forward to having this possibility in Sage!

Vincent

[1] https://github.com/defeo/cypari2/pull/28

Nils Bruin

Aug 2, 2017, 9:27:33 AM8/2/17
to sage-devel
Does sage need a purpose-built pari/gp? Ideally, configuring sage to use an external pari/gp would mean that sage just compiles/links using externally provided header files and libpari.so (and perhaps knows where the gp executable is).

When sage is being rebuilt, it could take the modification dates of include/pari/* and libpari.so as a cue to rebuild the modules that depend on those. It could keep a timestamp file itself to detect this. Surely cypari has pari as a dependency, and doesn't supply it itself?
 
Then someone who wants to develop both pari and sage could point sage at the local development pari and use sage -b or make build to bring sage in line with the local pari developments.
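
A minimal sketch of the timestamp idea, assuming a stamp file that sage would keep itself; the function names and the stamp file are hypothetical, and which paths to watch (for instance the pari headers and libpari.so) would have to come from wherever the external pari is configured:

```python
from pathlib import Path


def needs_rebuild(watched, stamp):
    """True if the stamp file is missing or any watched file is newer."""
    stamp = Path(stamp)
    if not stamp.exists():
        return True  # never built against this pari before
    stamp_mtime = stamp.stat().st_mtime
    return any(Path(p).stat().st_mtime > stamp_mtime for p in watched)


def mark_built(stamp):
    """Record that dependent modules are up to date as of now."""
    Path(stamp).touch()
```

A build step would call `needs_rebuild` before compiling the modules that depend on pari and `mark_built` after they compile successfully. Modification times only show that a file changed, not that the change matters, so this can trigger unnecessary rebuilds but should not miss an updated file.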