Refactoring tasks in the Sage library to support modularization

151 views
Skip to first unread message

Matthias Koeppe

unread,
Sep 26, 2021, 1:44:14 PMSep 26
to sage-devel
1. Replace use of "from sage.PAC.KAGE.all import ..." by more specific imports

This pattern in our Sage library code blocks modularization. In particular, imports from sage.rings.all, sage.modules.all, sage.misc.all, sage.categories.all, sage.matrix.all must be avoided because sage.modules, sage.misc, sage.categories, sage.matrix are slated to become namespace packages (see https://trac.sagemath.org/ticket/32501). I have done this for sage.geometry in https://trac.sagemath.org/ticket/32534, but help with this throughout the Sage library is needed.

2. Do not import CLASS just to run an isinstance(..., CLASS); likewise, remove uses of is_CLASS functions

Instead, create abstract base classes that can be imported without pulling in the actual implementation classes (https://trac.sagemath.org/ticket/32414). I have done this in https://trac.sagemath.org/ticket/32566 for real/complex floating point fields.

3. Break up Cython modules that depend on several C/C++ libraries simultaneously

An example is sage.matrix.misc, which depends on both flint and mpfr. This is https://trac.sagemath.org/ticket/32433 (which needs help).


Any contributions to these tasks are very welcome. For a roadmap of the modularization project, see https://trac.sagemath.org/ticket/29705

Nils Bruin

unread,
Sep 26, 2021, 4:10:21 PMSep 26
to sage-devel
On Sunday, 26 September 2021 at 10:44:14 UTC-7 Matthias Koeppe wrote:
1. Replace use of "from sage.PAC.KAGE.all import ..." by more specific imports

I'm not sure whether I'm a fan of the "all" packages (for one thing, they make proper use of "lazy_import" a lot harder, but importing lazy_import from all will cause exactly the same problem), but does it need to stand in the way of modularization? Doesn't it just mean that the "all" package needs to do a little magic to discover what its namespace should contain? Here I mean with "magic" something that perhaps needs to discover during runtime what is available within the relevant "namespace" package.

One big advantage of "all" packages is that you can hide the internal module structure of a package with it a little bit, so that the structure becomes an implementation detail, not a part of the API specification. I think that has served us well in the past on some occasions. I'm not sure we want to surrender that option if we don't have to.
 
2. Do not import CLASS just to run an isinstance(..., CLASS); likewise, remove uses of is_CLASS functions

Instead, create abstract base classes that can be imported without pulling in the actual implementation classes (https://trac.sagemath.org/ticket/32414). I have done this in https://trac.sagemath.org/ticket/32566 for real/complex floating point fields.

Have you considered performance implications of that? It would look to me that abstract base classes require more work to determine membership than a plain MRO lookup. I think this "rule" probably needs a little more cautious formulation. I suspect there may be good reasons to do specific isinstance(...,CLASS) tests. Perhaps this is captured in your "...just to run...". I'm not sure about performance issues compared to "is_CLASS" functions, but clearly banning them should be accompanied by an assessment of why their replacement by "isinstance(...)" is not a (performance) regression.

3. Break up Cython modules that depend on several C/C++ libraries simultaneously

An example is sage.matrix.misc, which depends on both flint and mpfr. This is https://trac.sagemath.org/ticket/32433 (which needs help).

The particular file you point to looks like it deserves to be restructured, but the way your objective is structured seems definitely flawed. I just wrote a cython module that essentially depends on both the mpfr and the pari libraries (on C-level!). I don't see how that would be broken up and why it would even be bad to link a cython module against several libraries at once.

Matthias Koeppe

unread,
Sep 26, 2021, 4:26:21 PMSep 26
to sage-devel
On Sunday, September 26, 2021 at 1:10:21 PM UTC-7 Nils Bruin wrote:
On Sunday, 26 September 2021 at 10:44:14 UTC-7 Matthias Koeppe wrote:
2. Do not import CLASS just to run an isinstance(..., CLASS); likewise, remove uses of is_CLASS functions

Instead, create abstract base classes that can be imported without pulling in the actual implementation classes (https://trac.sagemath.org/ticket/32414). I have done this in https://trac.sagemath.org/ticket/32566 for real/complex floating point fields.

Have you considered performance implications of that? It would look to me that abstract base classes require more work to determine membership than a plain MRO lookup. I think this "rule" probably needs a little more cautious formulation. I suspect there may be good reasons to do specific isinstance(...,CLASS) tests. Perhaps this is captured in your "...just to run...". I'm not sure about performance issues compared to "is_CLASS" functions, but clearly banning them should be accompanied by an assessment of why their replacement by "isinstance(...)" is not a (performance) regression.

Replacing is_CLASS(...) by isinstance(..., CLASS) has been quietly happening since Sage 6.4 (see https://trac.sagemath.org/ticket/12824 and pointers to tickets in https://trac.sagemath.org/ticket/32414).
What is new in this proposal is that isinstance is to be called not with CLASS but with a new base class of CLASS. Whether this incurs a performance penalty that we need to worry about is a good question. The answer to it may depend on the circumstances.
Work on tickets for https://trac.sagemath.org/ticket/32414 will certainly need careful work and review. This is why I am asking for help with this -- it's not a 1-person job.



 

Matthias Koeppe

unread,
Sep 26, 2021, 4:32:28 PMSep 26
to sage-devel
On Sunday, September 26, 2021 at 1:10:21 PM UTC-7 Nils Bruin wrote:
On Sunday, 26 September 2021 at 10:44:14 UTC-7 Matthias Koeppe wrote:
1. Replace use of "from sage.PAC.KAGE.all import ..." by more specific imports

I'm not sure whether I'm a fan of the "all" packages (for one thing, they make proper use of "lazy_import" a lot harder, but importing lazy_import from all will cause exactly the same problem), but does it need to stand in the way of modularization? Doesn't it just mean that the "all" package needs to do a little magic to discover what its namespace should contain? Here I mean with "magic" something that perhaps needs to discover during runtime what is available within the relevant "namespace" package.

Such more dynamic approaches are certainly possible. But I think they will create trouble with static type checkers. See discussion in https://trac.sagemath.org/ticket/30647, for example.
 
One big advantage of "all" packages is that you can hide the internal module structure of a package with it a little bit, so that the structure becomes an implementation detail, not a part of the API specification.

This does not describe our current practice or policy. All public names in all Python modules are part of the API; when we move anything around, we use deprecation via lazy_import etc.
 
Also, I think it's pretty clear that the Sage library does not currently use imports from .all in a systematic way. So I don't think anything is lost.

Nils Bruin

unread,
Sep 26, 2021, 4:53:38 PMSep 26
to sage-devel
On Sunday, 26 September 2021 at 13:32:28 UTC-7 Matthias Koeppe wrote:

This does not describe our current practice or policy. All public names in all Python modules are part of the API; when we move anything around, we use deprecation via lazy_import etc.

I think deprecation only gets used for things exposed in the "sage global namespace", i.e., sage.all. That certainly used to be the case. Other changes are/used to be fair game. We need to keep sage.all in some form, because it's pretty essential that people can "write sage" (minus preprocessor) by doing a "from sage.all import *". I don't think we'll get an alternative for that  [I agree that libraries should not be doing that], because there needs to be an option for people to get their code working without having to track down a dozen import statements.

There's a reason sage has a heavily populated global namespace (and moving code from using it to explicit imports is a very annoying step. "import_statements" helps a lot for it, but figuring out what to run it on has been an iterative  "test it and see what breaks" process in my experience.

Recent example of "rough" deprecation: from 9.2 to 9.3: complex_number.pxd just disappeared. [not something that "all" would help against, by the way]. As a result, it's not easily possible to write a cython extension that uses it and supports both 9.2 and 9.3. I think these are the kind of things that will limit the modularization of sagelib: it's just too integrated. It's certainly going to be a constellation of packages that have *exact* versions as prerequisites.  Version ranges will not be an option. At that point I don't quite see what the benefit is to split it up in separate packages rather than just have one big package.


Matthias Koeppe

unread,
Sep 26, 2021, 4:53:55 PMSep 26
to sage-devel
On Sunday, September 26, 2021 at 1:10:21 PM UTC-7 Nils Bruin wrote:
It's not an objective -- it's a step that is informed by the technical *restrictions* of modern Python packaging. The *objective* is separate installability of modularized parts of the Sage library as distribution packages ("pip-installable packages").
Let me explain the technical restriction: In modern Python packaging (PEP 517/518), there is no such thing as an optional build dependency: The build of a distribution package is completely determined by the declared build-system dependencies (in pyproject.toml "build-system requires"). There is no mechanism to denote "flavors" of binary wheels of distribution packages based on what "optional C/C++ libraries" were available at build time. This is why one of the first steps of the modularization project was to eliminate the mechanism of "optional extension modules" (https://trac.sagemath.org/ticket/29701, merged in Sage 9.2).
Because of this, the design of the modularized parts necessarily has to be based on the compile-time dependencies on compiled libraries. 
Of course there are classes and functions that depend on several libraries. The goal of 3. is to refactor so that such necessary dependencies do not block the separate installability and usability of modules that do not have these dependencies on multiple libraries. The distribution packages can of course define dependencies ("install-requires"). 
In the ticket list of https://trac.sagemath.org/ticket/29705 I have sketched a few possible distributions (modularized parts). The details will need attention by developers; finding a good design is not a mechanical task, but it will have to take the technical constraints into consideration.

 

Matthias Koeppe

unread,
Sep 26, 2021, 5:00:58 PMSep 26
to sage-devel
On Sunday, September 26, 2021 at 1:53:38 PM UTC-7 Nils Bruin wrote:
On Sunday, 26 September 2021 at 13:32:28 UTC-7 Matthias Koeppe wrote:

This does not describe our current practice or policy. All public names in all Python modules are part of the API; when we move anything around, we use deprecation via lazy_import etc.

I think deprecation only gets used for things exposed in the "sage global namespace", i.e., sage.all. That certainly used to be the case. Other changes are/used to be fair game.

No, that hasn't been true in a long time.

We need to keep sage.all in some form, because it's pretty essential that people can "write sage" (minus preprocessor) by doing a "from sage.all import *". I don't think we'll get an alternative for that  [I agree that libraries should not be doing that], because there needs to be an option for people to get their code working without having to track down a dozen import statements.

Yes, interactive use is a separate issue. Right now I am only concerned about library use.
  
Recent example of "rough" deprecation: from 9.2 to 9.3: complex_number.pxd just disappeared.

Well, we seem to have no deprecation mechanism for Cython headers.
 
 I think these are the kind of things that will limit the modularization of sagelib: it's just too integrated. It's certainly going to be a constellation of packages that have *exact* versions as prerequisites.  Version ranges will not be an option.

Yes, the modularization plan of https://trac.sagemath.org/ticket/29705 will keep a monorepo, and I have no plans to introduce fine-grained version range dependencies between different parts. Through the top-level ./bootstrap script, all versions are kept in sync.
 
At that point I don't quite see what the benefit is to split it up in separate packages rather than just have one big package.

The big package has gigabytes of compiled dependencies, which limits the use of parts of the Sage library in other Python projects. Modularized parts with much lighter dependencies have a chance of getting used (and attract developers) outside of the traditional Sage user community.


William Stein

unread,
Sep 27, 2021, 12:28:18 AMSep 27
to sage-devel
On Sun, Sep 26, 2021 at 1:10 PM Nils Bruin <nbr...@sfu.ca> wrote:
>
> On Sunday, 26 September 2021 at 10:44:14 UTC-7 Matthias Koeppe wrote:
>>
>> 1. Replace use of "from sage.PAC.KAGE.all import ..." by more specific imports
>
>
> I'm not sure whether I'm a fan of the "all" packages (for one thing, they make proper use of "lazy_import" a lot harder, but importing lazy_import from all will cause exactly the same problem), but does it need to stand in the way of modularization? Doesn't it just mean that the "all" package needs to do a little magic to discover what its namespace should contain? Here I mean with "magic" something that perhaps needs to discover during runtime what is available within the relevant "namespace" package.
>
> One big advantage of "all" packages is that you can hide the internal module structure of a package with it a little bit, so that the structure becomes an implementation detail, not a part of the API specification. I think that has served us well in the past on some occasions. I'm not sure we want to surrender that option if we don't have to.
>

For what it's worth, we use these "all.py" files to explicitly list
everything to be exported from a directory tree mainly because I
didn't know about __init__.py, or at least how to use it properly.
With use of __init__.py the above would be "from sage.PAC.KAGE import
...". I agree with Nils in that it's not always the case that
minimizing the "scope" of an import is optimal.

>> 2. Do not import CLASS just to run an isinstance(..., CLASS); likewise, remove uses of is_CLASS functions
>>
>> Instead, create abstract base classes that can be imported without pulling in the actual implementation classes (https://trac.sagemath.org/ticket/32414). I have done this in https://trac.sagemath.org/ticket/32566 for real/complex floating point fields.
>
>
> Have you considered performance implications of that? It would look to me that abstract base classes require more work to determine membership than a plain MRO lookup. I think this "rule" probably needs a little more cautious formulation. I suspect there may be good reasons to do specific isinstance(...,CLASS) tests. Perhaps this is captured in your "...just to run...". I'm not sure about performance issues compared to "is_CLASS" functions, but clearly banning them should be accompanied by an assessment of why their replacement by "isinstance(...)" is not a (performance) regression.
>
>> 3. Break up Cython modules that depend on several C/C++ libraries simultaneously
>>
>> An example is sage.matrix.misc, which depends on both flint and mpfr. This is https://trac.sagemath.org/ticket/32433 (which needs help).

The flint C library depends on mpfr, so I don't understand this. How
can you have a Cython module that depends on flint but does not depend
on mpfr, given that flint depends on mpfr? Maybe I don't understand
what "depends on" means.

> The particular file you point to looks like it deserves to be restructured, but the way your objective is structured seems definitely flawed. I just wrote a cython module that essentially depends on both the mpfr and the pari libraries (on C-level!). I don't see how that would be broken up and why it would even be bad to link a cython module against several libraries at once.

From a space point of view there isn't any drawback, since your
dynamic module (that Cython builds) just has a pointer to the mpfri
and pari shared object libraries.

> ... I think these are the kind of things that will limit the modularization of sagelib: it's just too integrated.

It might be useful to draw some dependency diagrams for all modules in
the Sage library. I remember seeing
these years ago and they were pretty huge and complicated but also
interesting. There's probably some modules that
depend on many things, but don't themselves get depended on very much
-- e.g., maybe something for visualization?
Does anybody around here do combinatorics? :-)

A chunk of fairly specialized code (e.g., the Pollack-Stevens
overconvergent modular symbols code)
could be moved from "in the sage.all" to "in a separate Python module
that depends on sage.all and is included with
Sage and built as part of Sage, but isn't in the exact same directory
tree as sage/all.py".

A non-Python example of a modular codebase is JupyterLab's own code. Look at

https://github.com/jupyterlab/jupyterlab/tree/master/packages

It's a bunch of packages that all work together to make "JupyterLab".
But by having them as separate packages (but NOT Github repos!),
there are many advantages for development. These really are real
packages, not just subdirectories of code. As an example, here's the
declaration
for the cells package:

https://github.com/jupyterlab/jupyterlab/blob/master/packages/cells/package.json

The npm page for that package is here

https://www.npmjs.com/package/@jupyterlab/cells

where one can download it, see release history, etc.

The modularization of the Sage library could have lofty longterm
goals, but start with one single package that is included
in Sage, but is also a separate Python package and is even published
to pypi and other places. It might be something at
either end of our dependency tree -- something on the top could be
used outside of Sage entirely, and something on the
bottom would only work if you're pip installing it into Sage. An
example is the preparser:

https://github.com/sagemath/sage/blob/8bae3ff7ad670e12cead6860cbd216f819e8d6d2/src/sage/repl/preparse.py

It doesn't really depend on anything. There's also a bunch of game
related code in

https://github.com/sagemath/sage/tree/develop/src/sage/games

that depends on nothing.

Some people will read this and ask "what is the advantage over what we
already have"? One advantage is that we can share our work with
more other people more easily. Somebody can find the sage preparser
or those games modules on pypi and install them
and use them. They could do the same thing with the code in the
Sage library, but it's much harder because all the random
code I mentioned above *does* have some subtle little weird dependency
on the Sage library. By factoring the code out as
separate modules, you are forced to revisit and hopefully remove these
dependencies, which provides value to the world.

There's of course been work like this already with cypari, but that's
kind of different because isn't it basically a different
project that Sage depends on. Maybe there's a massive amount of work
like this already by Matthias and others. I'm very ignorant.

William

Matthias Koeppe

unread,
Sep 27, 2021, 12:41:28 AMSep 27
to sage-devel
On Sunday, September 26, 2021 at 9:28:18 PM UTC-7 wst...@gmail.com wrote:
On Sun, Sep 26, 2021 at 1:10 PM Nils Bruin <nbr...@sfu.ca> wrote:
> On Sunday, 26 September 2021 at 10:44:14 UTC-7 Matthias Koeppe wrote:
>
>> 3. Break up Cython modules that depend on several C/C++ libraries simultaneously
>>
>> An example is sage.matrix.misc, which depends on both flint and mpfr. This is https://trac.sagemath.org/ticket/32433 (which needs help).

The flint C library depends on mpfr, so I don't understand this. 

You are right - this module was a bad example.

 

Matthias Koeppe

unread,
Sep 27, 2021, 12:59:05 AMSep 27
to sage-devel
On Sunday, September 26, 2021 at 9:28:18 PM UTC-7 wst...@gmail.com wrote:
The modularization of the Sage library could have lofty longterm
goals, but start with one single package that is included
in Sage, but is also a separate Python package and is even published
to pypi and other places. 

Matthias Koeppe

unread,
Sep 27, 2021, 1:19:32 AMSep 27
to sage-devel
On Sunday, September 26, 2021 at 9:28:18 PM UTC-7 wst...@gmail.com wrote:
With use of __init__.py the above would be "from sage.PAC.KAGE import
...". 

Side remark: Native namespace packages (PEP 420) do not have __init__.py;
and this is the technical mechanism that the modularization plan uses to recombine disjoint subset distributions.
 

Matthias Koeppe

unread,
Sep 29, 2021, 12:14:40 PMSep 29
to sage-devel
On Sunday, September 26, 2021 at 10:44:14 AM UTC-7 Matthias Koeppe wrote:
1. Replace use of "from sage.PAC.KAGE.all import ..." by more specific imports

This pattern in our Sage library code blocks modularization. 

In https://trac.sagemath.org/ticket/32586 I would need help from experts in pickling to find a solution for a use of "sage.all" in "sage.structure.factory"

 

Matthias Koeppe

unread,
Oct 10, 2021, 3:30:23 PMOct 10
to sage-devel
On Sunday, September 26, 2021 at 10:44:14 AM UTC-7 Matthias Koeppe wrote:

2. Do not import CLASS just to run an isinstance(..., CLASS); likewise, remove uses of is_CLASS functions

Instead, create abstract base classes that can be imported without pulling in the actual implementation classes (https://trac.sagemath.org/ticket/32414). I have done this in https://trac.sagemath.org/ticket/32566 for real/complex floating point fields.

Help is welcome in particular in the following tickets in this direction:
- https://trac.sagemath.org/ticket/32660 -- Add sage.rings.abc.AlgebraicField etc., deprecate is_AlgebraicField
https://trac.sagemath.org/ticket/32641 -- Decentralize sage.rings.numbers_abc
https://trac.sagemath.org/ticket/32664 -- Add sage.rings.abc.FiniteField, deprecate is_FiniteField
https://trac.sagemath.org/ticket/32635 -- sage.matrix.matrix_space: Import element classes on demand, fall back to generic on ImportError (needs review)


Matthias Koeppe

unread,
Oct 23, 2021, 1:27:26 AMOct 23
to sage-devel
Reply all
Reply to author
Forward
0 new messages