Master plan of sage modularization

Kwankyu Lee

Nov 17, 2021, 9:27:56 PM
to sage-devel
Hi,

Sage modularization is under way, and I see many tickets on trac working toward that goal. But I wonder what the master plan is, that is, the plan for how the Sage library will be split. Here and there I read that some packages in the Sage library will be namespace packages and others won't. It would be nice to have the master plan in one place; perhaps this information should be included in the developer manual. If such a plan has been laid out, where is it?

Thanks for your attention.

Matthias Koeppe

Nov 17, 2021, 9:47:37 PM
to sage-devel
That would be https://trac.sagemath.org/ticket/29705

Help with documenting the principles and goals of modularization in the developer's guide is definitely very welcome!

Kwankyu Lee

Nov 18, 2021, 1:01:47 AM
to sage-devel
On Thursday, November 18, 2021 at 11:47:37 AM UTC+9 Matthias Koeppe wrote:
> That would be https://trac.sagemath.org/ticket/29705

Yes, all the information can be found in the ticket description and in the descriptions of the subtickets. But the master plan is buried in the details. I want to see how the whole Sage library will be split into distributions at the end of the modularization effort, and how the distributions depend on one another, if at all.

> Help with documenting the principles and goals of modularization in the developer's guide is definitely very welcome!

I think no one can do this but the conductor orchestrating the modularization process.

Matthias Koeppe

Nov 18, 2021, 2:09:23 AM
to sage-devel
On Wednesday, November 17, 2021 at 10:01:47 PM UTC-8 Kwankyu Lee wrote:
> On Thursday, November 18, 2021 at 11:47:37 AM UTC+9 Matthias Koeppe wrote:
>> That would be https://trac.sagemath.org/ticket/29705

> Yes, all the information can be found in the ticket description and in the descriptions of the subtickets. But the master plan is buried in the details. I want to see how the whole Sage library will be split into distributions at the end of the modularization effort, and how the distributions depend on one another, if at all.

The design of the distributions is not completely settled. I am hoping that interested developers will join the effort to complete the design.

Hard constraints on the distributions come from compile-time dependencies of Cython modules on C/C++ libraries. 
We can define some meaningful small distributions that consist of just one or a few Cython modules. For example, sagemath-tdlib (https://trac.sagemath.org/ticket/29864) would package only the single Cython module that must be linked with tdlib, sage.graphs.graph_decompositions.tdlib. In the Sage 9.6 cycle, as soon as namespace packages are activated (by dropping __init__.py files), we can start to create these distributions. This is quite a mechanical task: for each distribution, it's just a new directory in pkgs/ with some metadata files.
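Dropping the __init__.py files turns a package into a PEP 420 namespace package, which is what lets several distributions install modules into the same package. The following minimal sketch, using only the Python standard library, illustrates the mechanism; the top-level name `sagedemo`, the helper functions, and the two throwaway install directories standing in for two distributions are all hypothetical and do not reflect the real pkgs/ layout:

```python
import importlib
import pathlib
import sys
import tempfile


def write_module(root: pathlib.Path, relpath: str, source: str) -> None:
    """Write one module file under root, creating directories but no __init__.py."""
    target = root / relpath
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(source)


def demo_namespace_package() -> tuple:
    """Import one package whose modules come from two separate install roots."""
    with tempfile.TemporaryDirectory() as tmp:
        # Two hypothetical "distributions", each installed into its own directory.
        core_root = pathlib.Path(tmp) / "dist-core"
        tdlib_root = pathlib.Path(tmp) / "dist-tdlib"
        write_module(core_root, "sagedemo/graphs/graph.py", "NAME = 'graph'\n")
        write_module(tdlib_root, "sagedemo/graphs/tdlib.py", "NAME = 'tdlib'\n")

        roots = [str(core_root), str(tdlib_root)]
        sys.path[:0] = roots
        importlib.invalidate_caches()
        try:
            graph = importlib.import_module("sagedemo.graphs.graph")
            tdlib = importlib.import_module("sagedemo.graphs.tdlib")
            # Without __init__.py, sagedemo.graphs is a PEP 420 namespace package
            # whose __path__ spans both install roots.
            portions = len(list(importlib.import_module("sagedemo.graphs").__path__))
        finally:
            for root in roots:
                sys.path.remove(root)
    return graph.NAME, tdlib.NAME, portions


if __name__ == "__main__":
    print(demo_namespace_package())  # ('graph', 'tdlib', 2)
```

Python merges the two `sagedemo/graphs` directories into a single package, so a module such as the tdlib interface can ship in its own distribution while the rest of the package ships elsewhere.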

Let me sketch my strategy for getting closer to the design of the other distributions. https://trac.sagemath.org/ticket/29865 (waiting for review) introduces the two lowest levels, sagemath-objects and sagemath-categories. As soon as we have namespace packages working, the latter will depend on the former. The current issue with both of these distributions is that they are not really separately testable, because the doctests for these modules depend on a lot of other functionality from higher-level parts of the library. In contrast, in https://trac.sagemath.org/ticket/32432, we are working on a medium-sized distribution, sagemath-polyhedra, which will be the first modularized distribution that is useful for end users. It will also be sufficiently self-contained for running most doctests (but some doctests that depend on other parts of the library are marked # optional). Between sagemath-categories and sagemath-polyhedra (which depends on sagemath-categories) there is room for designing intermediate distributions. For example, there could be a distribution that contains the linear algebra needed by sagemath-polyhedra (i.e., parts of sage.modules and sage.matrix).

At the coarsest level, sagemath-symbolics (https://trac.sagemath.org/ticket/31695) and sagemath-standard-no-symbolics (https://trac.sagemath.org/ticket/32601) are intended to form a partition of all Sage standard library modules that are not already in sagemath-categories (or its dependencies). Below these two distributions, again there is room for designing various intermediate distributions. The design will be best if done by (or in collaboration with) developers who are knowledgeable about the specific parts of the Sage library corresponding to the various areas of mathematics.
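The "partition" requirement can be stated as a simple check: the two distributions must not share any module, and together they must cover exactly the modules not already in sagemath-categories or its dependencies. A sketch with hypothetical module lists (not the real contents of these distributions):

```python
from typing import Dict, Iterable, Set


def is_partition(universe: Iterable[str], parts: Dict[str, Set[str]]) -> bool:
    """True if the parts are pairwise disjoint and together cover the universe exactly."""
    universe = set(universe)
    seen: Set[str] = set()
    for modules in parts.values():
        if seen & modules:  # some module would be shipped by two distributions
            return False
        seen |= modules
    return seen == universe


# Hypothetical module lists, for illustration only.
all_standard = {"sage.structure", "sage.categories", "sage.symbolic",
                "sage.calculus", "sage.graphs", "sage.coding"}
in_categories = {"sage.structure", "sage.categories"}
parts = {
    "sagemath-symbolics": {"sage.symbolic", "sage.calculus"},
    "sagemath-standard-no-symbolics": {"sage.graphs", "sage.coding"},
}
print(is_partition(all_standard - in_categories, parts))  # True
```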


Kwankyu Lee

Nov 18, 2021, 2:48:04 AM
to sage-devel
> Let me sketch my strategy for getting closer to the design of the other distributions. https://trac.sagemath.org/ticket/29865 (waiting for review) introduces the two lowest levels, sagemath-objects and sagemath-categories. As soon as we have namespace packages working, the latter will depend on the former. The current issue with both of these distributions is that they are not really separately testable, because the doctests for these modules depend on a lot of other functionality from higher-level parts of the library. In contrast, in https://trac.sagemath.org/ticket/32432, we are working on a medium-sized distribution, sagemath-polyhedra, which will be the first modularized distribution that is useful for end users. It will also be sufficiently self-contained for running most doctests (but some doctests that depend on other parts of the library are marked # optional). Between sagemath-categories and sagemath-polyhedra (which depends on sagemath-categories) there is room for designing intermediate distributions. For example, there could be a distribution that contains the linear algebra needed by sagemath-polyhedra (i.e., parts of sage.modules and sage.matrix).

> At the coarsest level, sagemath-symbolics (https://trac.sagemath.org/ticket/31695) and sagemath-standard-no-symbolics (https://trac.sagemath.org/ticket/32601) are intended to form a partition of all Sage standard library modules that are not already in sagemath-categories (or its dependencies). Below these two distributions, again there is room for designing various intermediate distributions. The design will be best if done by (or in collaboration with) developers who are knowledgeable about the specific parts of the Sage library corresponding to the various areas of mathematics.

Thank you. This would be a start. 

For example, if there were a distribution sagemath-coding containing sage/coding, would we have this hierarchy:

sagemath-objects < sagemath-categories < sagemath-standard-no-symbolics < ... < sagemath-coding
 
where ... might be filled with other intermediate distributions such as, I imagine, sagemath-rings and sagemath-schemes? In general, would this hierarchy have a tree structure?

Matthias Koeppe

Nov 18, 2021, 12:59:55 PM
to sage-devel
On Wednesday, November 17, 2021 at 11:48:04 PM UTC-8 Kwankyu Lee wrote:
> For example, if there were a distribution sagemath-coding containing sage/coding, would we have this hierarchy:

> sagemath-objects < sagemath-categories < sagemath-standard-no-symbolics < ... < sagemath-coding

> where ... might be filled with other intermediate distributions such as, I imagine, sagemath-rings and sagemath-schemes?

OK, let's take a look at sage.coding. 

First, let's see if it uses symbolics. 

(9.5.beta6) $ git grep -E 'sage[.](symbolic|functions|calculus)' src/sage/coding
src/sage/coding/code_bounds.py:        from sage.functions.other import ceil
src/sage/coding/code_bounds.py:        sage: codes.bounds.entropy(1/5,4).factor()    # optional - sage.symbolic
src/sage/coding/code_bounds.py:        sage: codes.bounds.entropy(1, 3)              # optional - sage.symbolic
src/sage/coding/grs_code.py:from sage.functions.other import binomial
src/sage/coding/grs_code.py:from sage.symbolic.ring import SR
src/sage/coding/guruswami_sudan/gs_decoder.py:from sage.functions.other import floor
src/sage/coding/guruswami_sudan/utils.py:from sage.functions.other import floor
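For readers who want to run the same heuristic outside git, here is a rough Python equivalent of the `git grep` search above. It is a sketch: it assumes only files ending in .py, .pyx, or .pxd matter (git grep searches all tracked files), and like the original search it only finds textual mentions, not runtime dependencies such as the one that comes in through "log":

```python
import pathlib
import re
from typing import Iterator, Tuple

# Same pattern as the git grep command above.
SYMBOLICS_PATTERN = re.compile(r"sage[.](symbolic|functions|calculus)")


def grep_symbolics(root: str) -> Iterator[Tuple[str, int, str]]:
    """Yield (path, line number, line) for each source line mentioning symbolics modules."""
    for path in sorted(pathlib.Path(root).rglob("*")):
        # Assumption: only Python/Cython sources are of interest.
        if path.suffix not in {".py", ".pyx", ".pxd"} or not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if SYMBOLICS_PATTERN.search(line):
                yield str(path), lineno, line.strip()


if __name__ == "__main__":
    for path, lineno, line in grep_symbolics("src/sage/coding"):
        print(f"{path}:{lineno}: {line}")
```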

Apparently it does not, at least not in a very substantial way:
- The imports of ceil and floor can likely be replaced by integer_floor, integer_ceil from sage.arith.misc.
- Looking at the import of SR by src/sage/coding/grs_code.py, it seems that SR is used for running some symbolic sum, but the doctests do not show symbolic results, so it is likely that this can be replaced. 
- Note though that the above textual search for the module names is merely a heuristic. Looking at the source of "entropy", through "log" from sage.misc.functional, a runtime dependency on symbolics comes in. (I have already marked 2 doctests there as # optional - sage.symbolic.)
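The first point can be illustrated with plain Python: the floor and ceiling of an exact rational number need only integer arithmetic, which is the role integer_floor and integer_ceil from sage.arith.misc play in Sage. Similarly, a purely numerical q-ary entropy avoids a symbolic log. The sketch below uses the standard library only; `exact_floor`, `exact_ceil`, and `qary_entropy` are hypothetical helpers, not Sage functions, and the entropy uses the textbook formula H_q(x) = x log_q(q-1) - x log_q(x) - (1-x) log_q(1-x) rather than Sage's implementation:

```python
import math
from fractions import Fraction


def exact_floor(x: Fraction) -> int:
    """Floor of an exact rational, using integer arithmetic only."""
    x = Fraction(x)
    return x.numerator // x.denominator


def exact_ceil(x: Fraction) -> int:
    """Ceiling of an exact rational: ceil(p/q) == -((-p) // q) for q > 0."""
    x = Fraction(x)
    return -((-x.numerator) // x.denominator)


def qary_entropy(x: float, q: int = 2) -> float:
    """Numerical q-ary entropy H_q(x) with plain floats (no symbolic log)."""
    if not 0 <= x <= 1 or q < 2:
        raise ValueError("need 0 <= x <= 1 and q >= 2")

    def xlogx(t: float) -> float:
        # Convention: 0 * log(0) = 0.
        return 0.0 if t == 0 else t * math.log(t, q)

    return x * math.log(q - 1, q) - xlogx(x) - xlogx(1 - x)


print(exact_floor(Fraction(7, 2)), exact_ceil(Fraction(7, 2)))  # 3 4
print(exact_ceil(Fraction(-7, 2)))                              # -3
print(qary_entropy(0.5, 2))                                     # approximately 1.0
```

Whether a float result is acceptable in place of a symbolic one is exactly the domain-specific decision discussed next.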

So if sage.coding is packaged as sagemath-coding, a domain expert would have to decide whether these dependencies on symbolics are strong enough to declare a runtime dependency ("install_requires") on sagemath-symbolics. This declaration would mean that anyone who installs sagemath-coding ("pip install sagemath-coding") would also pull in sagemath-symbolics, which, at least currently, has heavy compile-time dependencies (ECL/Maxima/FLINT/Singular...).
The alternative is to consider the use of symbolics by sagemath-coding merely as something that provides some extra features ... which will only be working if the user also has installed sagemath-symbolics. (It is possible to declare this as an "extras_require" so that users could go "pip install sagemath-coding[symbolics]".) 
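With the second alternative, code in sagemath-coding that uses symbolics must cope with sagemath-symbolics being absent. One common pattern is a guarded import that fails with an installation hint, sketched below. The helper `optional_import` is hypothetical (not an existing Sage function); the extra itself would be declared in the setuptools metadata, for instance as extras_require={"symbolics": ["sagemath-symbolics"]}:

```python
import importlib
import importlib.util


def optional_import(module_name: str, extra: str, distribution: str):
    """Import an optional module, or raise ImportError with an installation hint.

    `extra` and `distribution` are only used to build the hint, e.g.
    "pip install sagemath-coding[symbolics]" (hypothetical extra name).
    """
    try:
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:  # a parent package is missing
        spec = None
    if spec is None:
        raise ImportError(
            f"{module_name} is not available; this feature needs the optional "
            f"extra: pip install {distribution}[{extra}]"
        )
    return importlib.import_module(module_name)
```

A function in sage.coding could then call `optional_import("sage.symbolic.ring", "symbolics", "sagemath-coding")` only inside the code path that needs SR, so that the rest of the distribution works without sagemath-symbolics.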

Let's say that we go with the second alternative. Then sagemath-coding would be a dependency of sagemath-standard-no-symbolics.
So it would look like this:

sagemath-objects < sagemath-categories < sagemath-coding < sagemath-standard-no-symbolics < sagemath-standard

> In general, would this hierarchy have a tree structure?

No, it is only a directed acyclic graph. For example, another chain is

sagemath-objects < sagemath-categories < sagemath-symbolics < sagemath-standard
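The two chains can be combined into one dependency graph. A sketch using the standard library's graphlib, with the distribution names and edges taken from this discussion (the design is not settled), shows that the graph is acyclic and admits an installation order, but is not a tree:

```python
from graphlib import TopologicalSorter

# Direct dependencies ("install_requires") of each distribution, as sketched in
# this thread; the graph is illustrative and not a settled design.
DEPENDS_ON = {
    "sagemath-objects": set(),
    "sagemath-categories": {"sagemath-objects"},
    "sagemath-coding": {"sagemath-categories"},
    "sagemath-symbolics": {"sagemath-categories"},
    "sagemath-standard-no-symbolics": {"sagemath-coding"},
    "sagemath-standard": {"sagemath-standard-no-symbolics", "sagemath-symbolics"},
}

# A valid install order: every distribution comes after its dependencies.
# static_order() raises graphlib.CycleError if the graph is not acyclic.
order = list(TopologicalSorter(DEPENDS_ON).static_order())
print(order)

# Not a tree: sagemath-standard has two direct dependencies.
print(sorted(d for d, deps in DEPENDS_ON.items() if len(deps) > 1))  # ['sagemath-standard']
```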


As to other distributions that you asked about, yes, sagemath-schemes could be a good distribution, but I am not sure where it would go in this picture because sage.schemes is not homogeneous in terms of its dependencies. For example, sage.schemes.[hyper]elliptic_curves seems to make heavy use of symbolics, whereas sage.schemes.toric is closely connected to sage.geometry. So smaller distributions along the lines of dependencies could make sense as well.

We will most likely not have a distribution sagemath-rings. The reason is that the various ring element implementations pull in a wide spectrum of compiled libraries, so sagemath-rings would be nearly as monolithic as all of Sage. More useful could be smaller distributions, again informed by dependencies, such as sagemath-padics, for example: Most current uses of the NTL library in Sage seem to be for sage.rings.padics. 
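One possible way to let the compiled dependencies inform the design (a heuristic for illustration, not a procedure proposed in this thread) is to group Cython modules by the exact set of C/C++ libraries they link against; each group is then a candidate for a small distribution, such as a p-adics distribution built around NTL. The module names and link data below are placeholders, not real information about the Sage library:

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List

# Hypothetical input: which C/C++ libraries each Cython module must be linked with.
# Module names and library sets are placeholders, not real data about Sage.
LINKS: Dict[str, FrozenSet[str]] = {
    "pkg.padics.mod_a": frozenset({"ntl"}),
    "pkg.padics.mod_b": frozenset({"ntl"}),
    "pkg.graphs.tdlib": frozenset({"tdlib"}),
    "pkg.rings.mod_c": frozenset({"flint"}),
    "pkg.rings.mod_d": frozenset({"flint", "ntl"}),
    "pkg.structure.mod_e": frozenset(),
}


def group_by_link_deps(links: Dict[str, FrozenSet[str]]) -> Dict[FrozenSet[str], List[str]]:
    """Group modules by their exact set of compile-time library dependencies."""
    groups: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for module, libs in sorted(links.items()):
        groups[libs].append(module)
    return dict(groups)


for libs, modules in group_by_link_deps(LINKS).items():
    print(sorted(libs) or ["(no compiled deps)"], modules)
```

Modules that link several libraries (like `pkg.rings.mod_d` here) end up in a group of their own; deciding where such modules belong is the kind of judgment that needs domain experts.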

Kwankyu Lee

Nov 18, 2021, 8:37:13 PM
to sage-devel
On Friday, November 19, 2021 at 2:59:55 AM UTC+9 Matthias Koeppe wrote:
> ... So it would look like this:

> sagemath-objects < sagemath-categories < sagemath-coding < sagemath-standard-no-symbolics < sagemath-standard


The package sage.coding needs linear algebra, all kinds of fields, algebraic curves, etc. Hence I thought sagemath-standard-no-symbolics (or whatever distribution includes all packages that sage.coding depends on) should precede sagemath-coding in the diagram. No?


Matthias Koeppe

Nov 18, 2021, 8:51:36 PM
to sage-devel
On Thursday, November 18, 2021 at 5:37:13 PM UTC-8 Kwankyu Lee wrote:
> On Friday, November 19, 2021 at 2:59:55 AM UTC+9 Matthias Koeppe wrote:
>> ... So it would look like this:

>> sagemath-objects < sagemath-categories < sagemath-coding < sagemath-standard-no-symbolics < sagemath-standard


> The package sage.coding needs linear algebra, all kinds of fields, algebraic curves, etc.

Yes, these would have to be provided by dependencies of sagemath-coding, i.e., by something between sagemath-categories and sagemath-coding. The names and design of these distributions are not settled yet. For example, below sagemath-coding and sagemath-polyhedra (but above sagemath-categories) there should be a distribution that provides linear algebra over basic rings and fields. This could be a subset of what (at the moment) goes into sagemath-polyhedra as developed in https://trac.sagemath.org/ticket/32432.
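Whether a distribution's dependencies really provide everything it imports can be checked mechanically. A sketch under hypothetical data: `sagemath-linalg` is an invented placeholder for the linear algebra distribution mentioned above, and the import lists are illustrative only. It computes the transitive closure of the declared dependencies and reports imports that fall outside it:

```python
from typing import Dict, List, Set, Tuple

# Hypothetical data: which distribution ships each package, the declared direct
# dependencies of each distribution, and which packages each package imports.
SHIPPED_BY = {
    "sage.categories": "sagemath-categories",
    "sage.matrix": "sagemath-linalg",  # hypothetical distribution name
    "sage.coding": "sagemath-coding",
    "sage.symbolic": "sagemath-symbolics",
}
DEPENDS_ON = {
    "sagemath-categories": set(),
    "sagemath-linalg": {"sagemath-categories"},
    "sagemath-coding": {"sagemath-linalg"},
    "sagemath-symbolics": {"sagemath-categories"},
}
IMPORTS = {
    "sage.coding": {"sage.matrix", "sage.categories", "sage.symbolic"},
}


def closure(dist: str, depends_on: Dict[str, Set[str]]) -> Set[str]:
    """All distributions reachable from dist, including dist itself."""
    seen, stack = set(), [dist]
    while stack:
        d = stack.pop()
        if d not in seen:
            seen.add(d)
            stack.extend(depends_on.get(d, ()))
    return seen


def undeclared_imports() -> List[Tuple[str, str, str]]:
    """(package, imported package, providing distribution) for uncovered imports."""
    problems = []
    for pkg, imported in sorted(IMPORTS.items()):
        allowed = closure(SHIPPED_BY[pkg], DEPENDS_ON)
        for other in sorted(imported):
            provider = SHIPPED_BY[other]
            if provider not in allowed:
                problems.append((pkg, other, provider))
    return problems


print(undeclared_imports())
# [('sage.coding', 'sage.symbolic', 'sagemath-symbolics')]
```

An import reported here would either need a declared dependency or, as discussed earlier for symbolics, be treated as an optional extra.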
 
> Hence I thought sagemath-standard-no-symbolics (or whatever distribution includes all packages that sage.coding depends on) should precede sagemath-coding in the diagram.

Installing sagemath-standard-no-symbolics is intended to provide "everything that is right now in the Sage library without depending on optional packages, but without sage.symbolic etc." This would include (as a dependency) sagemath-coding.


Kwankyu Lee

Nov 18, 2021, 11:05:26 PM
to sage-devel
Questions on "Features":

We are introducing Features for packages in the Sage library, like sage__combinat, sage__graphs, sage__plot, etc. How is this related to distributions? Is a Feature introduced for a package whenever the package is included in a distribution (other than sagemath-standard)?

Can (do) we use doctest tags for optional packages, for example # optional - sage.plot, only when a Feature for the package (sage.plot) exists?

Thus, whether we use such tags for certain doctests ultimately depends on how we organize the distributions?

Perhaps the organization of these distributions will constantly evolve, and hence we will keep adding more such tags?

Matthias Koeppe

Nov 18, 2021, 11:20:23 PM
to sage-devel
On Thursday, November 18, 2021 at 8:05:26 PM UTC-8 Kwankyu Lee wrote:
> Questions on "Features":

> We are introducing Features for packages in the Sage library, like sage__combinat, sage__graphs, sage__plot, etc. How is this related to distributions? Is a Feature introduced for a package whenever the package is included in a distribution (other than sagemath-standard)?

> Can (do) we use doctest tags for optional packages, for example # optional - sage.plot, only when a Feature for the package (sage.plot) exists?

Yes, that's the mechanism. A feature's name is used as the doctest tag. 

These features are explicitly declared in a single file, src/sage/features/sagemath.py, and we would add the mapping from features to the distributions providing them (actually, to SPKG names) there as well. Using this mapping, we can then start to give installation hints to the user.
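The mechanism described above (a feature's name doubles as the doctest tag, and a mapping from features to distributions yields installation hints) could look roughly like this. This is a simplified stand-in, not Sage's actual Feature API: presence is approximated by `importlib.util.find_spec`, the mapping entries are illustrative, `sagemath-plot` is a hypothetical distribution name, and the real mapping is meant to point to SPKG names:

```python
import importlib.util
import re
from typing import Dict, List, Optional

# Hypothetical mapping from feature names (which double as doctest tags) to the
# distribution that provides them. The real declarations live in
# src/sage/features/sagemath.py; these entries are only illustrative.
FEATURE_TO_DISTRIBUTION: Dict[str, str] = {
    "sage.symbolic": "sagemath-symbolics",
    "sage.plot": "sagemath-plot",  # hypothetical distribution name
}

OPTIONAL_TAG = re.compile(r"#\s*optional\s*-\s*(.+)$")


def required_features(doctest_line: str) -> List[str]:
    """Feature names listed in an '# optional - ...' tag on a doctest line."""
    match = OPTIONAL_TAG.search(doctest_line)
    return match.group(1).split() if match else []


def feature_present(name: str) -> bool:
    """Simplification: treat a feature as present if the same-named module is found."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:  # a parent package is missing
        return False


def install_hint(name: str) -> Optional[str]:
    """Suggest which distribution to install for a missing feature, if known."""
    dist = FEATURE_TO_DISTRIBUTION.get(name)
    return f"pip install {dist}" if dist else None


line = "sage: codes.bounds.entropy(1, 3)              # optional - sage.symbolic"
for feature in required_features(line):
    if not feature_present(feature):
        print(f"skipping: needs {feature}; hint: {install_hint(feature)}")
```

Because the tags name packages (sage.symbolic) rather than distributions, the doctests stay unchanged if the distribution that ships a package is renamed or split; only the mapping needs updating.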

> Thus, whether we use such tags for certain doctests ultimately depends on how we organize the distributions?

Yes, but hopefully it will be relatively stable because of the decision to key the doctest tags to package names rather than distribution names.
 
> Perhaps the organization of these distributions will constantly evolve, and hence we will keep adding more such tags?

Yes, this will likely take a while to stabilize. 

Kwankyu Lee

Nov 18, 2021, 11:29:54 PM
to sage-devel
Thank you for the answers, and for starting the ticket to add a section on modularization to the Sage developer manual.
