Previous posts in the series:
https://groups.google.com/g/sage-devel/c/OeN8o14s6Jc/m/ChnpijP3AgAJ,
https://groups.google.com/g/sage-devel/c/xBzaINHWwUQ/m/Tq17YRqOAAAJ
As we all know, SageMath makes use of hundreds of "upstream" projects: third-party, separately maintained packages written either in Python/Cython or in other languages (C, C++, Common Lisp, Fortran, and the domain-specific languages of systems such as GAP, Singular, Maxima, ...).
Although SageMath is also a software distribution, its role contrasts clearly with that of general-purpose software distributions such as Ubuntu or conda-forge: It's probably rare for users to say "I computed this Gröbner basis using Ubuntu Linux" or "a strong generating set for this matrix group was computed using conda-forge". But many users say such things all the time about SageMath.
Of course this is because of the added value of SageMath over the collection of its dependencies:
1. Abstraction and unification of the interface to multiple upstream dependencies, and integration.
2. The algorithms, structures, and applications implemented in the Sage library itself.
I posit that there is an intrinsic conflict between abstraction/unification/integration and attribution for the upstream projects: Regardless of intent and purpose, a real side effect of abstraction/unification/integration is that the use of the upstream project is obscured to some degree.
It is natural if individuals who contribute to the upstream projects (or have contributed to them in the past) are concerned or unhappy about such effects. And it is understandable if they perceive that the SageMath project is using their work, consuming attention/visibility/attribution, but not "giving back" sufficiently. Contributors are entitled to take pride in their workpersonship, in the success of the project that they have been contributing to, the brand that they have created, etc. Even if some of us may be wary of possible toxic gradations such as tribalism, it is clear that attention and attribution are important, positive, and legitimate motivating factors for open source contributors in general. Moreover, attribution via academic citations may indirectly affect individuals' careers and their success in obtaining funding.
The 2018 sage-devel thread "Suggestion for the SageMath website" (
https://groups.google.com/g/sage-devel/c/H8FcZD90O0Y/m/VRIRzj1sBAAJ) focused on
getting upstream projects credited on our website. Although the suggestions made there (randomly rotating the names of external dependencies listed on the main page so that they all get equal exposure, or using scrolling lists) were not implemented, we have come a long way since then regarding better attribution for upstream projects.
- being able to better know what is in Sage,
- being able to read the original upstreams docs and source code more easily,
- knowing which upstreams devs to contact for *support*, to ask for features, to contribute work, and to thank,
- being able to properly acknowledge what they are using."
Regarding William's first point,
being able to better know what is in Sage: The main page of
http://sagemath.org now links to our reference manual with a list of dependencies that is always up to date because it is automatically generated from the source code. And in the most current version, this long list is broken into categories such as "Mathematics" for better navigability:
https://deploy-livedoc--sagemath.netlify.app/html/en/reference/spkg/
For each dependency, we have a page with various information, including a short description, installation instructions, and a link to the upstream project.
Giving attribution to the projects that supported a particular computation is a hard problem that cannot be fully automated. I don't know how widely SageMath's profiling-based citation system (
https://doc.sagemath.org/html/en/reference/misc/sage/misc/citation.html) is used by the community; but in any case, it's still a long way from the terse output of this citation system to actual citations that people can use, and it may be valuable to provide some convenient shortcuts.
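As one possible "convenient shortcut", here is a minimal sketch of a helper that turns a list of system names into an acknowledgment sentence. The function name `acknowledgment` and the wording of the sentence are my own assumptions for illustration; they are not part of Sage. In a Sage session the input list could come from `sage.misc.citation.get_systems`; the sketch uses a hand-written list so that it runs without Sage. The list should still be reviewed by hand, and the sentence is no substitute for proper citations of the individual projects.

```python
def acknowledgment(systems):
    """Turn a list of system names into a one-sentence acknowledgment.

    Hypothetical helper for illustration only; not part of Sage.
    """
    # Remove duplicates and sort case-insensitively for a stable order.
    names = sorted(set(systems), key=str.lower)
    if not names:
        return "This computation used SageMath."
    if len(names) == 1:
        listed = names[0]
    elif len(names) == 2:
        listed = f"{names[0]} and {names[1]}"
    else:
        listed = ", ".join(names[:-1]) + f", and {names[-1]}"
    return f"This computation used SageMath, which relied on {listed}."


# Inside a Sage session, the list could be obtained like this:
#     from sage.misc.citation import get_systems
#     systems = get_systems('I.primary_decomposition()')
# A hand-written list stands in here so the sketch runs without Sage.
print(acknowledgment(["Singular", "PARI", "GMP"]))
```

Keeping the helper independent of Sage means it could be applied equally to the output of the profiling-based system or to a list compiled by hand.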
Next I'll note that the modularization project provides an opportunity to refresh our relations to the upstream projects in very significant ways.
The new pip-installable packages from the modularization project will be a new way for our project to give back something of value to the upstream projects that Sage depends on. They are thus a possible new way to express interest in collaborating with upstream projects, in particular those that do not maintain Python interfaces themselves and those that might be interested in higher-level interfaces than the ones they provide. Examples of packages corresponding to actively maintained upstream projects:
Viewing one of these packages as the Python interface to the upstream library may be much more plausible than considering the monolithic SageMath system as the Python interface to the library. This may facilitate a shared investment in its development and may also avoid duplicate developments. (Disclosure: I have not contacted any upstream projects about this yet because there's little that I can offer before the work that makes the modularized distributions available is merged into Sage.)
I'll note that these new distributions differ very significantly from the products of earlier, "bottom-up" modularization efforts of the Sage library: packages such as
cypari2 (just discussed in the concurrent thread on modularization,
https://groups.google.com/g/sage-devel/c/mqgtkLr2gXY/m/kSiZktwpAAAJ) and
pplpy (mentioned in the same thread in
https://groups.google.com/g/sage-devel/c/mqgtkLr2gXY/m/65UjwaMaBQAJ, along with some other packages). These packages, designed to be reusable in the Python ecosystem without dependencies on anything in Sage, are not exposed directly to Sage users; they are merely glue between a C/C++ library and the higher-level Sage code that uses it. As such,
these packages do not provide users with a slice of the part of Sage where most of the effort and polish in Sage development is spent, namely the high-level public interface of Sage. (I cannot say whether or to what degree it is related to this observation, but I unfortunately have to say that these modularization efforts have not been a clear success: With the exception of
cypari (which Marc and Nathan created specifically for SnapPy) and the package
cysignals, there is little evidence that such packages have attracted a community of users other than indirectly as dependencies of the Sage library; and certainly no viable community of
developers has formed for these packages. Just a few weeks ago I took over as the de facto maintainer of
cypari2 and
pplpy; you may have seen the announcements.)
Such annotations -- even if as Sage developers we may find them annoying -- give an important secondary benefit, namely specific attribution for the libraries that Sage uses for particular types of computations. This alleviates the conflict between abstraction and attribution that I mentioned above.