Database of number fields

Julien Puydt

Feb 22, 2020, 11:13:17 AM

to sage-...@googlegroups.com

Hi,

I wanted to package sagemath's optional package
"database_jones_numfield" for Debian, but it appears to be merely a
binary object.

For Debian, I need sources: where is the generation script I can use
to re-generate the .sobj?

Thanks,

JP

Nils Bruin

Feb 22, 2020, 12:24:15 PM

to sage-devel

On Saturday, February 22, 2020 at 8:13:17 AM UTC-8, Snark wrote:

> Hi,
>
> I wanted to package sagemath's optional package
> "database_jones_numfield" for Debian, but it appears to be merely a
> binary object.

The article describing (a version of?) the database is here:

Jones, J., & Roberts, D. (2014). A database of number fields. LMS Journal of Computation and Mathematics, 17(1), 595-618. doi:10.1112/S1461157014000424

That's not a source in the software engineering sense (and in fact tells very little about how the database was compiled anyway).

One way to get a source would be to let an SQL query write a text version of the database to a file. Your "compiling step" would then consist of going in the other direction again.

Depending on how big the text representation ends up being, this might even be valuable in general, because the text form would be a little less technology-dependent. Given that sqlite etc. aren't that great at compressing mathematical data, I suspect it won't be much worse than the binary representation.
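
A minimal sketch of that round trip, assuming the data lived in an SQLite database (the follow-up message in this thread notes that the package actually ships an sobj with its own build code). The table, its columns, and the helper names `dump_to_text` and `rebuild_from_text` are hypothetical; only the standard-library `sqlite3` calls are real.

```python
import sqlite3

def dump_to_text(conn):
    """Return the whole database as SQL text (schema plus INSERT statements)."""
    return "\n".join(conn.iterdump())

def rebuild_from_text(sql_text):
    """The 'compile' step: recreate an in-memory database from the text dump."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(sql_text)
    return conn

# Toy table; the column names and the single row are purely illustrative.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE fields (degree INTEGER, poly TEXT)")
src.execute("INSERT INTO fields VALUES (3, 'x^3 - 2')")
src.commit()

text = dump_to_text(src)            # text form, suitable for shipping as source
rebuilt = rebuild_from_text(text)   # binary form, regenerated at build time
rows = rebuilt.execute("SELECT degree, poly FROM fields").fetchall()
```

The text dump is what a packager would ship as source; the rebuilt database is the artifact produced at build time.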

For more extensive verification: checking that the entries in the database are correct would be straightforward -- the invariants listed there are not super-complicated to recompute and check. The special value of the database is in its completeness claims in various aspects. That's not easy to verify.
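
As a rough illustration of what recomputing one such invariant could look like, here is a sketch for cubics of the form x^3 + p*x + q only. It is not the database's own verification code, and the function names are hypothetical. It relies on the fact that the field discriminant divides the polynomial discriminant, so if every prime dividing disc(f) lies in the allowed set, the field is unramified outside that set. The check is sufficient but not necessary, and it assumes the polynomial is irreducible.

```python
def cubic_discriminant(p, q):
    """Discriminant of the depressed cubic x^3 + p*x + q."""
    return -4 * p**3 - 27 * q**2

def prime_support(n):
    """Set of primes dividing the nonzero integer n (plain trial division)."""
    n = abs(n)
    primes = set()
    d = 2
    while d * d <= n:
        while n % d == 0:
            primes.add(d)
            n //= d
        d += 1
    if n > 1:
        primes.add(n)
    return primes

def unramified_outside(p, q, allowed):
    """Sufficient check: every prime dividing disc(f) lies in `allowed`."""
    return prime_support(cubic_discriminant(p, q)) <= set(allowed)

# x^3 - 2 has discriminant -108 = -(2^2)(3^3).
ok = unramified_outside(0, -2, {2, 3, 5, 7})
# x^3 - x - 1 has discriminant -23, a prime outside the set.
not_ok = unramified_outside(-1, -1, {2, 3, 5, 7})
```

A check of this kind confirms individual entries; it says nothing about whether the list of fields is complete.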

Nils Bruin

Feb 22, 2020, 12:33:29 PM

to sage-devel

Sorry, code to produce the sobj is already included in the package. From jones.py:

    def _init(self, path):
        """
        Create the database from scratch from the PARI files on John Jones's
        web page, downloaded (e.g., via wget) to a local directory, which
        is specified as path above.

        INPUT:

        - ``path`` - (default works on William Stein install.)
           path must be the path to Jones's Number_Fields directory
           http://hobbes.la.asu.edu/Number_Fields These files should have
           been downloaded using wget.

        EXAMPLES: This is how to create the database from scratch, assuming
        that the number fields are in the default directory above: From a
        cold start of Sage::

                sage: J = JonesDatabase()
                sage: J._init()   # not tested
                ...

        This takes about 5 seconds.
        """
        from sage.misc.misc import sage_makedirs
        x = PolynomialRing(RationalField(), 'x').gen()
        self.root = {}
        self.root[tuple([])] = [x - 1]
        if not os.path.exists(path):
            raise IOError("Path %s does not exist." % path)
        for X in os.listdir(path):
            if X[-4:] == "solo":
                Z = path + "/" + X
                print(X)
                for Y in os.listdir(Z):
                    if Y[-3:] == ".gp":
                        self._load(Z, Y)
        sage_makedirs(JONESDATA)
        save(self.root, JONESDATA + "/jones.sobj")

Modulo collecting the text sources, which aren't included (but are readily available on a web site), the code to compile the database is already there! You might want to check with the author whether he/she is OK with you downloading and distributing the text files separately, but for robustness I think you'd be doing the community a favour by wrapping the data in a "source distribution" too.
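
To see which downloaded files that loop would pick up, without needing Sage, the directory walk can be sketched on its own. This is a standalone approximation of the selection logic in `_init` quoted above: the helper name `list_gp_files` is hypothetical, the sorting is added here only to make the output deterministic, and the directory and file names in the demo are invented.

```python
import os
import tempfile

def list_gp_files(path):
    """Yield (directory, filename) pairs that `_init` would pass to `_load`."""
    if not os.path.exists(path):
        raise IOError("Path %s does not exist." % path)
    for sub in sorted(os.listdir(path)):
        # Same test as `X[-4:] == "solo"` in the quoted code.
        if sub.endswith("solo"):
            subdir = os.path.join(path, sub)
            for name in sorted(os.listdir(subdir)):
                # Same test as `Y[-3:] == ".gp"` in the quoted code.
                if name.endswith(".gp"):
                    yield subdir, name

# Demo on a throwaway tree; these names are made up for illustration.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "2solo"))
    os.makedirs(os.path.join(root, "notes"))
    for name in ("a.gp", "readme.txt"):
        open(os.path.join(root, "2solo", name), "w").close()
    open(os.path.join(root, "notes", "b.gp"), "w").close()
    found = [name for _, name in list_gp_files(root)]
```

Only `.gp` files inside directories whose names end in "solo" are selected, so anything else in the download is ignored by the build.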

Julien Puydt

Feb 22, 2020, 3:54:47 PM

to sage-...@googlegroups.com

Hi,

If I understand correctly, there's a series of build steps:
pari code -> data files -> sobj
and the last step is jones.py.

But then what I want is the pari code, as that's the source.

(Unless computing the data files takes incredibly long, in which case
I'll ship both the pari code and the data files, with a nice notice to
explain that I ship pre-built data for a good reason.)

It's not clear which data files have been used to generate the current
sagemath package: https://hobbes.la.asu.edu/Number_Fields/ points to
several pages (including a 404), and on each of these pages, there's a
link to .gp files which must be the ones getting used, if I understand
correctly.

And how is versioning managed?

JP

Nils Bruin

Feb 22, 2020, 9:36:42 PM

to sage-devel

On Saturday, February 22, 2020 at 12:54:47 PM UTC-8, Snark wrote:

> If I understand correctly, there's a series of build steps:
> pari code -> data files -> sobj
> and the last step is jones.py.

As far as you, the software packager, are concerned, the only relevant step is data files -> sobj, and "jones.py" gives you the means of doing that conversion. It's indeed a little vague which data files the current "sobj" was constructed from. That probably means you get to pick!

Constructing the original data files earned the authors a scientific publication. Replicability is nice, but I don't think it's required to be explicitly performed every time software is packaged/installed (and indeed, it may have cost significant CPU cycles, even if the gp program used for it is tiny).

I don't think there's going to be any versioning. Once you've tabulated all the quintic number fields unramified outside {2,3,5,7}, there's very little that's going to change anymore about it (although people could discover errors in your tabulation ...).

Julien Puydt

Feb 23, 2020, 2:11:02 AM

to sage-...@googlegroups.com

On Saturday, February 22, 2020 at 18:36 -0800, Nils Bruin wrote:
> On Saturday, February 22, 2020 at 12:54:47 PM UTC-8, Snark wrote:
> > If I understand correctly, there's a series of build steps:
> > pari code -> data files -> sobj
> > and the last step is jones.py.
> >
>
> As far as you, the software packager, are concerned, the only
> relevant step is just data files -> sobj. "jones.py" actually gives
> you the means of doing that conversion. It's indeed a little vague
> from which data files the current "sobj" was constructed. That
> probably means you get to pick!

Not really: as the software packager in Debian, I'm supposed to use
sources, so I am supposed to start from the initial computing scripts,
turn them into data files and then into sobj.

Being vague about what the sources are when the package is under GPL is
a concept I'm a bit uncomfortable with.

> Constructing the original data files earned the authors a scientific
> publication. Replicability is nice, but I don't think it's required
> to be explicitly performed every time software is packaged/installed
> (and indeed, it may have cost significant CPU cycles, even if the gp
> program used for it is tiny).

If the initial computing scripts take very long, then as an exception,
I'll ship the data files and build from them. But I'll still ship the
initial computing scripts so anyone with enough computing power and
time can replicate! [The Debian build hosts stop compilations after
150 min of log inactivity, for example.]

The size of gp doesn't matter: my package would just build-depend on
it, but not depend on it.

> I don't think there's going to be any versioning. Once you've
> tabulated all the quintic number fields unramified outside {2,3,5,7},
> there's very little that's going to change anymore about it (although
> people could discover errors in your tabulation ...).

Well, I'll version on the day I got the data files then. Or the
computing scripts.
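
One way such a date-based snapshot could be pinned down is sketched below, as an assumption rather than an established scheme for this package: record the download date as the version and keep a checksum for every file, so a later download can be compared against it. The `0~YYYYMMDD` version shape, the helper names, and the demo file are all hypothetical.

```python
import datetime
import hashlib
import os
import tempfile

def sha256_of(path):
    """Hex SHA-256 digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def snapshot_manifest(root, fetched_on):
    """Return (version, {relative path: sha256}) for every file under root."""
    # Version derived from the download date; the "0~" prefix is an assumption.
    version = "0~" + fetched_on.strftime("%Y%m%d")
    entries = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            entries[os.path.relpath(full, root)] = sha256_of(full)
    return version, dict(sorted(entries.items()))

# Demo on a throwaway file; the name and contents are placeholders.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "example.gp"), "wb") as fh:
        fh.write(b"placeholder contents\n")
    version, manifest = snapshot_manifest(root, datetime.date(2020, 2, 23))
```

If the upstream files change later, a fresh manifest will differ from the stored one, which gives a concrete signal that a new snapshot version is needed.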

JP
