Reproducibility of results

Clemens Heuberger

23.02.2016, 09:50:29

to sage-...@googlegroups.com

I am currently revising a paper that I submitted in March 2015. Parts of the
results heavily rely on computations in Sage; at that time, Sage 6.5.

It turns out that the old code no longer works with Sage 7.0. It is also
impossible to compile Sage 6.5 nowadays because the infrastructure changed (it
tries to download
http://www.sagemath.org/packages/upstream/patch/patch-2.7.1.tar.gz which no
longer exists). For completeness I mention that none of my old code triggered a
deprecation warning and that it worked 11 months ago.

Bottom line: I cannot reproduce 11-month-old results anymore. Adapting the code
is frustrating and lengthy work. And it does not solve the problem, because
new releases of Sage will again introduce changes.

At https://groups.google.com/d/msg/sage-devel/tDKb8QF97VY/tc8jFWDwQ2YJ, there
was some discussion on tests related to papers and books and the src/sage/tests
directory.

Of course, I could have opened a ticket and included my code in the
src/sage/tests directory. But I cannot imagine that this approach would scale
very well. I assume that there are many papers out there with a substantial
amount of CPU time required for reproducibility.

Any thoughts/suggestions?

Regards,

Clemens

> -------- Original Message --------
> Subject: [sage-devel] tests related to papers and books
> Date: Thu, 8 May 2014 15:38:10 -0700
> From: William Stein <wst...@gmail.com>
> Reply-To: sage-...@googlegroups.com
> To: sage-devel <sage-...@googlegroups.com>
>
> Hi,
>
> I'm at a reproducible research workshop, and that reminded me that we
> have a directory in Sage full of tests related to published works:
>
> https://github.com/sagemath/sage/tree/master/src/sage/tests
>
> But it's really not much at all! So if you're going to write (or
> wrote) a paper with Sage examples, try to remember to add a file to
> the above directory, so your examples work forever, etc.
>
> William
>
>

William Stein

23.02.2016, 09:54:59

to sage-devel

On Tue, Feb 23, 2016 at 6:50 AM, Clemens Heuberger
<clemens....@aau.at> wrote:
> I am currently revising a paper that I submitted in March 2015. Parts of the
> results heavily rely on computations in Sage; at that time, Sage 6.5.
>
> It turns out that the old code no longer works with Sage 7.0. It is also
> impossible to compile Sage 6.5 nowadays because the infrastructure changed (it
> tries to download
> http://www.sagemath.org/packages/upstream/patch/patch-2.7.1.tar.gz which no
> longer exists). For completeness I mention that none of my old code triggered a
> deprecation warning and that it worked 11 months ago.

Is this really the case? If we no longer provide tarballs that can be
compiled without internet access, then this is a *massive* regression.

> Bottom line: I cannot reproduce 11-month-old results anymore. Adapting the code
> is frustrating and lengthy work.

No matter what -- even if you were going to update your code -- you
would absolutely want to have access to a sage-6.5 build, so you could
test before and after.


--
William (http://wstein.org)

Jeroen Demeyer

23.02.2016, 09:57:21

to sage-...@googlegroups.com

On 2016-02-23 15:50, Clemens Heuberger wrote:
> It is also impossible to compile Sage 6.5 nowadays

I assume this was a build from git? That's indeed not supported. A real
build-from-source-tarball should still work.

> Bottom line: I cannot reproduce 11-month-old results anymore.

Personally, I think this is a flaw in the review process for Sage
tickets. Reviewers should check whether existing code might break with a
patch, but that's not always done.

> Of course, I could have opened a ticket and included my code in the
> src/sage/tests directory. But I cannot imagine that this approach would scale
> very well. I assume that there are many papers out there with a substantial
> amount of CPU time required for reproducibility.

What if you just add a simple case? Imagine your code computes f(n) for
all n up to 10^6. Then you could use f(10) as test case.
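
A minimal sketch of that idea in plain Python, assuming a hypothetical function `f` that stands in for the expensive computation (here just a sum of squares, nothing from the paper). The doctest pins f(10), so a change in behaviour would show up in the cheap small case rather than only in the full run up to 10^6. The helper `run_small_case_tests` is likewise made up for this illustration:

```python
import doctest


def f(n):
    """Hypothetical stand-in for the paper's expensive computation.

    The small case below is cheap enough to run with every test suite,
    while the full range up to 10^6 stays out of the doctests.

    >>> f(10)
    385
    """
    # Sum of squares 1^2 + ... + n^2, chosen only for illustration.
    return sum(k * k for k in range(1, n + 1))


def run_small_case_tests():
    """Run the doctest in f's docstring and return (failed, attempted)."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(f.__doc__, {"f": f}, "f", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test)
    return results.failed, results.attempted


print(run_small_case_tests())
```

In Sage itself the same pinned example would simply live in a docstring under src/sage/tests and be picked up by the regular doctest run.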

Volker Braun

23.02.2016, 10:09:16

to sage-devel, clemens....@aau.at

Source tarball is online, go to http://files.sagemath.org and click on "...source code of Sage (older versions)"

Clemens Heuberger

23.02.2016, 10:29:25

to sage-...@googlegroups.com

On 2016-02-23 16:57, Jeroen Demeyer wrote:
> On 2016-02-23 15:50, Clemens Heuberger wrote:
>> It is also impossible to compile Sage 6.5 nowadays
>
> I assume this was a build from git? That's indeed not supported. A real
> build-from-source-tarball should still work.

yes, I tried to build from git. I found the tarball in the meantime and am
currently compiling it.

How long do we support old tarballs?

>
>> Bottom line: I cannot reproduce 11-month-old results anymore.
>
> Personally, I think this is a flaw in the review process for Sage tickets.
> Reviewers should check whether existing code might break with a patch, but
> that's not always done.

As an illustration, one of the problems I encountered:

cheuberg@rk01-math:/local/sage$ sage-6.7/sage -c "if x != infinity: print 'yes'"
yes
cheuberg@rk01-math:/local/sage$ sage-6.8/sage -c "if x != infinity: print 'yes'"

Relying on the old behaviour was probably a bug, but the behaviour changed anyhow,
without a deprecation.

I will try to pinpoint the other examples and will then report those.

>
>> Of course, I could have opened a ticket and included my code in the
>> src/sage/tests directory. But I cannot imagine that this approach would scale
>> very well. I assume that there are many papers out there with a substantial
>> amount of CPU time required for reproducibility.
>
> What if you just add a simple case? Imagine your code computes f(n) for all n up
> to 10^6. Then you could use f(10) as test case.

Essentially, we computed one generating function for one very specific problem,
using concatenations of transducers to get it.

We could have included the doctests and marked some of them as optional or long
(do we have "very long"? we are regularly running ptestlong, and you don't want
to include all of that stuff...), and the issue above would have been covered.
But I am not yet sure about the others.

William Stein

23.02.2016, 10:39:20

to sage-devel

On Tue, Feb 23, 2016 at 7:29 AM, Clemens Heuberger
<clemens....@aau.at> wrote:
> On 2016-02-23 16:57, Jeroen Demeyer wrote:
>> On 2016-02-23 15:50, Clemens Heuberger wrote:
>>> It is also impossible to compile Sage 6.5 nowadays
>>
>> I assume this was a build from git? That's indeed not supported. A real
>> build-from-source-tarball should still work.
>
> yes, I tried to build from git. I found the tarball in the meantime and am
> currently compiling it.
>
> How long do we support old tarballs?

Forever. The first-ever release of Sage is here:
http://old.files.sagemath.org/src-old/ However, "support" only
means that if you set up a VM or computer with a supported OS from then
(e.g., some old Debian or Red Hat) then you can build using *that* OS.
We definitely don't update old tarballs to keep running on new
versions of Linux; that would be impossibly hard, and get harder every
time we do a new release.

> As an illustration, one of the problems I encountered:
>
> cheuberg@rk01-math:/local/sage$ sage-6.7/sage -c "if x != infinity: print 'yes'"
> yes
> cheuberg@rk01-math:/local/sage$ sage-6.8/sage -c "if x != infinity: print 'yes'"
>
> Relying on the old behaviour was probably a bug, but behaviour changed anyhow
> without a deprecation.

It seems to me that

sage: x = var('x')
sage: bool(x!=infinity)
False

*is* a newly introduced bug. I can't understand how the above
behavior could be justified, especially given that

sage: bool(x != 2)
True

People do introduce bugs into Sage...

-- William


Martin Vahi

23.02.2016, 11:12:48

to sage-devel, clemens....@aau.at

Given that virtual appliance file formats seem to be somewhat standardized, so that different virtual machine software can export and import virtual appliances, the most hassle-free solution for writing scientific papers that use Sage might be the virtual appliance available at

http://files.sagemath.org/win/index.html

I would also like to point out that, although as of 2016 the newest persistent memory type is FRAM

http://archive.softf1.com/2015/electronics/2015_12_xx_downloaded_FRAM_Description_by_Texas_Instruments.pdf

most computers in the past used Flash. This means that at some point the computers from the era in which those old Linux distributions were in use will stop working, because the Flash memory in their various components (BIOS, microcontrollers, etc.) bit-rots. The newest hardware probably is not backwards compatible with the old hardware. A good illustration of that is that the software of various old computers can be run only in "computer simulators" like

http://www.worldofspectrum.org/
http://www.mess.org/
http://www.emulator-zone.com/
http://www.hercules-390.eu/
http://simh.trailing-edge.com/
http://www.villehelin.com/wzonka-lad.html
http://gxemul.sourceforge.net/

There's even one for the x86 PC:
http://jpc.sourceforge.net/home_home.html

(I could copy-paste more from my archive page at
http://archive.softf1.com/lounge/links_2_other_archives/
)

That is to say, at some point using a virtual machine IS YOUR ONLY OPTION,
because the Flash memory of the old hardware bitrots away or the old hardware
fails by other means and the new hardware does not support
all the legacy mess.

Thank You for reading my comment. :-)

kcrisman

23.02.2016, 12:16:40

to sage-devel

This smells like http://trac.sagemath.org/ticket/18877 ("better nonnumeric comparisons with infinity", in particular).

Eric Gourgoulhon

23.02.2016, 12:37:13

to sage-devel

Hi,

On Tuesday, 23 February 2016 at 16:39:20 UTC+1, William wrote:

> It seems to me that
>
> sage: x = var('x')
> sage: bool(x!=infinity)
> False
>
> *is* a newly introduced bug. I can't understand how the above
> behavior could be justified

In the same vein, note that

sage: x = SR.var('x', domain='real')
sage: bool(x != 0)
False

while at the same time:

sage: bool(x == 0)
False

so that x!=0 differs from not x==0...
cf. https://groups.google.com/d/msg/sage-devel/2ppi74WPUUA/dKBzA5vECQAJ
As I understand it, this non-binary logic is introduced by returning False instead of 'Cannot decide'.
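
To make the mechanism concrete, here is a toy model in plain Python. It is not Sage's actual implementation, and the names `Truth` and `Relation` are made up for this sketch. A relation that cannot be decided is collapsed to False when converted to bool, which is how both x == 0 and x != 0 can come out False at the same time:

```python
from enum import Enum


class Truth(Enum):
    """Three possible outcomes of trying to decide a relation."""
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "cannot decide"


class Relation:
    """Toy symbolic relation; NOT how Sage implements it."""

    def __init__(self, description, truth):
        self.description = description
        self.truth = truth

    def __bool__(self):
        # The lossy step: UNKNOWN is reported as False.
        return self.truth is Truth.TRUE


# For a real variable x with no further assumptions, neither relation
# can be decided, so both are UNKNOWN in this model.
x_equals_zero = Relation("x == 0", Truth.UNKNOWN)
x_differs_from_zero = Relation("x != 0", Truth.UNKNOWN)

print(bool(x_equals_zero), bool(x_differs_from_zero))  # False False
print(not x_equals_zero)  # True, although bool(x != 0) is False
```

Code that branches on `if a != b:` therefore silently treats "cannot decide" the same way as "equal".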

Best wishes,

Eric.

William Stein

23.02.2016, 12:38:37

to sage-devel

Yes, that I understand.

But in this case it seems like deciding is pretty easy to do?


Clemens Heuberger

23.02.2016, 21:48:43

to sage-...@googlegroups.com

On 2016-02-23 17:29, Clemens Heuberger wrote:
> I try to pinpoint the other examples and will then report those.

Here is the next issue which I encountered:

$ sage-6.9/sage -c "print bool((x^2 - 1 - (x+1)*(x-1)) != 0)"
False
$ sage-6.10/sage -c "print bool((x^2 - 1 - (x+1)*(x-1)) != 0)"
True

I should have used .is_zero a year ago ....

Ralf Stephan

23.02.2016, 23:24:07

to sage-devel

Thanks for the bug reports. I don't think I will have time to work on these this week, so maybe someone else?

Regards,

Clemens Heuberger

24.02.2016, 09:01:38

to sage-...@googlegroups.com

And here is the third (and final) issue, which cost me hours to find
(the != comparison was hidden somewhere in my code); it seems to be related to
the second problem above.

$ cat test.sage
a = -1/(x+1) + 1/x
b = 1/(x*(x+1))
print bool(a == b), bool(a != b)
$ sage-6.9 test.sage
True False
$ sage-6.10 test.sage
True True
$ sage-7.0 test.sage
True True

In our code, we basically used

if a != b:
    ...

I cannot explain this error by a trivalent logic which uses "False" for "unknown".
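
One way to cross-check this outside of Sage is exact evaluation at a few rational points, sketched below in plain Python with fractions.Fraction. The helper name `agree_at_samples` and the sample points are arbitrary choices for this illustration, and agreement at sample points is evidence for the identity, not a proof. Together with the partial fraction computation 1/x - 1/(x+1) = 1/(x*(x+1)), it supports that a == b is the right answer, so a != b should be False:

```python
from fractions import Fraction


def a(x):
    return -1 / (x + 1) + 1 / x


def b(x):
    return 1 / (x * (x + 1))


def agree_at_samples(f, g, samples):
    """Return True if f and g coincide exactly at every sample point."""
    return all(f(x) == g(x) for x in samples)


# Sample points avoid the poles at x = 0 and x = -1.
samples = [Fraction(1, 3), Fraction(2), Fraction(-7, 2), Fraction(10)]
print(agree_at_samples(a, b, samples))  # True
```

Because Fraction arithmetic is exact, there is no rounding noise that could explain a spurious difference between the two sides.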

Clemens Heuberger

24.02.2016, 09:34:02

to sage-...@googlegroups.com

On 2016-02-23 16:57, Jeroen Demeyer wrote:

My impression after one day of debugging is that while we may have a very good
coverage of tests covering specialised methods, we have very few tests which
test behaviour of Sage as it is actually used.

If I counted correctly, there are tests covering 6 publications. Some of those
are rather isolated _examples_ for how to use sage.

There are also some tests in src/sage/tests which do not correspond to
publications at all.

So it seems that there are almost no tests which actually do solve some
mathematical problem of any sort.

This all boils down to the following question:

What is the policy concerning src/sage/tests?

Some subquestions coming to my mind:
* Shall tests related to publications be moved to another directory
  (sage/tests/publications)?
* What files are welcome there?
* How strict are coding standards? Do we enforce PEP8 compliance?
* Shall there be a mechanism that the tests are not included in make ptestlong,
  but are tested less regularly?

I cannot believe that our code was the only code which was broken over the last
11 months by 2 to 3 (depending on how you count my second and third problems)
different changes of behaviour without a deprecation period.

Jeroen Demeyer

24.02.2016, 11:16:53

to sage-...@googlegroups.com

On 2016-02-24 15:33, Clemens Heuberger wrote:
> Some subquestions coming to my mind:
> * Shall tests related to publications be moved to another directory
> (sage/tests/publications)?

Fine.

> * What files are welcome there?

Anything which satisfies the rules on doctests (in particular, tests
should not take too much time).

> * How strict are coding standards? Do we enforce PEP8 compliance?

No coding standard.

> * Shall there be a mechanism that the tests are not included in make ptestlong

They should be included in "make ptestlong" otherwise they won't be tested.

I have a more important question:
* What do we do when one of these tests breaks and it's not because of a
  bug being introduced (e.g. numerical noise, undefined behaviour which
  changed, deprecation of something)?

The "french book" authors gave up on caring about failures, so the tests
which are currently in Sage no longer correspond to the printed book.

Jeroen.

Clemens Heuberger

24.02.2016, 11:58:53

to sage-...@googlegroups.com

On 2016-02-24 18:16, Jeroen Demeyer wrote:
>> * What files are welcome there?
> Anything which satisfies the rules on doctests (in particular, tests should not
> take too much time).

What is the definition of "too much time"?

Instead of the usual time per test (which is rather pointless here), I'd prefer
a limit per file.

And then at some point, when we have 60 instead of 6 files, say, we might think
about total time.

> I have a more important question:
> * what do we do when one of these tests breaks and it's not because of a bug
> being introduced (e.g. numerical noise, undefined behaviour which changed,
> deprecation of something)

Fix the test and notify the authors (who should be listed in the header of the
file). I see no practical alternative; we won't want to delay an arb update,
say, because of numerical noise.

But I think that breaking a test in that directory is a hint that a deprecation
notice should be considered ...

Clemens

Clemens Heuberger

26.02.2016, 01:21:40

to sage-...@googlegroups.com

On 2016-02-23 17:29, Clemens Heuberger wrote:

> I try to pinpoint the other examples and will then report those.

I finally had a fourth issue in this saga. If you used Jupyter in Sage 6.9 on
your machine and the directory containing Sage 6.9 no longer exists, opening a
Jupyter file saved by Sage 6.9 in Sage 7.0 leads to a "kernel error" which is
not recoverable by simply switching to an existing kernel (you have to change
the kernel, save, close, and reopen the worksheet).

I opened
http://trac.sagemath.org/ticket/20121
for this and hope that somebody with more knowledge about the Sage/Jupyter
interaction can do something about that.

Thanks,

Clemens
