more tests in sage (not doctests)


stephan...@gmail.com

Jun 22, 2016, 4:35:23 AM
to sage-...@googlegroups.com
Hi everyone,

I would like to discuss adding at least one more testing method (e.g. nose) to the Sage development process beyond just doctests.

Doctests are certainly great, but they have obvious limitations.

Some of them are:
1) You want to keep the source code clean, so doctests should be “short”. But some test cases require more complicated code or have long output that you would rather not add to the source code.
2) You don’t want certain tests in the documentation of a function, where they would just distract the user.
3) Some things cannot be tested in doctests at all, or at least not easily.

Things I would like to see include:
a) performance tests, where we would test against the previous release to make sure that newly introduced changes do not negatively affect performance
b) randomized tests, for example: check for a number of randomly generated number fields that arithmetic operations with randomly generated number field elements give the correct results. Randomized tests help to identify issues that occur with input that no one thought about testing.
c) test mathematical correctness more extensively by storing a larger set of results that have been verified to be mathematically correct in some way
(These tests in particular could run very long if we want to cover large ranges, so we should not make these tests run by everyone but rather by some bots in parallel. They need not block new releases if untested, but they could run continuously and block a new release if a problem is discovered. The data to check against would be publicly available, and we could advertise that people install a bot on their machine that runs at scheduled times and just picks some examples that have not been checked with the current development version (or not on their particular architecture or OS version).)
d) test unpickling of objects, which seems to break rather often and is not covered at all by any of the doctests

Maybe not all of these tests would have to be run every time someone submits a patch, but they should be run before a release comes out.

I know that there are many things to discuss, and other people have far more experience with Sage development than I have, but I feel very strongly about this and really think it should be a top priority to ensure that Sage works reliably.

What do you think?

Stephan

Samuel Lelievre

Jun 22, 2016, 4:40:06 AM
to sage-devel
+1

Jeroen Demeyer

Jun 22, 2016, 4:57:08 AM
to sage-...@googlegroups.com
On 2016-06-21 23:06, stephan...@gmail.com wrote:
> 1) You want to keep the source code clean, so doctests should be “short”. But some test cases require more complicated code or have long output that you would rather not add to the source code.

For long or special doctests, you can put the tests in a separate module
containing only tests. We have some in src/sage/tests.
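
For illustration, a minimal sketch of such a tests-only module (the file
name and its contents are made up here):

    # src/sage/tests/long_linear_algebra.py -- hypothetical example
    r"""
    Longer tests that would clutter the main source files.

    TESTS::

        sage: M = matrix(QQ, 20, 20, lambda i, j: 1/(i + j + 1))
        sage: (M * M.inverse()).is_one()
        True
    """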

> 2) You don’t want certain tests in the documentation of a function, where they would just distract the user.

For this, you can use TESTS: blocks, which do not appear in the
documentation.
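
In a docstring this looks like the following (my_gcd is a hypothetical
function):

    def my_gcd(a, b):
        r"""
        Return the greatest common divisor of ``a`` and ``b``.

        EXAMPLES::

            sage: my_gcd(12, 18)
            6

        TESTS::

            sage: my_gcd(0, 0)  # edge case, hidden from the rendered docs
            0
        """
        return gcd(a, b)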

> 3) Some things cannot be tested in doctests at all, or at least not easily.
>
> Things I would like to see include:
> a) performance tests, where we would test against the previous release to make sure that newly introduced changes do not negatively affect performance

That's really difficult to get right. But I'd love to hear good
suggestions. Keep in mind that timing an operation is the easy part; the
hard part is what to do with the timing results.

There is an old ticket about this, but it never got anywhere.
See https://trac.sagemath.org/ticket/12720
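
To illustrate the easy part, a naive sketch in Python (the baseline file
and the 1.5x threshold are made up; dealing with noise and deciding what
counts as a regression is exactly the hard part mentioned above):

    import json
    import timeit

    # Hypothetical baseline recorded from the previous release,
    # e.g. {"integer_mul": 0.0021}
    with open("timings_previous_release.json") as f:
        baseline = json.load(f)

    def check_regression(name, stmt, setup, slack=1.5):
        # Take the best of several repeats to damp scheduling noise.
        t = min(timeit.repeat(stmt, setup=setup, number=1000, repeat=5))
        if name in baseline and t > slack * baseline[name]:
            print("possible regression in %s: %.3g s vs %.3g s"
                  % (name, t, baseline[name]))

    check_regression("integer_mul", "a * b", "a = 3**1000; b = 5**800")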

> b) randomized tests, for example: check for a number of randomly generated number fields that arithmetic operations with randomly generated number field elements give the correct results. Randomized tests help to identify issues that occur with input that no one thought about testing.

This can be done with doctests.
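
For instance, a doctest can loop over random inputs and stay silent
unless a property fails (plain integer arithmetic here, just to show
the shape):

    sage: set_random_seed(0)  # make the run reproducible
    sage: for _ in range(100):
    ....:     a = ZZ.random_element(10^50)
    ....:     b = ZZ.random_element(10^50)
    ....:     assert (a + b) - b == a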

> c) test mathematical correctness more extensively by storing a larger set of results that have been verified to be mathematically correct in some way

This can be done with doctests.

> (These tests in particular could run very long if we want to cover large ranges, so we should not make these tests run by everyone but rather by some bots in parallel. They need not block new releases if untested, but they could run continuously and block a new release if a problem is discovered. The data to check against would be publicly available, and we could advertise that people install a bot on their machine that runs at scheduled times and just picks some examples that have not been checked with the current development version (or not on their particular architecture or OS version).)

This sounds like overkill. It would introduce "yet another" testing
mechanism besides the patchbot and the buildbot that we have to maintain.

> d) test unpickling of objects, which seems to break rather often and is not covered at all by any of the doctests

This can be done with doctests (possibly using the pickle jar).
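
A round-trip check is easy to express as a doctest, and TestSuite
already includes a pickling test:

    sage: K.<a> = NumberField(x^2 - 2)
    sage: loads(dumps(a)) == a
    True
    sage: TestSuite(K).run()  # runs _test_pickling among other checks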

> Maybe not all of these tests would have to be run every time someone submits a patch, but they should be run before a release comes out.

I agree with this. We could add such tests if the release manager
agrees. However, this can also be done with doctests (say, using an #
optional - release tag).
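
With such a tag ("release" is a proposed tag, not an existing one, and
the function below is made up), a test would look like:

    sage: verify_large_table_of_results()  # optional - release
    True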

> What do you think?

I think that doctests are really just an interface. You can easily run
all kinds of tests using the doctester, and it's nice to have a
consistent interface for testing. I think we should only introduce a new
mechanism if there is a clear need.


Jeroen.

Johan S. H. Rosenkilde

Jun 22, 2016, 5:17:10 AM
to sage-...@googlegroups.com
Big +1.

There's the sage/tests folder, which seems to be a place where certain
developers who really couldn't help themselves put some additional
tests. But it doesn't go nearly as far as what you're proposing.

However, there are obvious issues with ensuring that such tests get
written once in a while, without forcing further bureaucracy on the
development process.

Currently, the Reviewer's Checklist specifies that you should manually
try out the code on various examples. Perhaps the reviewer could be in
charge of writing the first germs of such non-doctest testing for the
ticket. It wouldn't have to be fancy, just a way of saving for the
future the work that the reviewer is doing anyway. Whenever a bug shows
up in a module, that's an obvious time to improve the non-doctest
testing, while pinning down the bug and testing that it was fixed.

> a) performance tests, where we would test against the previous release to make sure that newly introduced changes do not negatively affect performance

Vincent Delecroix proposed something like this for GSoC 2016, but no
students came forward:
https://wiki.sagemath.org/GSoC/2016

> b) randomized tests

There's the hypothesis library, which is a Python take on Haskell's
brilliant QuickCheck. IMO, this kind of declarative property testing on
random instances is extremely well suited for testing mathematical
software, but there are some challenges in providing a good set of
"random generators" for the many types of objects and algorithms
(e.g. testing a topological sort can be done on huge graphs, but testing
a travelling salesman algorithm should be done on small graphs; how and
where do you "define" small/big?).

> c) test mathematical correctness more extensively by storing a larger set of results that have been verified to be mathematically correct in some way

This is a very nice idea.

> d) test unpickling of objects, which seems to break rather often and is not covered at all by any of the doctests

+1 for test-framework(s) for any part of Sage which is currently untested.

Best,
Johan
--

Michael Orlitzky

Jun 22, 2016, 9:16:59 AM
to sage-...@googlegroups.com
On 06/22/2016 04:57 AM, Jeroen Demeyer wrote:
>
>> b) randomized tests, for example: check for a number of randomly
>> generated number fields that arithmetic operations with randomly
>> generated number field elements give the correct results.
>> Randomized tests help to identify issues that occur with input that
>> no one thought about testing.
>
> This can be done with doctests.
>

It's a little dangerous: our doctest framework uses the XKCD random
number generator. If you run ZZ.random_element() in a doctest, it will
always output the same number. You have to work around it by calling
set_random_seed() before every test.

Apparently this was done so that doctest results involving random
numbers would be reproducible, which I find fascinating. A much better
approach would be to output the random seed whenever a test fails, but
now we have thousands of tests that don't know they're not getting
random numbers and would probably fail if we fixed it.
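
In plain Python, the seed-reporting idea would look something like this
(a hypothetical helper, not Sage's actual machinery):

    import random

    def run_with_reported_seed(test, trials=100):
        # Use a fresh seed each run, but print it on failure so the
        # failing case can be reproduced exactly.
        seed = random.randrange(2**63)
        rng = random.Random(seed)
        try:
            for _ in range(trials):
                test(rng)
        except AssertionError:
            print("test failed; reproduce with seed %d" % seed)
            raise

    def addition_commutes(rng):
        a = rng.randint(-10**9, 10**9)
        b = rng.randint(-10**9, 10**9)
        assert a + b == b + a

    run_with_reported_seed(addition_commutes)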

Johan S. H. Rosenkilde

Jun 22, 2016, 9:29:27 AM
to sage-...@googlegroups.com
> It's a little dangerous: our doctest framework uses the XKCD random
> number generator.

There's prior work by S. Adams:
http://dilbert.com/strip/2001-10-25

Best,
Johan

Marc Mezzarobba

Jun 22, 2016, 10:55:03 AM
to sage-...@googlegroups.com
Michael Orlitzky wrote:
> It's a little dangerous: our doctest framework uses the XKCD random
> number generator. If you run ZZ.random_element() in a doctest, it will
> always output the same number. You have to work around it by calling
> set_random_seed() before every test.

There is a @random_testing decorator that is supposed to help with that.

--
Marc

Stephan Ehlen

Jun 22, 2016, 1:08:32 PM
to sage-devel
Thanks a lot for your answer, Jeroen.


On Wednesday, June 22, 2016 at 2:57:08 AM UTC-6, Jeroen Demeyer wrote:
> On 2016-06-21 23:06, stephan...@gmail.com wrote:
>> 1) You want to keep the source code clean, so doctests should be “short”. But some test cases require more complicated code or have long output that you would rather not add to the source code.
>
> For long or special doctests, you can put the tests in a separate module
> containing only tests. We have some in src/sage/tests.
>
>> 2) You don’t want certain tests in the documentation of a function, where they would just distract the user.
>
> For this, you can use TESTS: blocks, which do not appear in the
> documentation.

Good to know, I was not aware of that.

>> 3) Some things cannot be tested in doctests at all, or at least not easily.
>>
>> Things I would like to see include:
>> a) performance tests, where we would test against the previous release to make sure that newly introduced changes do not negatively affect performance
>
> That's really difficult to get right. But I'd love to hear good
> suggestions. Keep in mind that timing an operation is the easy part; the
> hard part is what to do with the timing results.
>
> There is an old ticket about this, but it never got anywhere.
> See https://trac.sagemath.org/ticket/12720

I'll take a look.

>> b) randomized tests, for example: check for a number of randomly generated number fields that arithmetic operations with randomly generated number field elements give the correct results. Randomized tests help to identify issues that occur with input that no one thought about testing.
>
> This can be done with doctests.
>
>> c) test mathematical correctness more extensively by storing a larger set of results that have been verified to be mathematically correct in some way
>
> This can be done with doctests.

I think there needs to be some standardized way to do so - in particular, how and where to store the results.
I'm thinking of computations that result in more output than just a single integer.
For example, the q-expansion of a modular form with possibly large Fourier coefficients.
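
One possible shape (the data file and its layout are made up; the delta coefficients shown are the first Ramanujan tau values): store verified results once, publicly, and compare freshly computed values against them:

    import json

    # Hypothetical file of independently verified coefficients,
    # e.g. {"delta": [1, -24, 252, -1472, 4830]}
    with open("verified_q_expansions.json") as f:
        verified = json.load(f)

    def check_q_expansion(name, coefficient):
        # coefficient(n) should return the n-th Fourier coefficient.
        for n, expected in enumerate(verified[name], start=1):
            got = coefficient(n)
            assert got == expected, \
                "%s: a_%d = %s, expected %s" % (name, n, got, expected)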

>> (These tests in particular could run very long if we want to cover large ranges, so we should not make these tests run by everyone but rather by some bots in parallel. They need not block new releases if untested, but they could run continuously and block a new release if a problem is discovered. The data to check against would be publicly available, and we could advertise that people install a bot on their machine that runs at scheduled times and just picks some examples that have not been checked with the current development version (or not on their particular architecture or OS version).)
>
> This sounds like overkill. It would introduce "yet another" testing
> mechanism besides the patchbot and the buildbot that we have to maintain.

Maybe not: it could be an additional/optional module of the patchbot/buildbot, I guess.
I just thought that it would make sense not to have all such data tests run by all bots, because there could potentially be many such tests that run for a rather long time - but maybe that's already how the tests are distributed among the bots, which I know nothing about.

>> d) test unpickling of objects, which seems to break rather often and is not covered at all by any of the doctests
>
> This can be done with doctests (possibly using the pickle jar).

How would the pickles be stored/distributed?

>> Maybe not all of these tests would have to be run every time someone submits a patch, but they should be run before a release comes out.
>
> I agree with this. We could add such tests if the release manager
> agrees. However, this can also be done with doctests (say, using an #
> optional - release tag).

>> What do you think?
>
> I think that doctests are really just an interface. You can easily run
> all kinds of tests using the doctester, and it's nice to have a
> consistent interface for testing. I think we should only introduce a new
> mechanism if there is a clear need.

I totally agree that all of these things can _in principle_ be done using doctests, in particular using the TESTS block or the tests directory I wasn't aware of.
Of course, you can write a test function and call it in a doctest - but I find that a bit weird.
I also think that having doctests as the only interface might lead developers not to write more extensive tests, maybe because it seems restrictive and you don't want to mess up your code. If the consensus is to stick to this interface, maybe the developer documentation could give a few more pointers, e.g. to using TESTS blocks and putting tests in the tests directory (or maybe it's already there and I only have to revisit it), and the reviewer checklist could be updated to encourage writing more tests.
What I found in the documentation, which clearly discourages doing any of the things I wrote above, is this[1]:

"Even then, long doctests should ideally complete in 5 seconds or less. We know that you (the author) want to show off the capabilities of your code, but this is not the place to do so. Long-running tests will sooner or later hurt our ability to run the testsuite. Really, doctests should be as fast as possible while providing coverage for the code."


Stephan

William Stein

Jun 22, 2016, 1:45:34 PM
to sage-devel
On Wed, Jun 22, 2016 at 10:08 AM, Stephan Ehlen
<stephan...@gmail.com> wrote:
>> > d) test unpickling of objects, which seems to break rather often and is
>> > not covered at all by any of the doctests
>>
>> This can be done with doctests (possibly using the pickle jar).
>
>
> How would the pickles be stored/distributed?

I wrote something to solve the "tested working pickles" problem that has
been in use in Sage forever. Please type

sage: search_src('pickle_jar')

and follow the trail... The actual pickles end up in a tarball in ext/.
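
From memory, the trail ends at functions like picklejar() and
unpickle_all() in sage.structure.sage_object; check their docstrings
for the exact interface, but roughly:

    sage: from sage.structure.sage_object import unpickle_all
    sage: unpickle_all()  # tries every pickle in the standard pickle jar
    Successfully unpickled ... objects.
    Failed to unpickle 0 objects.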

Doctests seem to work very well for Sage, since the doctest framework
is very refined due to a lot of work over the years (e.g., parallel
testing, support for optional components, etc.), and having paste-able,
guaranteed-to-work examples of all code is very useful. Doctests
*suck* for certain other types of software projects (e.g., libraries
like node), where unit tests are vastly better. I have at various
points written a lot of code to randomly test things in Sage and throw
random input at Sage, and we used to do things like collect all
timings of all doctests and compare between releases. I'm really
glad to see you're interested in improving this functionality in
Sage. The one thing to be careful about is not to increase the burden
on the release manager.

-- William