# When is a test not a valid test?


### David Kirkby

Dec 1, 2010, 9:49:08 AM
I'm somewhat unimpressed by the way some doctests are constructed. An
example was at

http://trac.sagemath.org/sage_trac/ticket/10187

where I raised an issue.

sage: taylor(gamma(1/3+x),x,0,3)

-1/432*((36*(pi*sqrt(3) + 9*log(3))*euler_gamma^2 + 27*pi^2*log(3) +
72*euler_gamma^3 + 243*log(3)^3 + 18*(6*pi*sqrt(3)*log(3) + pi^2 +
27*log(3)^2 + 12*psi(1, 1/3))*euler_gamma + 324*psi(1, 1/3)*log(3) +
(pi^3 + 9*(9*log(3)^2 + 4*psi(1, 1/3))*pi)*sqrt(3))*gamma(1/3) -
72*gamma(1/3)*psi(2, 1/3))*x^3 + 1/24*(6*pi*sqrt(3)*log(3) +
4*(pi*sqrt(3) + 9*log(3))*euler_gamma + pi^2 + 12*euler_gamma^2 +
27*log(3)^2 + 12*psi(1, 1/3))*x^2*gamma(1/3) - 1/6*(6*euler_gamma +
pi*sqrt(3) + 9*log(3))*x*gamma(1/3) + gamma(1/3)

sage: map(lambda f:f[0].n(), _.coeffs())
[2.6789385347..., -8.3905259853..., 26.662447494..., -80.683148377...]

I asked the author who added the numerical coefficients
( [2.6789385347..., -8.3905259853..., 26.662447494...,
-80.683148377...]) on the ticket to justify them, since I wanted to know they were
right before giving this a positive review. The author remarked that he was
not the original author of the long analytic expression, but doubted
it had ever been checked. However, he did agree to check the numerical
results he had added. He did this using Maple 12 and got the same results.

In this case I'm satisfied the bit of code added to get the numerical
results is probably OK, as it has been independently verified by
another package. The probability of both being wrong is very
small, since the two packages should have been developed largely independently
of each other. The analytic expression is also probably OK.

I really feel people should use doctests where the analytic results
can be verified, or at least justified in some way. If the results are
then expressed numerically, whenever possible those
numerical results should be independently verified, as was done on
this ticket after I requested verification.

Methods of verification could include:

* Results given in a decent book
* Results computed by programs like Mathematica and Maple
* Showing results are similar to those of an approximate method

For example, if a bit of code claims to compute prime_pi(n) exactly
with n=10000000000000000000000000000000000000000000000000000000000000000000000000000000000
then that would be difficult to verify by other means. Mathematica, for
example, can't do it, and I doubt any computer could do it in a
reasonable amount of time [1].

But there are numerical approximations for prime_pi, so computing a
numerical approximation and showing it's similar to the numerical
equivalent of what was computed would be a reasonable verification that
the function is correct.
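As a plain-Python sketch of that idea (exact count from a simple sieve, the approximation the crude n/ln n; li(n) would be tighter, but the point is only rough agreement):

```python
# Consistency check: an exact prime count should land close to a
# known asymptotic approximation.  Here n is small enough to count
# primes exactly with a sieve of Eratosthenes.
from math import log

def prime_pi(n):
    """Exact prime-counting function via a simple sieve."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, int(n**0.5) + 1):
        if sieve[p]:
            sieve[p*p::p] = bytearray(len(sieve[p*p::p]))
    return sum(sieve)

n = 10**6
exact = prime_pi(n)        # 78498, a well-known tabulated value
approx = n / log(n)        # prime number theorem, first-order
assert abs(exact - approx) / exact < 0.10   # agree to within 10%
```

A wildly wrong implementation would almost certainly fail even this crude 10% check.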

It seems to me that many of the doctests have expected values that are
basically whatever someone got on their computer. Sometimes authors have
the sense to realise that different floating-point processors will
give different results, so they add a few dots so that not every digit is
expected to be the same.

To me at least, tests where the results are totally unjustified are
very poor tests, yet they seem to be quite common.

I was reading the other day about how one of the large Mersenne primes
was verified. I can't recall the details exactly, but it was something like:

* Found by one person on his computer using an AMD or Intel CPU
* Checked by another person using a different program on an Intel or AMD CPU
* Checked by a third person, on a Sun M9000 using a SPARC processor.

I'm not expecting us to go to such lengths, but I feel expected values
should be justified.

Whenever we run the tests of the Python package we get failures. If we run
the Maxima test suite, we get failures, which appear with ECL as the
Lisp interpreter but not with some other interpreters. This indicates
to me that we should not put too much trust in tests which are not
justified.

Dave

[1] An interesting experiment would be to find a proof that such a
number could not be computed before the Sun runs out of energy, at which
point all life on Earth would be terminated. The designers of the 128-bit
file system used on Solaris have argued that the energy required to
fill the file system would be more than that required to boil
all the water in the oceans. I suspect similar arguments could be used
to prove one can't compute prime_pi(n) for sufficiently large n.

### David Roe

Dec 1, 2010, 1:18:16 PM
I disagree that doctests should need to be independently verified.

Of course, if we had an arbitrarily large amount of time to write doctests, then it would be a laudable goal.  Even now, I think there are situations where it would be reasonable to ask this of the author of a patch: if there was some indication of inconsistency for example.  And if someone wants to go through the Sage library adding such consistency checks, I think that's a great way to improve Sage.  But it's already difficult enough to get code refereed without adding a requirement that code have such consistency checks.

The doctests that you object to fill two important roles:
1) they provide an example to a reader of the documentation of how to use the function.
2) they provide a check so that if some change to the Sage library breaks something, we find out when testing.

Until we have 100% doctest coverage, I think that's plenty.
David

--
To post to this group, send an email to sage-...@googlegroups.com
To unsubscribe from this group, send an email to sage-devel+...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/sage-devel
URL: http://www.sagemath.org

### David Kirkby

Dec 1, 2010, 2:32:55 PM
On 1 December 2010 18:18, David Roe <ro...@math.harvard.edu> wrote:
> I disagree that doctests should need to be independently verified.

I think we will have to agree to differ then.

> Of course, if we had an arbitrarily large amount of time to write doctests,
> then it would be a laudable goal.  Even now, I think there are situations
> where it would be reasonable to ask this of the author of a patch: if there
> was some indication of inconsistency for example.  And if someone wants to
> go through the Sage library adding such consistency checks, I think that's a
> great way to improve Sage.

So you admit it would improve Sage to check the tests.

> But it's already difficult enough to get code
> refereed without adding a requirement that code have such consistency
> checks.

It would probably be a bit easier to convince reviewers if your
doctests can be verified.

> The doctests that you object to fill two important roles:
> 1) they provide an example to a reader of the documentation how to use the
> function.

Yes, but a confusing one if the answer is wrong, and an embarrassing
one if the examples are wrong.

> 2) they provide a check so that if some change to the Sage library breaks
> something, we find out when testing.

> Until we have 100% doctest coverage, I think that's plenty.

100% coverage of unverified tests is not worth a lot to me. What do you
propose we do when we get 100% coverage - go back and check if the
tests are valid or not? What a waste of time that would be. It would
be less overall effort to do the tests correctly the first time.

If you are going to give an example, how much longer does it take to
check if they are consistent with Mathematica or similar software? Or
choose an integral from a book?

Dave

### David Roe

Dec 1, 2010, 2:53:08 PM

So you admit it would improve Sage to check the tests.

Of course.  My argument is that imposing the requirement to have such consistency checks in order to get a positive review will make me less likely to contribute to Sage.

If you are going to give an example, how much longer does it take to
check if they are consistent with Mathematica or similar software? Or
choose an integral from a book?

For those of us without Mathematica installed it takes a decent amount of effort.  And even with other software installed, it's often nontrivial to determine an analogous function in another language (or impossible for testing lower level functions).
David

### William Stein

Dec 1, 2010, 3:01:30 PM
On Wed, Dec 1, 2010 at 11:32 AM, David Kirkby <david....@onetel.net> wrote:
> On 1 December 2010 18:18, David Roe <ro...@math.harvard.edu> wrote:
>> I disagree that doctests should need to be independently verified.
>
> I think we will have to agree to differ then.

I agree with David Roe.

I also would like to encourage David Kirkby (or anybody else) to
independently test as many examples as they can, and if they uncover
any issues, open a ticket and post a patch. Also, if they are
refereeing new patches, do some testing of your own. I always do!
If anything, this independent checking should be the referee's job --
even if the author claimed to check things independently, the referee
would do well to double check some tests.

So David K., I hope you'll continue to "put your money where you
mouth" is and referee a lot of patches. You've done a massive amount
already. Keep up the good work.

But let's not make Sage too much more bureaucratic. If anything, it's
already too bureaucratic. I personally can hardly stand to submit
anything to Sage anymore because of this.

I do think it would be good to start using nosetest, which can
automatically run all functions that start with "test_" in all files,
in addition to doctests. This is how I've been testing the purple-sage
library (http://code.google.com/p/purplesage/), and for many cases it
does result in me writing much more comprehensive test suites.
Nosetest is also very nice because it can run all the tests in a given
file in parallel. Also, when a test in a file fails, it can drop you
into a debugging shell right there with the failed test. This is
all something that we should start doing in addition to aiming for
100% doctest coverage for the sage library...
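For illustration, a nose-style test module might look like the sketch below; nose (like pytest) collects and runs top-level functions whose names start with "test_", and the __main__ guard is only there so the sketch also runs as plain Python without nose installed:

```python
# A minimal nose-style test module.  No boilerplate is needed:
# nose discovers the test_* functions automatically.
from math import factorial

def test_factorial_small():
    # a value a referee can verify by hand
    assert factorial(5) == 120

def test_factorial_consistency():
    # internal consistency check: n! == n * (n-1)!
    for n in range(1, 10):
        assert factorial(n) == n * factorial(n - 1)

if __name__ == "__main__":   # fallback when nose is not installed
    test_factorial_small()
    test_factorial_consistency()
    print("all tests passed")
```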

>> Of course, if we had an arbitrarily large amount of time to write doctests,
>> then it would be a laudable goal.  Even now, I think there are situations
>> where it would be reasonable to ask this of the author of a patch: if there
>> was some indication of inconsistency for example.  And if someone wants to
>> go through the Sage library adding such consistency checks, I think that's a
>> great way to improve Sage.
>

> So you admit it would improve sage to check the tests.

It's hard to deny.

>>  But it's already difficult enough to get code
>> refereed without adding a requirement that code have such consistency
>> checks.
>
> It would probably be a bit easier to convince reviewers if your
> doctests can be verified.

When people review, they should try to verify tests however they want.

>> The doctests that you object to fill two important roles:
>> 1) they provide an example to a reader of the documentation how to use the
>> function.
>
> Yes, perhaps a confusing one if the answer is wrong. An embarrassing
> one if the examples are wrong.
>
>> 2) they provide a check so that if some change to the Sage library breaks
>> something, we find out when testing.
>
>> Until we have 100% doctest coverage, I think that's plenty.
>
> 100% covered of unverified tests is not worth a lot to me. What do you
> propose we do when we get 100% coverage - go back and check if the
> tests are valid or not? What a waste of time that would be.

Verifying correctness of tests is not a waste of time.

> It would be less overall effort to do the tests correctly the first time.

People presumably *think* they are doing tests correctly. The point
is that you're wanting authors to submit "proofs" that they did
independent verification of results, and I think that is too much
bureaucracy. But asking referees to check claimed examples --
that makes sense! In particular, if I referee some code, and it
turns out somebody finds that the examples were just wrong, then I as
the referee will be pretty embarrassed.

> If you are going to give an example, how much longer does it take to
> check if they are consistent with Mathematica or similar software? Or
> choose an integral from a book?

That does raise an issue: one problem is that most of Sage isn't
calculus. Most code I write these days isn't available in any other
software...
A lot of what Sage does is available only in Magma, say, which many
people don't have access to.

--
William Stein
Professor of Mathematics
University of Washington
http://wstein.org

### kcrisman

Dec 1, 2010, 3:38:19 PM
to sage-devel

> But let's not make Sage too much more bureaucratic.  If anything, it's
> already too bureaucratic.  I personally can hardly stand to submit
> anything to Sage anymore because of this.

:(

> I do think it would be good to start using nosetest
> automatically run all functions that start with "test_" in all files,
> in addition to doctests. This is how I've been testing the purple-sage
> library (http://code.google.com/p/purplesage/), and for many cases it
> does result in me writing much more comprehensive test suites.

You mean http://trac.sagemath.org/sage_trac/ticket/9921?

> > If you are going to give an example, how much longer does it take to
> > check if they are consistent with Mathematica or similar software? Or
> > chose an integral from a book?
>
> That does raise an issue:  one problem is that most of Sage isn't
> calculus.   Most code I write these days isn't available in any other
> software...
> A lot of what Sage does is available only in Magma say, which many

And plots need to be tested 'by hand' by looking at them - which I do
a lot of when reviewing those tickets.

One interesting point coming out of this is that the onus is put on
the author, not the reviewer, for testing. I assume that means
"running doctests with ./sage -t or something", not "trying edge/
corner cases the author might not have thought of and making sure
those work", which I think does properly belong with the reviewer.

As I've said before, the framework R has for doing buildbotty stuff is
what we should be striving for, though Sage is far more involved, I
suppose, with all the subcomponents. Even having a buildbot for the
release manager has really improved things already, I think!

- kcrisman

### David Kirkby

Dec 1, 2010, 5:25:02 PM
On 1 December 2010 20:01, William Stein <wst...@gmail.com> wrote:
> On Wed, Dec 1, 2010 at 11:32 AM, David Kirkby <david....@onetel.net> wrote:
>> On 1 December 2010 18:18, David Roe <ro...@math.harvard.edu> wrote:
>>> I disagree that doctests should need to be independently verified.
>>
>> I think we will have to agree to differ then.
>
> I agree with David Roe.

I thought you would.

> I also would like to encourage David Kirkby (or anybody else) to
> independently test as many examples as they can, and if they uncover
> any issues, open a ticket and post a patch.

For me personally, as a non-mathematician, I'd have a problem just
accepting a doctest which I probably can't verify myself. In some
cases I can, using Mathematica, and have done so on some occasions. But in
the case of
the case of

http://trac.sagemath.org/sage_trac/ticket/10187

I could not. But in this case I'll trust the author when he says he
has verified this with Maple. I think Sage is better for that change.

>  Also, if they are
> refereeing new patches, do some testing of your own.  I always do!
> If anything, this independent checking should be the referee's job --
> even if the author claimed to check things independently, the referee
> would do well to double check some tests.

You have an advantage over me. Of course, I could decline to give a
positive review until a mathematician has said the patch is OK. That
would delay the ticket more of course.

> So David K., I hope you'll continue to "put your money where you
> mouth" is and referee a lot of patches.  You've done a massive amount
> already.  Keep up the good work.

But as I say, I'm restricted somewhat when people add tests I'm not
convinced of.

> But let's not make Sage too much more bureaucratic.  If anything, it's
> already too bureaucratic.  I personally can hardly stand to submit
> anything to Sage anymore because of this.

I realise you now have PSage. I feel it's a shame you did not complete
the Cygwin port first, but that's your choice. I can understand your
reasons.

> I do think it would be good to start using nosetest
> automatically run all functions that start with "test_" in all files,

I suggested 'nose' be added a long time ago.

>> It would probably be a bit easier to convince reviewers if your
>> doctests can be verified.
>
> When people review, they should try to verify tests however they want.

But one could make life a lot easier for a reviewer by picking
something where the results can be verified easily. If one writes a
test to show how to use function X, then the input probably does not
matter too much. So choose an input where the output can be verified,
rather than some input where it can't be.
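For instance, if the point of the doctest is just to show the calling convention, an input with a textbook expansion is as instructive and trivially checkable. A plain-Python sketch of the idea, using exp(x), whose degree-3 Taylor coefficients are the textbook values 1, 1, 1/2, 1/6:

```python
# A verifiable choice of input: the coefficients below can be checked
# at a glance against any calculus book, and the truncated series can
# itself be compared to the library exp.
from math import exp

coeffs = [1, 1, 1/2, 1/6]      # Taylor coefficients of exp at 0
x = 0.1
partial = sum(c * x**k for k, c in enumerate(coeffs))
# remainder starts at x^4/4! ~ 4e-6 for x = 0.1
assert abs(partial - exp(x)) < 1e-4
```

A reviewer can confirm every number here without Mathematica, Maple, or even Sage.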

>>> Until we have 100% doctest coverage, I think that's plenty.
>>
>> 100% covered of unverified tests is not worth a lot to me. What do you
>> propose we do when we get 100% coverage - go back and check if the
>> tests are valid or not? What a waste of time that would be.
>
> Verifying correctness of tests is not a waste of time.

I don't know what the current coverage is, but let's say for argument's
sake it needs another 1000 tests to get 100% coverage. It's better to
verify those 1000 tests now, rather than wait until we get 100% coverage
and then go back and verify them.

>> It would be less overall effort to do the tests correctly the first time.
>
> People presumably *think* they are doing tests correctly.    The point
> is that you're wanting authors to submit "proofs" that they did
> independent verification of results, and I think that is too much
> bureaucracy.

No, I'm not suggesting a formal proof. In the case of the patch here

http://trac.sagemath.org/sage_trac/attachment/ticket/10187/trac_10187_fix_easy_doctests.patch

lines 345 & 346 were added as a test, with nothing to say why. The
author has now said Maple 12 gives the same answer - I believe him in
this case.

I rather suspect the input, which shows how to use the taylor
function, could be any of numerous inputs. The one chosen

sage: taylor(gamma(1/3+x),x,0,3)

gives a huge output which is going to be next to impossible to verify
analytically. Using a different series, where the output is well
known, would have been more logical.

>  But asking referees to check claimed examples --
> that makes sense!   In particular, if I referee some code, and it
> turns out somebody finds that the examples were just wrong, then I as
> the referee will be pretty embarrassed.

Yes, but using examples like

sage: taylor(gamma(1/3+x),x,0,3)

makes it almost impossible for a referee to check it, as the output is huge.

>
>> If you are going to give an example, how much longer does it take to
>> check if they are consistent with Mathematica or similar software? Or
> choose an integral from a book?
>
> That does raise an issue:  one problem is that most of Sage isn't
> calculus.   Most code I write these days isn't available in any other
> software...
> A lot of what Sage does is available only in Magma say, which many

Fair enough. But you can at least state the Sage output is consistent
with that from Magma.

In any case, you stated only a week or so ago that Magma 2.13 is now
installed on sage.math

It's a shame the license of Wolfram Alpha does not allow for testing
software like Sage. (This was debated some time ago on sage-devel).
Otherwise that would give a nice easy way to verify *some* results.

"is 100001 prime"

http://www.wolframalpha.com/input/?i=is+100001+prime

> William Stein
> Professor of Mathematics
> University of Washington
> http://wstein.org

I appreciate in many cases it's not going to be possible to verify by
other means. One has to be extra careful about the code then.

Dave

### William Stein

Dec 1, 2010, 5:30:17 PM
On Wed, Dec 1, 2010 at 12:38 PM, kcrisman <kcri...@gmail.com> wrote:
>
>> But let's not make Sage too much more bureaucratic.  If anything, it's
>> already too bureaucratic.  I personally can hardly stand to submit
>> anything to Sage anymore because of this.
>
> :(
>
>> I do think it would be good to start using nosetest
>> automatically run all functions that start with "test_" in all files,
>> in addition to doctests. This is how I've been testing the purple-sage
>> library (http://code.google.com/p/purplesage/), and for many cases it
>> does result in me writing much more comprehensive test suites.
>
> You mean http://trac.sagemath.org/sage_trac/ticket/9921?

Yes. I especially agree with David Kirkby's remark: "IMHO it would be
sensible to have nose as a standard package.".

>
>> > If you are going to give an example, how much longer does it take to
>> > check if they are consistent with Mathematica or similar software? Or
> > choose an integral from a book?
>>
>> That does raise an issue:  one problem is that most of Sage isn't
>> calculus.   Most code I write these days isn't available in any other
>> software...
>> A lot of what Sage does is available only in Magma say, which many
>
> And plots need to be tested 'by hand' by looking at them - which I do
> a lot of when reviewing those tickets.
>
> One interesting point coming out of this is that the onus is put on
> the author, not the reviewer, for testing.  I assume that means
> "running doctests with ./sage -t or something", not "trying edge/
> corner cases the author might not have thought of and making sure
> those work", which I think does properly belong with the reviewer.

I disagree. The author *and* the reviewer should both do as much as they can
reasonably do.

William

### William Stein

Dec 1, 2010, 5:36:14 PM
On Wed, Dec 1, 2010 at 2:25 PM, David Kirkby <david....@onetel.net> wrote:
>> I do think it would be good to start using nosetest
>> automatically run all functions that start with "test_" in all files,
>
> I suggested 'nose' was added a long time ago
>
>

Well now that I know nose better, I agree with you. It's a really
awesome testing framework. I use it all the time for my own work now.

>> Verifying correctness of tests is not a waste of time.
>
> I don't know what the current coverage is, but lets say for argument
> it needs another 1000 tests to get 100% coverage. It's better to
> verify those 1000 tests now, rather than wait to we get 100% coverage,
> then go back and verify them.

Orthogonal to your remark, but in sage-4.6:

\$ sage -coverageall
...
Overall weighted coverage score: 84.3%
Total number of functions: 26592
We need 173 more function to get to 85% coverage.
We need 1503 more function to get to 90% coverage.
We need 2833 more function to get to 95% coverage.

It's only 2,833 tests!

>>  But asking referees to check claimed examples --
>> that makes sense!   In particular, if I referee some code, and it
>> turns out somebody finds that the examples were just wrong, then I as
>> the referee will be pretty embarrassed.
>
> Yes, but using examples like
>
> sage: taylor(gamma(1/3+x),x,0,3)
>
> makes it almost impossible for a referee to check it, as the output is huge.

I totally agree, and I think that's a very valid criticism for you to
make as a referee.

But let's not make a new policy out of this.

> In any case, you stated only a week or so ago that Magma 2.13 is now
> installed on sage.math
>

That is a post from 2006?!?

> It's a shame the license of Wolfram Alpha does not allow for testing
> software like Sage. (This was debated some time ago on sage-devel).
> Otherwise that would give a nice easy way to verify *some* results.
>
> "is 100001 prime"
>
> http://www.wolframalpha.com/input/?i=is+100001+prime

I'm not sure what you're talking about exactly at this point.
Referees can use wolfram alpha if they want to independently check
stuff... Do you mean adding doctests that call wolframalpha? That
would be weird.

-- William

>
>
>> William Stein
>> Professor of Mathematics
>> University of Washington
>> http://wstein.org
>
> I appreciate in many cases it's not going to be possible to verify by
> other means. One has to be extra careful about the code then.
>
> Dave
>


### François

Dec 1, 2010, 6:15:28 PM
to sage-devel
On Dec 2, 11:36 am, William Stein <wst...@gmail.com> wrote:
> On Wed, Dec 1, 2010 at 2:25 PM, David Kirkby <david.kir...@onetel.net> wrote:
> >> Verifying correctness of tests is not a waste of time.
>
> > I don't know what the current coverage is, but lets say for argument
> > it needs another 1000 tests to get 100% coverage. It's better to
> > verify those 1000 tests now, rather than wait to we get 100% coverage,
> > then go back and verify them.
>
> Orthogonal to your remark, but in sage-4.6:
>
> \$ sage -coverageall
> ...
> Overall weighted coverage score:  84.3%
> Total number of functions:  26592
> We need  173 more function to get to 85% coverage.
> We need 1503 more function to get to 90% coverage.
> We need 2833 more function to get to 95% coverage.
>
> It's only 2,833 tests!
>
That figure assumes that there are no duplicate tests in Sage. They
are unfortunately a fact of life, like the two described here while
testing sage-on-gentoo:
https://github.com/cschwan/sage-on-gentoo/issues/closed#issue/8

### Volker Braun

Dec 1, 2010, 6:22:13 PM
to sage-devel
On Dec 1, 11:25 pm, David Kirkby <david.kir...@onetel.net> wrote:
> I rather suspect the input, which shows how to use the taylor
> function, could be any of numerous inputs. The one chosen
>
> sage: taylor(gamma(1/3+x),x,0,3)
>
> gives a huge output which is going to be next to impossible to verify
> analytically.

For the record, this particular doctest was added to check that trac
#9217 was fixed, and the surrounding text clearly states that. It was
not chosen because it's a particularly illuminating example.

Volker

### David Kirkby

Dec 1, 2010, 6:31:08 PM
On 1 December 2010 22:36, William Stein <wst...@gmail.com> wrote:
> On Wed, Dec 1, 2010 at 2:25 PM, David Kirkby <david....@onetel.net> wrote:
>>> I do think it would be good to start using nosetest
>>> automatically run all functions that start with "test_" in all files,
>>
>> I suggested 'nose' was added a long time ago
>>
>>
>
> Well now that I know nose better, I agree with you.  It's a really
> awesome testing framework.  I use it all the time for my own work now.

It would seem sensible to make it standard in that case. Making it
optional seems a bit less useful to me.

>> Yes, but using examples like
>>
>> sage: taylor(gamma(1/3+x),x,0,3)
>>
>> makes it almost impossible for a referee to check it, as the output is huge.
>
> I totally agree, and I think that's a very valid criticism for you to
> make as a referee.

The code I was refereeing did *not* add

sage: taylor(gamma(1/3+x),x,0,3)

That was there before.

What I queried was the doctest which converted the huge symbolic
result to a much simpler numerical result, and which was added in the
ticket in question. (It was added because the format of Maxima's output
had changed, so a test was needed to check that the numerical values
were the same, even if the symbolic ones were not.)

sage: map(lambda f:f[0].n(), _.coeffs()) # numerical coefficients to
make comparison easier; Maple 12 gives same answer
[2.6789385347..., -8.3905259853..., 26.662447494..., -80.683148377...]

After I asked, the author verified it in Maple 12, as the doctest
notes. So that probably means Maxima has it right.

> But let's not make a new policy out of this.
>
>
>> In any case, you stated only a week or so ago that Magma 2.13 is now
>> installed on sage.math
>>
>
> That is a post from 2006?!?

Em, I thought I saw the post recently and Googled for it.

>> It's a shame the license of Wolfram Alpha does not allow for testing
>> software like Sage. (This was debated some time ago on sage-devel).
>> Otherwise that would give a nice easy way to verify *some* results.
>>
>> "is 100001 prime"
>>
>> http://www.wolframalpha.com/input/?i=is+100001+prime
>
> I'm not sure what you're talking about exactly at this point.
> Referees can use wolfram alpha if they want to independently check
> stuff...

Yes, verifying results is OK.

But storing comments in the source code of Sage containing a large
number of comparisons with Wolfram Alpha may not be. See

http://www.wolframalpha.com/termsofuse.html

> Do you mean adding doctests that call wolframalpha?  That
> would be weird.

No, I was not thinking of that.

>  -- William

Dave

### kcrisman

Dec 1, 2010, 9:17:28 PM
to sage-devel

On Dec 1, 5:30 pm, William Stein <wst...@gmail.com> wrote:
> On Wed, Dec 1, 2010 at 12:38 PM, kcrisman <kcris...@gmail.com> wrote:
>
> >> But let's not make Sage too much more bureaucratic.  If anything, it's
> >> already too bureaucratic.  I personally can hardly stand to submit
> >> anything to Sage anymore because of this.
>
> > :(
>
> >> I do think it would be good to start using nosetest
> >> automatically run all functions that start with "test_" in all files,
> >> in addition to doctests. This is how I've been testing the purple-sage
> >> library (http://code.google.com/p/purplesage/), and for many cases it
> >> does result in me writing much more comprehensive test suites.
>
> > You meanhttp://trac.sagemath.org/sage_trac/ticket/9921?
>
> Yes.  I especially agree with David Kirkby's remark: "IMHO it would be
> sensible to have nose as a standard package.".
>

Oh, great! Then I may put that on my to-do list. I know Jason is
also interested in this.

> > One interesting point coming out of this is that the onus is put on
> > the author, not the reviewer, for testing.  I assume that means
> > "running doctests with ./sage -t or something", not "trying edge/
> > corner cases the author might not have thought of and making sure
> > those work", which I think does properly belong with the reviewer.
>
> I disagree.  The author *and* the reviewer should both do as much as they can
> reasonably do.

- kcrisman

Dec 2, 2010, 1:20:54 PM
On Wed, Dec 1, 2010 at 2:36 PM, William Stein <wst...@gmail.com> wrote:
> On Wed, Dec 1, 2010 at 2:25 PM, David Kirkby <david....@onetel.net> wrote:
>>> I do think it would be good to start using nosetest
>>> automatically run all functions that start with "test_" in all files,
>>
>> I suggested 'nose' was added a long time ago
>>
>>

I think there's a distinction between an spkg that people might find
useful to use with Sage, and an spkg that's actually used in Sage.
For the former, if easy_install "just works," then it's not worth us
creating and maintaining a separate spkg, but for the latter, we
should ship it.

The fact that an upstream package uses nose in its tests did not seem
like enough of a justification to create a whole new spkg, but if we
want to write Sage tests with nose then I have no objection. I
certainly think that there's a diminishing return on doctests once you
reach a certain point (which we're probably not at yet).

>>>  But asking referees to check claimed examples --
>>> that makes sense!   In particular, if I referee some code, and it
>>> turns out somebody finds that the examples were just wrong, then I as
>>> the referee will be pretty embarrassed.
>>
>> Yes, but using examples like
>>
>> sage: taylor(gamma(1/3+x),x,0,3)
>>
>> makes it almost impossible for a referee to check it, as the output is huge.

What would make a better test in this case would be taking the
resulting power series, perhaps to a higher degree of precision, and
evaluating at 0.1, 0.5, and showing that the result is close to
gamma(1/3 + 0.1), gamma(1/3 + 0.5). Or perhaps verifying that the 3rd
coefficient is equal to the 3rd derivative / 6.
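A minimal sketch of that kind of check in plain Python, using `math.gamma` and the numerical coefficients quoted on the ticket; since the series is truncated at degree 3, only a loose tolerance is justified:

```python
import math

# Numeric Taylor coefficients of gamma(1/3 + x) around x = 0, as quoted
# on the ticket (degree 3, lowest order first).
coeffs = [2.6789385347, -8.3905259853, 26.662447494, -80.683148377]

def taylor_approx(x):
    """Evaluate the degree-3 Taylor polynomial at x via Horner's rule."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc

# The polynomial should agree with gamma(1/3 + x) up to the degree-4
# truncation error (roughly 0.02 at x = 0.1).
x = 0.1
print(abs(taylor_approx(x) - math.gamma(1/3 + x)))
```

At x = 0 this reduces to checking the constant term against gamma(1/3) itself, which is a much tighter test.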

> I totally agree, and I think that's a very valid criticism for you to
> make as a referee.
>
> But let's not make a new policy out of this.

+1. As more time is spent reading the code and tests rather than
applying patches, we should be more critical of good vs. bad tests.
This also goes with the ideal of making it really easy to edit a
patch, perhaps even online. (Imagine if you could run some code and
press a button to add that doctest to the library, pending refereeing
of course...)

>> In any case, you stated only a week or so ago that Magma 2.13 is now
>> installed on sage.math
>>
>
> That is a post from 2006?!?
>
>> It's a shame the license of Wolfram Alpha does not allow for testing
>> software like Sage. (This was debated some time ago on sage-devel).
>> Otherwise that would give a nice easy way to verify *some* results.
>>
>> "is 100001 prime"
>>
>> http://www.wolframalpha.com/input/?i=is+100001+prime
>
> I'm not sure what you're talking about exactly at this point.
> Referees can use wolfram alpha if they want to independently check
> stuff...  Do you mean adding doctests that call wolframalpha?  That
> would be weird.
>

>> I appreciate in many cases it's not going to be possible to verify by
>> other means. One has to be extra careful about the code then.

On the topic of verifying tests, I think internal consistency checks
are much better, both pedagogically and for verifiability, than
external checks against other (perhaps inaccessible) systems. For
example, the statement above that checks a power series against its
definition and properties, or (since you brought up the idea of
factorial) factorial(10) == prod([1..10]), or taking the derivative to
verify an integral. Especially in more advanced math there are so many
wonderful connections, both theorems and conjectures, that can be
verified with a good test. For example, computing all the BSD
invariants of an elliptic curve and verifying that the BSD formula
holds is a strong indicator that the invariants were computed
correctly via their various algorithms.
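The factorial consistency check mentioned above can be sketched in plain Python; `math.factorial` exercises a C implementation while the explicit product runs through Python integer arithmetic, so the two paths are largely independent:

```python
import math
from functools import reduce
from operator import mul

# Internal consistency check: the library factorial versus an explicit
# product over Python ints -- two code paths that must agree.
n = 10
by_product = reduce(mul, range(1, n + 1), 1)
assert math.factorial(n) == by_product
print(by_product)  # 3628800
```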

- Robert

### kcrisman

Dec 2, 2010, 1:42:37 PM12/2/10
to sage-devel

> >> I suggested 'nose' was added a long time ago
>
>
>
> I think there's a distinction between an spkg that people might find
> useful to use with Sage, and an spkg that's actually used in Sage.
> For the former, if easy_install "just works," then it's not worth us
> creating and maintaining a separate spkg, but for the latter, we
> should ship it.
>
> The fact that an upstream package uses nose in its tests did not seem
> like enough of a justification to create a whole new spkg, but if we
> want to write Sage tests with nose then I have no objection. I
> certainly think that there's a diminishing return on doctests once you
> reach a certain point (which we're probably not at yet).

I think the reason for this is to make it really easy to run
spkg-check on a number of spkgs, like NumPy and SciPy. So if nose
were available in the spkg framework, this would be nice.

That said, maybe 'easy_install' is really as easy as ./sage -i nose
from the internet, in which case I suppose one could have an
spkg-check that relied on the internet... but that wouldn't be ideal,
I think.

- kcrisman

### Jason Grout

Dec 2, 2010, 1:46:47 PM12/2/10
On 12/2/10 12:42 PM, kcrisman wrote:

> That said, maybe 'easy_install' is really as easy as ./sage -i nose
> from the internet, in which case I suppose one could have an spkg-
> check that relied on the internet... but that wouldn't be ideal, I
> think.

But that would also avoid yet another spkg to maintain. We have a
hard enough time keeping up with spkg updates as it is.

As Robert says, if we're using nose in Sage, that's a different story.

Thanks,

Jason

### Rob Beezer

Dec 2, 2010, 2:03:34 PM12/2/10
to sage-devel
On Dec 2, 10:20 am, Robert Bradshaw <rober...@math.washington.edu>
wrote:
> On the topic of verifying tests, I think internal consistency checks
> are much better, both pedagogically and for verifiability, than
> external checks against other (perhaps inaccessible) systems. For
> example, the statement above that checks a power series against its
> definition and properties, or (since you brought up the idea of
> factorial) factorial(10) == prod([1..10]), or taking the derivative to
> verify an integral. Especially in more advanced math there are so many
> wonderful connections, both theorems and conjectures, that can be
> verified with a good test. For example, computing all the BSD
> invariants of an elliptic curve and verifying that the BSD formula
> holds is a strong indicator that the invariants were computed
> correctly via their various algorithms.

A huge +1 to this. Couldn't have said it better. I sometimes become
a devious doctest writer (close cousin to the devious reviewer) and
try to write a doctest that links seemingly disparate parts of Sage in
complicated ways expressed by a theorem. For example, automorphism
groups of graphs sometimes have connections with eigenvalues of the
adjacency matrices of the graph. If something breaks in either part
of Sage, then such a test may expose it. And sometimes these tests
are very succinct, since they can be constructed so the output is
simply "True." And, properly written, they make for interesting
reading.
Rob

### kcrisman

Dec 2, 2010, 3:46:37 PM12/2/10
to sage-devel
So are you saying we should just give up on having SAGE_CHECK=yes do
anything for those packages if the user doesn't already have nose
installed? I just don't know what the consensus is on whether the
"batteries included" philosophy extends to something (SAGE_CHECK) that
the average user and even developer may not use.

- kcrisman

### kcrisman

Dec 2, 2010, 3:47:34 PM12/2/10
to sage-devel
To follow up my own post: maybe it would be possible to write an
spkg-check that tries to detect nose, exits gracefully if it's not
there, and otherwise uses a system nose... though of course then one
would be using the system Python... wouldn't one?

- kcrisman

### David Kirkby

Dec 2, 2010, 9:40:46 PM12/2/10
On 2 December 2010 18:20, Robert Bradshaw <robe...@math.washington.edu> wrote:

> On the topic of verifying tests, I think internal consistency checks
> are much better, both pedagogically and for verifiability, than
> external checks against other (perhaps inaccessible) systems. For
> example, the statement above that checks a power series against its
> definition and properties, or (since you brought up the idea of
> factorial) factorial(10) == prod([1..10]), or taking the derivative to
> verify an integral.

Of course I can see logic in this, especially when the software may
not be available. Even though it has limitations, and those
limitations might increase with time, Wolfram Alpha is currently
available to everyone. (It helps if you know Mathematica, as you can
input Mathematica syntax directly).

* The person writing the mathematical code is usually the same person
who writes the test for that code. Any assumptions they make which are
incorrect may exist in both the algorithm and the test code. Of
course one hopes the referee picks this up, but the referee process,
while useful, is not perfect.

* The example you give with 10 factorial and prod([1..10]) would
probably use a fair amount of common code - such as MPIR.

* Differentiate(Integrate(f)) = f: in practice, for many functions,
doing this in Sage does not lead back to the same expression, although
the two are mathematically equivalent. Converting to numerical form
can sometimes be used to show results are equal, but even then one
often gets two results that are equivalent without being identical.

(I wrote some Sage code which generated "random" functions and
applied the integrate/differentiate method. If you get a complex
result back after the differentiation step, it is not easy to
determine whether it's the same as what you started with.)

Some, though not all, of the above can be eliminated by using software
that is developed totally independently. Of course, even using
Wolfram Alpha will exercise some code common to Sage, since:

a) Wolfram Alpha uses Mathematica
b) Mathematica uses GMP & ATLAS
c) Sage uses MPIR (derived from GMP) and ATLAS.

I suspect there is other common code too, but those are the two I'm aware of.

> Especially in more advanced math there are so many
> wonderful connections, both theorems and conjectures, that can be
> verified with a good test. For example, computing all the BSD
> invariants of an elliptic curve and verifying that the BSD formula
> holds is a strong indicator that the invariants were computed
> correctly via their various algorithms.

I'll accept what you say!

It's clear you have the ability to write decent tests, but I think it's
fair to say there are a lot of Sage developers who have less knowledge
of this subject than you.

As such, I believe independent verification using other software is
useful. Someone remarked earlier that it is common in the commercial
world to compare your results to those of competing products.

> - Robert

Dave

### Johan S. R. Nielsen

Dec 3, 2010, 3:15:21 AM12/3/10
to sage-devel
> On the topic of verifying tests, I think internal consistency checks
> are much better, both pedagogically and for verifiability, than
> external checks against other (perhaps inaccessible) systems. For
> example, the statement above that checks a power series against its
> definition and properties, or (since you brought up the idea of
> factorial) factorial(10) == prod([1..10]), or taking the derivative to
> verify an integral. Especially in more advanced math there are so many
> wonderful connections, both theorems and conjectures, that can be
> verified with a good test. For example, computing all the BSD
> invariants of an elliptic curve and verifying that the BSD formula
> holds is a strong indicator that the invariants were computed
> correctly via their various algorithms.
>
> - Robert

Also a huge +1 from me. I have been thinking a lot about how to
utilise this most elegantly, and I think one could take it a step
further than doctests. I often write "parameterised tests" myself:
tests for properties of the output of functions, based on "random"
input. For example, say I have a library of polynomials over fields.
Then a useful property to test is that any polynomials a,b satisfy
a*b == b*a
I could write a test that randomly generates 100 different pairs of
polynomials a,b to check with, over "random" fields. I know that some
people sometimes write such tests, and it is also suggested in the
Developer's Guide somewhere.

I love the Haskell test suite QuickCheck, which allows one to write
such tests extremely declaratively and succinctly. Haskell is way cool
when it comes to types, so it provides an elegant way of specifying how
to randomly generate your input. Transferring this directly to Python
or Sage can't be as elegant, but I have been working on a small Python
script -- basically an extension to unittest -- which could make it at
least easier to write these kinds of tests. It's not done yet and can
be improved in many ways, but I use it all the time on my code; it's
quite reassuring to have written a set of involved functions over
bivariate polynomials over fields and then check their internal
consistency with degree-100 polynomials over fields of cardinality
above 1000 :-D

My thought is that doctests are nice for educational purposes and
basic testing, but I myself like to test my code better while writing
it. I don't want to introduce more bureaucracy, so I don't suggest
that we should _require_ such tests, but it would be nice to have a
usual/standard way of writing such tests, if an author or reviewer
felt like it. More importantly, if it could be done in a systematic
way, all such tests could share the random generating functions: for
example, all functions working over any field would need a "generate a
random field"-function, and if there was a central place for these in
Sage, the most common structures would quickly be available, making
parameterised test writing even easier.
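The parameterised commutativity test described above could be sketched in plain Python as follows; the coefficient-list representation and the helper functions here are purely illustrative, not part of any existing framework:

```python
import random

def poly_mul(a, b, p):
    """Multiply two polynomials (coefficient lists, lowest degree first) over GF(p)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % p
    return out

def random_poly(max_deg, p):
    """Generate a random coefficient list of length 1..max_deg+1 over GF(p)."""
    return [random.randrange(p) for _ in range(random.randint(1, max_deg + 1))]

# QuickCheck-style parameterised test: a*b == b*a for 100 random pairs
# over a few small prime fields.
for _ in range(100):
    p = random.choice([2, 3, 5, 7, 101])
    a, b = random_poly(8, p), random_poly(8, p)
    assert poly_mul(a, b, p) == poly_mul(b, a, p)
print("100 commutativity checks passed")
```

A central pool of such random generators (random fields, random polynomials, ...) is exactly what would make this style cheap to write, as suggested above.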

- Johan

### William Stein

Dec 3, 2010, 9:03:04 PM12/3/10
On Friday, December 3, 2010, Johan S. R. Nielsen

I think nosetest is a superb framework for writing such unittests,
which really do encourage a completely different kind of testing than
doctests.

> More importantly, if it could be done in a systematic
> way, all such tests could share the random generating functions: for
> example, all functions working over any field would need a "generate a
> random field"-function, and if there was a central place for these in

I wrote such a thing. See rings.tests or test or rando_ring (I am
sending from a cell phone).

> Sage, the most common structures would quickly be available, making
> parameterised test writing even easier.
>
> - Johan
>

### William Stein

Dec 4, 2010, 12:32:16 AM12/4/10
On Thu, Dec 2, 2010 at 6:40 PM, David Kirkby <david....@onetel.net> wrote:
> On 2 December 2010 18:20, Robert Bradshaw <robe...@math.washington.edu> wrote:
>
>> On the topic of verifying tests, I think internal consistency checks
>> are much better, both pedagogically and for verifiability, than
>> external checks against other (perhaps inaccessible) systems. For
>> example, the statement above that checks a power series against its
>> definition and properties, or (since you brought up the idea of
>> factorial) factorial(10) == prod([1..10]), or taking the derivative to
>> verify an integral.
>
> Of course I can  see  logic in this, especially when the software may
> not be available. Even though it has limitations, and those
> limitations might increase with time, Wolfram Alpha is currently
> available to everyone. (It helps if you know Mathematica, as you can
> input Mathematica syntax directly).
>
>  * The person writing the mathematical code is usually the same person
> who writes the test for that code. Any assumptions they make which are
> incorrect  may exist in both the algorithm and the test code. Of
> course one hopes the referee picks this up, but the referee process,
> while useful, is not perfect.
>
>  * The example you give with 10 factorial and prod([1..10]) would
> probably use a fair amount of common code - such as MPIR.

If you do

prod(range(1,11))

and compare that to "factorial(10)", I think it uses absolutely no
common code at all.

prod(range(1,11)) -- uses arithmetic with Python ints and the Sage
prod command (which Robert Bradshaw wrote from scratch in Cython).

factorial(10) -- calls a GMP function that is written in C, and
shares no code at all with Python.

-- William

>
> * Differentiate(Integrate(f)) = f, in practice for many functions
> doing this in Sage does not lead back to the same expression, although
> they are mathematically equivalent. Converting to a numerical form
> can sometimes be used to show results are equal, but even two
> equivalent, but non-identical numerical results often exist.

They have to be the same up to rounding errors, right, or it is a bug?
So numerically the absolute value of the difference must be small.
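A stdlib-only sketch of that numerical comparison, using a pair (f, F) whose antiderivative is known in closed form rather than a CAS integral: differentiate F numerically and check the result stays close to f.

```python
import math

# Numerical check that d/dx of an antiderivative recovers the integrand:
# take f(x) = cos(x), whose antiderivative is F(x) = sin(x), and compare
# a central-difference derivative of F against f at a few points.
f = math.cos
F = math.sin
h = 1e-6
for x in (0.0, 0.5, 1.3, 2.0):
    numeric = (F(x + h) - F(x - h)) / (2 * h)
    assert abs(numeric - f(x)) < 1e-8
print("round-trip differences are all below 1e-8")
```

The tolerance absorbs both the O(h^2) truncation error of the central difference and the rounding noise, which is the point being made: the comparison is on a small absolute difference, not on identical expressions.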

>
>  (I wrote some Sage code which generated "random" functions and
> applied the integrate/differentiate method.  If you get a complex
> result back after the differentiation step, it is not easy to
> determine if it's the same as you started with.).
>
> Some, though not all of the above can be eliminated by using software
> that is developed totally independently.. Of course, even using

I don't see how checking differentiation or integration with
Mathematica would be any easier than doing the above. You still have
the problem of comparing two different symbolic expressions.

> Wolfram Alpha will use some code common to Sage since:
>
> a) Wolfram Alpha uses Mathematica
> b) Mathematica uses GMP & ATLAS
> c) Sage uses MPIR (derived from GMP) and ATLAS.
>
> I suspect there is other common code too, but they are two I'm aware of.

I know of no code in common between Mathematica and Sage except GMP
and ATLAS. It would be very interesting to find out if there is any
other code in common. Does Mathematica use any other open source code
at all?

Note that, as you point out above, Sage uses MPIR whereas Mathematica
uses GMP. These two libraries are _massively_ different at this point
-- probably sharing way less than 50% of their code, if that.

>> Especially in more advanced math there are so many
>> wonderful connections, both theorems and conjectures, that can be
>> verified with a good test. For example, computing all the BSD
>> invariants of an elliptic curve and verifying that the BSD formula
>> holds is a strong indicator that the invariants were computed
>> correctly via their various algorithms.
>
> I'll accept what you say!
>
> It's clear you have the ability to write decent tests, but I think it's
> fair to say there are a lot of Sage developers who have less knowledge

> of this subject than you [=Bradshaw].

True. However, I think the general mathematical background of the
average Sage developer is fairly high. If you look down the second
column of

you'll see many have Ph.D.'s in mathematics, and most of those who
don't are currently getting Ph.D.'s in math.

> As such, I believe independent verification using other software is
> useful. Someone remarked earlier it is common in the commercial world
> to compare your results to that of competitive products.

+1 -- it's definitely useful. Everyone should use it when possible
in some ways.

But consistency comparisons using all open source software when
possible are very useful indeed, since they are more maintainable
longterm.

-- William

>
>> - Robert

### David Kirkby

Dec 6, 2010, 11:01:09 AM12/6/10
On 4 December 2010 05:32, William Stein <wst...@gmail.com> wrote:
> On Thu, Dec 2, 2010 at 6:40 PM, David Kirkby <david....@onetel.net> wrote:

>> It's clear you have the ability to write decent tests, but I think it's
>> fair to say there are a lot of Sage developers who have less knowledge
>> of this subject than you [=Bradshaw].
>
> True.  However, I think the general mathematical background of the
> average Sage developer is fairly high.   If you look down the second
> column of
>   http://sagemath.org/development-map.html
>
> you'll see many have Ph.D.'s in mathematics, and most of those who
> don't are currently getting Ph.D.'s in math.

This presupposes that people of fairly high mathematical knowledge are
good at writing software.

I'm yet to be convinced that having a PhD in maths, or studying for
one, makes you good at writing software tests. Unless those people
have studied the different sorts of testing techniques available -
white box, black box, fuzz etc. - then I fail to see how they can be
in a good position to write the tests.

It's been fairly clear in the past that the "Expected" result from a
test is simply what someone happened to get on their computer, and
they did not appear to be aware that the same would not be true on
other processors.

Vladimir Bondarenko has been very effective at finding bugs in
commercial maths software by use of various testing techniques, yet I
think I'm correct in saying Vladimir does not have a maths degree of
any sort.

>> As such, I believe independent verification using other software is

>> useful. Someone remarked earlier it is common in the commercial world
>> to compare your results to that of competitive products.
>
> +1 -- it's definitely useful.   Everyone should use it when possible
> in some ways.

I'm still waiting to hear from Wolfram Research on the use of Wolfram
Alpha for this. Personally I don't think there's anything in the terms
of use of Wolfram Alpha stopping use of the software for this, but
someone (I forget who), did question whether it is within the terms of
use or not.

> But consistency comparisons using all open source software when
> possible are very useful indeed, since they are more maintainable
> longterm.

Yes.

Especially if Wolfram Research thought it would hurt their revenue
to disallow the use of Wolfram Alpha to check other software.

>  -- William

Dave

Dec 6, 2010, 2:15:36 PM12/6/10
On Mon, Dec 6, 2010 at 8:01 AM, David Kirkby <david....@onetel.net> wrote:
> On 4 December 2010 05:32, William Stein <wst...@gmail.com> wrote:
>> On Thu, Dec 2, 2010 at 6:40 PM, David Kirkby <david....@onetel.net> wrote:
>
>>> It's clear you have the ability to write decent tests, but I think it's
>>> fair to say there are a lot of Sage developers who have less knowledge
>>> of this subject than you [=Bradshaw].
>>
>> True.  However, I think the general mathematical background of the
>> average Sage developer is fairly high.   If you look down the second
>> column of
>>   http://sagemath.org/development-map.html
>>
>> you'll see many have Ph.D.'s in mathematics, and most of those who
>> don't are currently getting Ph.D.'s in math.
>
> This presupposes that people of fairly high mathematical knowledge are
> good at writing software.

No, it's an observation that people of fairly high mathematical
knowledge are the ones actually writing software.

> I'm yet to be convinced that having a PhD in maths, or studying for
> one, makes you good at writing software tests. Unless those people
> have studied the different sort of testing techniques available -
> white box, black box, fuzz etc, then I fail to see how they can be in
> a good position to write the tests.

Because they understand what the code is trying to do, what results
should be expected, etc. If I told someone who was an expert in all
these (admittedly valuable) testing techniques to write some tests
that computed special values of L-functions of elliptic curves, how
would they do it? It's not like there's just a command in Mathematica
that can do this, and even if there were, who knows if they'd be able
to understand how to use it.

If I gave it to anyone with an understanding of elliptic curves,
they'd immediately pick a positive rank curve or two, and make sure
the value is very close to zero, then probably look up some special
values in the literature, etc. Or, say, the algorithm was to compute
heights of points. To someone without background, it would look like a
random function point -> floating point number, but anyone in the
know would instantly write some tests to verify bilinearity,
vanishing at torsion points, etc.

Of course, to achieve the ideal solution, you'd have someone with the
math and testing background and lots of time on their hands, or at
least have several different people with those skills involved.

> It's fairly clear in the past that the "Expected" result from a test
> is what someone happened  to get on their computer, and they did not
> appear to be aware that the same would not be true of other
> processors.

Most of the time that's due to floating point irregularities, and then
there's an even smaller percentage of the time that it's due to an
actual bug that didn't show up in the formerly-used environments. In
both of these cases the test, as written, wasn't (IMHO) wrong. Not
that there haven't been a couple of really bad cases where bad results
have been encoded into doctests, which is the fault of both the author
and referee, but I'm glad that these are rare enough to be quite
notable when discovered.

> Vladimir Bondarenko has been very effective at finding bugs in
> commercial maths software by use of various testing techniques, yet I
> think I'm correct in saying Vladimir does not have a maths degree of
> any sort.

I agree, people of all backgrounds can make significant contributions.

>>> As such, I believe independent verification using other software is
>>> useful. Someone remarked earlier it is common in the commercial world
>>> to compare your results to that of competitive products.
>>
>> +1 -- it's definitely useful.   Everyone should use it when possible
>> in some ways.
>
> I'm still waiting to hear from Wolfram Research on the use of Wolfram
> Alpha for this. Personally I don't think there's anything in the terms
> of use of Wolfram Alpha stopping use of the software for this, but
> someone (I forget who), did question whether it is within the terms of
> use or not.
>
>> But consistency comparisons using all open source software when
>> possible are very useful indeed, since they are more maintainable
>> longterm.
>
> Yes.
>
> Especially if Wolfram Research thought it would hurt their revenue
> to disallow the use of Wolfram Alpha to check other software.

That would be a chilling statement indeed. "You're not allowed to
compare these results to those computed with open source software..."
Imagine the absurd consequences this would have on, e.g. results that
appear in publications.

- Robert

### Donald Alan Morrison

Dec 6, 2010, 9:52:47 PM12/6/10
to sage-devel
On Dec 6, 11:15 am, Robert Bradshaw <rober...@math.washington.edu>
wrote:
> On Mon, Dec 6, 2010 at 8:01 AM, David Kirkby <david.kir...@onetel.net> wrote:
> > On 4 December 2010 05:32, William Stein <wst...@gmail.com> wrote:
> >> On Thu, Dec 2, 2010 at 6:40 PM, David Kirkby <david.kir...@onetel.net> wrote:
>[*snip*]
> > It's fairly clear in the past that the "Expected" result from a test
> > is what someone happened  to get on their computer, and they did not
> > appear to be aware that the same would not be true of other
> > processors.
>
> Most of the time that's due to floating point irregularities, and then

http://docs.python.org/release/2.6.6/tutorial/floatingpoint.html#representation-error

If the "numerical noise" issue in sage testing has been in controversy
for so long, why not replace all such failing doctests with a warning
(if triggered), promising to convert each to a sensible test (not
dependent on floating-point order of operations in hardware or said
base conversion), and dispense with all the vitriol? (Not referring to
any particular person's vitriol -- I'm an equal opportunity observer
of circumlocution.)
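The classic representation-error example behind much of this "numerical noise", together with the tolerance-based comparison that makes such a doctest robust:

```python
import math

# Representation error: 0.1, 0.2 and 0.3 are not exactly representable
# in binary floating point, so the "obvious" equality fails.
print(0.1 + 0.2 == 0.3)    # False
print(repr(0.1 + 0.2))     # 0.30000000000000004

# A test comparing with a tolerance is robust to this kind of noise:
print(math.isclose(0.1 + 0.2, 0.3, rel_tol=1e-9))  # True
```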

> there's an even smaller percentage of the time that it's due to an
> actual bug that didn't show up in the formerly-used environments. In

> both of these cases the test, as written, wasn't (IMHO) wrong. Not

Why? If the test is non-deterministic, then you can have false
positives and false negatives. What's a good argument for that, if you
can avoid it? If there are counter-examples (test case scenarios)
that prove you must take a statistical approach, then that would be an
entirely different testing framework.

> I agree, people of all backgrounds can make significant contributions.

http://trac.sagemath.org/sage_trac/ticket/8336

Robert and I had a long off-list discussion on this round() bug. The
problem, IMHO, is not sticking to an interface; the requested invariant
(that the same precision type be returned) is not possible in some
cases. In other words, the interface/invariants are wrong, not the
test.

Speaking in maths terms: if the relation fails the vertical line test
(and is therefore not a function), why on earth would you call it a
function?

### rjf

Dec 8, 2010, 1:28:38 PM12/8/10
to sage-devel

On Dec 6, 8:01 am, David Kirkby <david.kir...@onetel.net> wrote:

> This presupposes that people of fairly high mathematical knowledge are
> good at writing software.
>
> I'm yet to be convinced that having a PhD in maths, or studying for
> one, makes you good at writing software tests

I quite agree. Or even writing software in the first place.

> Unless those people
> have studied the different sort of testing techniques available -
> white box, black box, fuzz etc, then I fail to see how they can be in
> a good position to write the tests.

That would not be my criterion (studying that stuff). I've taught
courses in software engineering, and much of what is conveyed has
almost no applicability for scientific software testing.

>
> Vladimir Bondarenko has been very effective at finding bugs in
> commercial maths software by use of various testing techniques, yet I
> think I'm correct in saying Vladimir does not have a maths degree of
> any sort.

If you believe that those are independent, pragmatically plausible
bugs, and not just presenting gibberish and looking to see what comes
out. VB's problems are not a matter of having a degree or not.

>
> I'm still waiting to hear from Wolfram Research on the use of Wolfram
> Alpha for this.

Why would they bother to reply?

> Personally I don't think there's anything in the terms
> of use of Wolfram Alpha stopping use of the software for this, but
> someone (I forget who), did question whether it is within the terms of
> use or not.

Hey, this is a silly idea even if they deign to respond and say it is
OK.
You can just run Mathematica.
>
> > But consistency comparisons using all open source software when
> > possible are very useful indeed, since they are more maintainable
> > longterm.

>
> Yes.

Really doubtful. Comparing two results that are equivalent but not
identical (simplified differently) is sometimes difficult. Using
someone's open source software is potentially a big waste of time: you
have to debug that too! Your statement is kind of ambiguous... do you
mean "all open source software" or "only open source software"? Or
maybe "selected open source software", or... excluding closed
source...

Whether open source is "more maintainable" is also naive. If someone
(else) maintains the open source, and it changes, what then? If the
open source no longer runs, and you have to "maintain" it, what then?
Compare this to "does Mathematica [say] give an answer that is
consistent?" {still problematical, but at least you don't have to
"maintain" it.}

>
> Especially if Wolfram Research thought it would hurt their revenue
> to disallow the use of Wolfram Alpha to check other software.

1. This is silly for reasons given above.
2. Why would they waste their (lawyer's) time.

### rjf

Dec 8, 2010, 1:35:15 PM12/8/10
to sage-devel

On Dec 6, 11:15 am, Robert Bradshaw <rober...@math.washington.edu>
wrote:

> I agree, people of all backgrounds can make significant contributions.

Logically, nothing to argue with:
"There may be a person X of {no particular specified background} who
can make a significant contribution."

I think we agree that we have higher expectations for people with
particular backgrounds.

As for WRI's lawyers making "chilling statements":
again, why would they bother?
And why should anyone care? Do you think that Wolfram Alpha will last
longer than Mathematica?

### kcrisman

Dec 8, 2010, 1:45:21 PM12/8/10
to sage-devel

> And why should anyone care?  Do you think that Wolfram Alpha will last
> longer than Mathematica?

I think the point was that not everyone who might want to do this
would have access to Mma, but that (for now) they would all have
access to W|A. Just to clarify - I don't really have a horse in this
race.

- kcrisman