Dec 1, 2010, 9:49:08 AM

to sage-...@googlegroups.com

I'm somewhat unimpressed by the way some doctests are constructed. An

example was at

http://trac.sagemath.org/sage_trac/ticket/10187

where I raised an issue.

There was this test:

sage: taylor(gamma(1/3+x),x,0,3)

-1/432*((36*(pi*sqrt(3) + 9*log(3))*euler_gamma^2 + 27*pi^2*log(3) +

72*euler_gamma^3 + 243*log(3)^3 + 18*(6*pi*sqrt(3)*log(3) + pi^2 +

27*log(3)^2 + 12*psi(1, 1/3))*euler_gamma + 324*psi(1, 1/3)*log(3) +

(pi^3 + 9*(9*log(3)^2 + 4*psi(1, 1/3))*pi)*sqrt(3))*gamma(1/3) -

72*gamma(1/3)*psi(2, 1/3))*x^3 + 1/24*(6*pi*sqrt(3)*log(3) +

4*(pi*sqrt(3) + 9*log(3))*euler_gamma + pi^2 + 12*euler_gamma^2 +

27*log(3)^2 + 12*psi(1, 1/3))*x^2*gamma(1/3) - 1/6*(6*euler_gamma +

pi*sqrt(3) + 9*log(3))*x*gamma(1/3) + gamma(1/3)

sage: map(lambda f:f[0].n(), _.coeffs())

[2.6789385347..., -8.3905259853..., 26.662447494..., -80.683148377...]

I asked the author on the ticket that added the numerical coefficients

( [2.6789385347..., -8.3905259853..., 26.662447494...,

-80.683148377...]) to justify them, since I wanted to know they were

right before giving this a positive review. The author remarked he was

not the original author of the long analytic expression, but doubted

it had ever been checked. However, he did agree to check the numerical

results he had added. He did this using Maple 12 and got the same

answer as Sage.
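
A cross-check of this sort need not even require another CAS. As a minimal sketch in plain Python (the function name and the step size h are my own choices, and finite differences only recover a few digits), the coefficients can be reproduced from math.gamma alone:

```python
import math

def gamma_taylor_coeffs(a=1.0 / 3.0, h=1e-3):
    """Approximate the first four Taylor coefficients of gamma(a + x)
    at x = 0 by central finite differences.  A rough numerical
    cross-check only -- not a substitute for an analytic derivation."""
    f = lambda x: math.gamma(a + x)
    c0 = f(0.0)
    c1 = (f(h) - f(-h)) / (2.0 * h)                                   # f'(0)
    c2 = (f(h) - 2.0 * f(0.0) + f(-h)) / (h * h) / 2.0                # f''(0)/2!
    c3 = (f(2*h) - 2*f(h) + 2*f(-h) - f(-2*h)) / (2.0 * h**3) / 6.0   # f'''(0)/3!
    return [c0, c1, c2, c3]
```

To the accuracy the differences allow, this agrees with the doctest's values [2.6789..., -8.3905..., 26.66..., -80.68...], which is exactly the kind of independent plausibility check being asked for.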

In this case I'm satisfied the bit of code added to get the numerical

results is probably OK, as it has been independently verified by

another package. The probability of both being wrong is very

small, since they should have been developed largely independently of

each other. The analytic expression is also probably OK.

I really feel people should use doctests where the analytic results

can be verified, or at least justified in some way. If the results are

then expressed as numerical results, whenever possible those

numerical results should be independently verified, as was done on

this ticket after I requested verification.

Methods of verification could include:

* Results given in a decent book

* Results computed by programs like Mathematica and Maple.

* Showing the results are consistent with an approximate method.

For example, if a bit of code claims to compute prime_pi(n) exactly

with n=10000000000000000000000000000000000000000000000000000000000000000000000000000000000

then that would be difficult to verify by other means. Mathematica, for

example, can't do it, and I doubt any computer could do it in

my lifetime. [1]

But there are numerical approximations for prime_pi, so computing a

numerical approximation, and showing it's similar to the numerical

equivalent of what was computed, would be a reasonable verification that

the function is correct.
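
The idea can be sketched in a few lines of plain Python (a toy sieve and a crude trapezoid-rule logarithmic integral -- illustrative only, not Sage's prime_pi): for n = 10^6 the exact count is 78498, and the approximation lands within a fraction of a percent, which is the kind of sanity check meant here.

```python
import math

def prime_pi(n):
    """Exact prime count by a simple sieve -- feasible only for modest n."""
    if n < 2:
        return 0
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return sum(sieve)

def li(n, steps=100000):
    """Crude logarithmic-integral approximation to prime_pi(n):
    integrate 1/log(t) from 2 to n by the trapezoid rule."""
    a, b = 2.0, float(n)
    h = (b - a) / steps
    s = 0.5 * (1.0 / math.log(a) + 1.0 / math.log(b))
    s += sum(1.0 / math.log(a + i * h) for i in range(1, steps))
    return s * h
```

If the exact routine and the approximation disagreed by, say, 20%, one would know something was badly wrong without needing a second exact implementation.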

It seems to me that many of the doctests have expected values that are

basically whatever someone got on their computer. Sometimes they have

the sense to realise that different floating point processors will

give different results, so they add a few dots so not every digit is

expected to be the same.
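
In Python's own doctest module those dots are the ELLIPSIS mechanism; a minimal, self-contained illustration (the function here is hypothetical, not from the Sage library):

```python
import doctest
import math

def approx_pi():
    """Return a floating-point approximation of pi.

    The trailing dots in the expected output are doctest's ELLIPSIS
    marker, so low-order digits may differ between floating-point
    implementations without failing the test:

    >>> approx_pi()  # doctest: +ELLIPSIS
    3.14159265...
    """
    return 4.0 * math.atan(1.0)
```

Running this through doctest passes on any platform whose first nine digits agree, which is the intent of the dots; it does nothing, of course, to justify *which* digits are written down.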

To me at least, tests where the results are totally unjustified are

very poor tests, yet they seem to be quite common.

I was reading the other day about how one of the large Mersenne primes

was verified. I can't be exact, but it was something like:

* Found by one person on his computer using an AMD or Intel CPU

* Checked by another person using a different program on an Intel or AMD CPU

* Checked by a third person, on a Sun M9000 using a SPARC processor.

I'm not expecting us to go to such lengths, but I feel expected values

should be justified.

Whenever we run tests on the Python package we get failures. If we run

the Maxima test suite, we get failures, which appear with ECL as the

Lisp interpreter, but not on some other interpreters. This indicates

to me that we should not put too much trust in tests which are not

justified.

Comments?

Dave

[1] An interesting experiment would be to find a proof that such a

number could not be computed before the Sun runs out of energy and

all life on Earth is terminated. The designers of the 128-bit

file system used on Solaris have verified that the energy required to

fill the file system would be more than the energy required to boil

all the water in the oceans. I suspect similar arguments could be used

to prove one can't compute prime_pi(n) for sufficiently large n.

Dec 1, 2010, 1:18:16 PM

to sage-...@googlegroups.com

I disagree that doctests should need to be independently verified.

Of course, if we had an arbitrarily large amount of time to write doctests, then it would be a laudable goal. Even now, I think there are situations where it would be reasonable to ask this of the author of a patch: if there was some indication of inconsistency, for example. And if someone wants to go through the Sage library adding such consistency checks, I think that's a great way to improve Sage. But it's already difficult enough to get code refereed without adding a requirement that code have such consistency checks.

The doctests that you object to fill two important roles:

1) they provide an example to a reader of the documentation how to use the function.

2) they provide a check so that if some change to the Sage library breaks something, we find out when testing.

Until we have 100% doctest coverage, I think that's plenty.

David

--

To post to this group, send an email to sage-...@googlegroups.com

To unsubscribe from this group, send an email to sage-devel+...@googlegroups.com

For more options, visit this group at http://groups.google.com/group/sage-devel

URL: http://www.sagemath.org

Dec 1, 2010, 2:32:55 PM

to sage-...@googlegroups.com

On 1 December 2010 18:18, David Roe <ro...@math.harvard.edu> wrote:

> I disagree that doctests should need to be independently verified.

I think we will have to agree to differ then.

> Of course, if we had an arbitrarily large amount of time to write doctests,

> then it would be a laudible goal. Even now, I think there are situations

> where it would be reasonable to ask this of the author of a patch: if there

> was some indication of inconsistency for example. And if someone wants to

> go through the Sage library adding such consistency checks, I think that's a

> great way to improve Sage.

So you admit it would improve Sage to check the tests.

> But it's already difficult enough to get code

> refereed without adding a requirement that code have such consistency

> checks.

It would probably be a bit easier to convince reviewers if your

doctests can be verified.

> The doctests that you object to fill two important roles:

> 1) they provide an example to a reader of the documentation how to use the

> function.

Yes, perhaps a confusing one if the answer is wrong. An embarrassing

one if the examples are wrong.

> 2) they provide a check so that if some change to the Sage library breaks

> something, we find out when testing.

> Until we have 100% doctest coverage, I think that's plenty.

100% coverage with unverified tests is not worth a lot to me. What do you

propose we do when we get 100% coverage - go back and check whether the

tests are valid or not? What a waste of time that would be. It would

be less overall effort to do the tests correctly the first time.

If you are going to give an example, how much longer does it take to

check if they are consistent with Mathematica or similar software? Or

choose an integral from a book?

Dave

Dec 1, 2010, 2:53:08 PM

to sage-...@googlegroups.com

So you admit it would improve Sage to check the tests.

Of course. My argument is that imposing the requirement to have such consistency checks in order to get a positive review will make me less likely to contribute to Sage.

If you are going to give an example, how much longer does it take to

check if they are consistent with Mathematica or similar software? Or

chose an integral from a book?

For those of us without Mathematica installed it takes a decent amount of effort. And even with other software installed, it's often nontrivial to determine an analogous function in another language (or impossible for testing lower level functions).

David

Dec 1, 2010, 3:01:30 PM

to sage-...@googlegroups.com

On Wed, Dec 1, 2010 at 11:32 AM, David Kirkby <david....@onetel.net> wrote:

> On 1 December 2010 18:18, David Roe <ro...@math.harvard.edu> wrote:

>> I disagree that doctests should need to be independently verified.

>

> I think we will have to agree to differ then.

I agree with David Roe.

I also would like to encourage David Kirkby (or anybody else) to

independently test as many examples as they can, and if they uncover

any issues, open a ticket and post a patch. Also, if you are

refereeing new patches, do some testing of your own. I always do!

If anything, this independent checking should be the referee's job --

even if the author claimed to check things independently, the referee

would do well to double check some tests.

So David K., I hope you'll continue to "put your money where your

mouth is" and referee a lot of patches. You've done a massive amount

already. Keep up the good work.

But let's not make Sage too much more bureaucratic. If anything, it's

already too bureaucratic. I personally can hardly stand to submit

anything to Sage anymore because of this.

I do think it would be good to start using nosetest

(http://somethingaboutorange.com/mrl/projects/nose/0.11.2/) to

automatically run all functions that start with "test_" in all files,

in addition to doctests. This is how I've been testing the purple-sage

library (http://code.google.com/p/purplesage/), and for many cases it

does result in me writing much more comprehensive test suites.

Nosetest is also very nice because it can run all the tests in a given

file in parallel. Also, when a test in a file fails, it can drop you

into a debugging shell right there with the failed test. This is

all something that we should start doing in addition to aiming for

100% doctest coverage for the sage library...
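
For reference, the convention nose relies on is minimal: it collects and runs every function whose name starts with "test_". A sketch of the kind of module it discovers (the gcd example is illustrative, not from the Sage or purple-sage libraries):

```python
# Run with:  nosetests this_file.py
# nose imports the module and executes every function named test_*.

def gcd(a, b):
    """Greatest common divisor by the Euclidean algorithm."""
    while b:
        a, b = b, a % b
    return abs(a)

def test_gcd_known_values():
    # Expected values worked out from the definition, not from the code.
    assert gcd(12, 18) == 6
    assert gcd(7, 13) == 1

def test_gcd_edge_cases():
    assert gcd(0, 5) == 5
    assert gcd(42, 42) == 42
```

Because the tests are ordinary functions rather than docstrings, they are free to loop over many cases, which is where the "much more comprehensive test suites" come from.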

>> Of course, if we had an arbitrarily large amount of time to write doctests,

>> then it would be a laudible goal. Even now, I think there are situations

>> where it would be reasonable to ask this of the author of a patch: if there

>> was some indication of inconsistency for example. And if someone wants to

>> go through the Sage library adding such consistency checks, I think that's a

>> great way to improve Sage.

>

> So you admit it would improve sage to check the tests.

It's hard to deny.

>> But it's already difficult enough to get code

>> refereed without adding a requirement that code have such consistency

>> checks.

>

> It would probably be a bit easier to convince reviewers if your

> doctests can be verified.

When people review, they should try to verify tests however they want.

>> The doctests that you object to fill two important roles:

>> 1) they provide an example to a reader of the documentation how to use the

>> function.

>

> Yes, perhaps a confusing one if the answer is wrong. An embarrassing

> one if the examples are wrong.

>

>> 2) they provide a check so that if some change to the Sage library breaks

>> something, we find out when testing.

>

>> Until we have 100% doctest coverage, I think that's plenty.

>

> 100% covered of unverified tests is not worth a lot to me. What do you

> propose we do when we get 100% coverage - go back and check if the

> rests are valid or not? What a waste of time that would be.

Verifying correctness of tests is not a waste of time.

> It would be less overall effort to do the tests correctly the first time.

People presumably *think* they are doing tests correctly. The point

is that you're wanting authors to submit "proofs" that they did

independent verification of results, and I think that is too much

bureaucracy. But asking referees to check claimed examples --

that makes sense! In particular, if I referee some code, and it

turns out somebody finds that the examples were just wrong, then I as

the referee will be pretty embarrassed.

> If you are going to give an example, how much longer does it take to

> check if they are consistent with Mathematica or similar software? Or

> chose an integral from a book?

That does raise an issue: one problem is that most of Sage isn't

calculus. Most code I write these days isn't available in any other

software...

A lot of what Sage does is available only in Magma, say, which many

people don't even have access to.

--

William Stein

Professor of Mathematics

University of Washington

http://wstein.org

Dec 1, 2010, 3:38:19 PM

to sage-devel

> But let's not make Sage too much more bureaucratic. If anything, it's

> already too bureaucratic. I personally can hardly stand to submit

> anything to Sage anymore because of this.

> I do think it would be good to start using nosetest

> (http://somethingaboutorange.com/mrl/projects/nose/0.11.2/) to

> automatically run all functions that start with "test_" in all files,

> in addition to doctests. This is how I've been testing the purple-sage

> library (http://code.google.com/p/purplesage/), and for many cases it

> does result in me writing much more comprehensive test suites.

> > If you are going to give an example, how much longer does it take to

> > check if they are consistent with Mathematica or similar software? Or

> > chose an integral from a book?

>

> That does raise an issue: one problems is that most of Sage isn't

> calculus. Most code I write these days isn't available in any other

> software...

> A lot of what Sage does is available only in Magma say, which many

> people don't even have access to.

And plots need to be tested 'by hand' by looking at them - which I do

a lot of when reviewing those tickets.

One interesting point coming out of this is that the onus is put on

the author, not the reviewer, for testing. I assume that means

"running doctests with ./sage -t or something", not "trying edge/

corner cases the author might not have thought of and making sure

those work", which I think does properly belong with the reviewer.

As I've said before, the framework R has for doing buildbotty stuff is

what we should be striving for, though Sage is far more involved, I

suppose, with all the subcomponents. Even having a buildbot for the

release manager has really improved things already, I think!

- kcrisman

Dec 1, 2010, 5:25:02 PM

to sage-...@googlegroups.com

On 1 December 2010 20:01, William Stein <wst...@gmail.com> wrote:

> On Wed, Dec 1, 2010 at 11:32 AM, David Kirkby <david....@onetel.net> wrote:

>> On 1 December 2010 18:18, David Roe <ro...@math.harvard.edu> wrote:

>>> I disagree that doctests should need to be independently verified.

>>

>> I think we will have to agree to differ then.

>

> I agree with David Roe.

I thought you would.

> I also would like to encourage David Kirkby (or anybody else) to

> independently test as many examples as they can, and if they uncover

> any issues, open a ticket and post a patch.

For me personally, as a non-mathematician, I'd have a problem with just

accepting a doctest which I probably can't verify myself. In some

cases I can, using Mathematica, and have done so on some occasions. But in

the case of

http://trac.sagemath.org/sage_trac/ticket/10187

I could not. But in this case I'll trust the author when he says he

has verified this with Maple. I think Sage is better for that change.

> Also, if they are

> refereeing new patches, do some testing of your own. I always do!

> If anything, this independent checking should be the referee's job --

> even if the author claimed to check things independently, the referee

> would do well to double check some tests.

You have an advantage over me. Of course, I could decline to give a

positive review until a mathematician has said the patch is OK. That

would delay the ticket more of course.

> So David K., I hope you'll continue to "put your money where you

> mouth" is and referee a lot of patches. You've done a massive amount

> already. Keep up the good work.

But as I say, I'm restricted somewhat when people add tests I'm not

convinced of.

> But let's not make Sage too much more bureaucratic. If anything, it's

> already too bureaucratic. I personally can hardly stand to submit

> anything to Sage anymore because of this.

I realise you now have PSage. I feel it's a shame you did not complete

the Cygwin port first, but that's your choice. I can understand your

reasons.

> I do think it would be good to start using nosetest

> (http://somethingaboutorange.com/mrl/projects/nose/0.11.2/) to

> automatically run all functions that start with "test_" in all files,

I suggested 'nose' was added a long time ago

http://groups.google.com/group/sage-devel/browse_thread/thread/928632557a8a041c/f8bc25a249ea4483?hl=en&lnk=gst&q=nose#f8bc25a249ea4483

the only person to reply (Robert Bradshaw) disagreed.

>> It would probably be a bit easier to convince reviewers if your

>> doctests can be verified.

>

> When people review, they should try to verify tests however they want.

But one could make life a lot easier for a reviewer by picking

something where the results can be verified easily. If one writes a

test to show how to use function X, then the input probably does not

matter too much. So choose an input where the output can be verified,

rather than some input where it can't be.

>>> Until we have 100% doctest coverage, I think that's plenty.

>>

>> 100% covered of unverified tests is not worth a lot to me. What do you

>> propose we do when we get 100% coverage - go back and check if the

>> rests are valid or not? What a waste of time that would be.

>

> Verifying correctness of tests is not a waste of time.

I don't know what the current coverage is, but let's say for argument

it needs another 1000 tests to get 100% coverage. It's better to

verify those 1000 tests now, rather than wait until we get 100% coverage,

then go back and verify them.

>> It would be less overall effort to do the tests correctly the first time.

>

> People presumably *think* they are doing tests correctly. The point

> is that you're wanting authors to submit "proofs" that they did

> independent verification of results, and I think that is too much

> bureaucracy.

No, I'm not suggesting a formal proof. In the case of the patch here

http://trac.sagemath.org/sage_trac/attachment/ticket/10187/trac_10187_fix_easy_doctests.patch

lines 345 & 346 were added, as a test, with nothing to say why. The

author has now said Maple 12 gives the same answer - I believe him in

this case.

I rather suspect the input, which shows how to use the taylor

function, could be any of numerous inputs. The one chosen

sage: taylor(gamma(1/3+x),x,0,3)

gives a huge output which is going to be next to impossible to verify

analytically. I rather suspect using a different series, where the

output was well known, would have been more logical.

> But asking referees to check claimed examples --

> that makes sense! In particular, if I referee some code, and it

> turns out somebody finds that the examples were just wrong, then I as

> the referee will be pretty embarrassed.

Yes, but using examples like

sage: taylor(gamma(1/3+x),x,0,3)

makes it almost impossible for a referee to check it, as the output is huge.

>

>> If you are going to give an example, how much longer does it take to

>> check if they are consistent with Mathematica or similar software? Or

>> chose an integral from a book?

>

> That does raise an issue: one problems is that most of Sage isn't

> calculus. Most code I write these days isn't available in any other

> software...

> A lot of what Sage does is available only in Magma say, which many

> people don't even have access to.

Fair enough. But you can at least state the Sage output is consistent

with that from Magma.

In any case, you stated only a week or so ago that Magma 2.13 is now

installed on sage.math

http://groups.google.com/group/sage-devel/msg/8e473e24b0e48772?hl=en

It's a shame the license of Wolfram Alpha does not allow for testing

software like Sage. (This was debated some time ago on sage-devel).

Otherwise that would give a nice easy way to verify *some* results.

"is 100001 prime"

http://www.wolframalpha.com/input/?i=is+100001+prime
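
That particular query is also easy to settle locally; a trial-division sketch (my own illustration, not Sage's implementation) shows 100001 = 11 * 9091, hence composite:

```python
def is_prime(n):
    """Trial division -- adequate for small inputs like this one."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

# 100001 = 11 * 9091, so "is 100001 prime" is answered: no.
```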

> William Stein

> Professor of Mathematics

> University of Washington

> http://wstein.org

I appreciate in many cases it's not going to be possible to verify by

other means. One has to be extra careful about the code then.

Dave

Dec 1, 2010, 5:30:17 PM

to sage-...@googlegroups.com

On Wed, Dec 1, 2010 at 12:38 PM, kcrisman <kcri...@gmail.com> wrote:

>

>> But let's not make Sage too much more bureaucratic. If anything, it's

>> already too bureaucratic. I personally can hardly stand to submit

>> anything to Sage anymore because of this.

>

> :(

>

>> I do think it would be good to start using nosetest

>> (http://somethingaboutorange.com/mrl/projects/nose/0.11.2/) to

>> automatically run all functions that start with "test_" in all files,

>> in addition to doctests. This is how I've been testing the purple-sage

>> library (http://code.google.com/p/purplesage/), and for many cases it

>> does result in me writing much more comprehensive test suites.

>

> You mean http://trac.sagemath.org/sage_trac/ticket/9921?

Yes. I especially agree with David Kirkby's remark: "IMHO it would be

sensible to have nose as a standard package."

>

>> > If you are going to give an example, how much longer does it take to

>> > check if they are consistent with Mathematica or similar software? Or

>> > chose an integral from a book?

>>

>> That does raise an issue: one problems is that most of Sage isn't

>> calculus. Most code I write these days isn't available in any other

>> software...

>> A lot of what Sage does is available only in Magma say, which many

>> people don't even have access to.

>

> And plots need to be tested 'by hand' by looking at them - which I do

> a lot of when reviewing those tickets.

>

> One interesting point coming out of this is that the onus is put on

> the author, not the reviewer, for testing. I assume that means

> "running doctests with ./sage -t or something", not "trying edge/

> corner cases the author might not have thought of and making sure

> those work", which I think does properly belong with the reviewer.

I disagree. The author *and* the reviewer should both do as much as they can

reasonably do.

William

Dec 1, 2010, 5:36:14 PM

to sage-...@googlegroups.com

On Wed, Dec 1, 2010 at 2:25 PM, David Kirkby <david....@onetel.net> wrote:

>> I do think it would be good to start using nosetest

>> (http://somethingaboutorange.com/mrl/projects/nose/0.11.2/) to

>> automatically run all functions that start with "test_" in all files,

>

> I suggested 'nose' was added a long time ago

>

> http://groups.google.com/group/sage-devel/browse_thread/thread/928632557a8a041c/f8bc25a249ea4483?hl=en&lnk=gst&q=nose#f8bc25a249ea4483

>

> the only person to reply (Robert Bradshaw) disagreed.

Well now that I know nose better, I agree with you. It's a really

awesome testing framework. I use it all the time for my own work now.

>> Verifying correctness of tests is not a waste of time.

>

> I don't know what the current coverage is, but lets say for argument

> it needs another 1000 tests to get 100% coverage. It's better to

> verify those 1000 tests now, rather than wait to we get 100% coverage,

> then go back and verify them.

Orthogonal to your remark, but in sage-4.6:

$ sage -coverageall

...

Overall weighted coverage score: 84.3%

Total number of functions: 26592

We need 173 more function to get to 85% coverage.

We need 1503 more function to get to 90% coverage.

We need 2833 more function to get to 95% coverage.

It's only 2,833 tests!

>> But asking referees to check claimed examples --

>> that makes sense! In particular, if I referee some code, and it

>> turns out somebody finds that the examples were just wrong, then I as

>> the referee will be pretty embarrassed.

>

> Yes, but using examples like

>

> sage: taylor(gamma(1/3+x),x,0,3)

>

> makes it almost impossible for a referee to check it, as the output is huge.

I totally agree, and I think that's a very valid criticism for you to

make as a referee.

But let's not make a new policy out of this.

> In any case, you stated only a week or so ago that Magma 2.13 is now

> installed on sage.math

>

> http://groups.google.com/group/sage-devel/msg/8e473e24b0e48772?hl=en

That is a post from 2006?!?

> It's a shame the license of Wolfram Alpha does not allow for testing

> software like Sage. (This was debated some time ago on sage-devel).

> Otherwise that would give a nice easy way to verify *some* results.

>

> "is 100001 prime"

>

> http://www.wolframalpha.com/input/?i=is+100001+prime

I'm not sure what you're talking about exactly at this point.

Referees can use wolfram alpha if they want to independently check

stuff... Do you mean adding doctests that call wolframalpha? That

would be weird.

-- William

Dec 1, 2010, 6:15:28 PM

to sage-devel

On Dec 2, 11:36 am, William Stein <wst...@gmail.com> wrote:

> On Wed, Dec 1, 2010 at 2:25 PM, David Kirkby <david.kir...@onetel.net> wrote:

> >> Verifying correctness of tests is not a waste of time.

>

> > I don't know what the current coverage is, but lets say for argument

> > it needs another 1000 tests to get 100% coverage. It's better to

> > verify those 1000 tests now, rather than wait to we get 100% coverage,

> > then go back and verify them.

>

> Orthogonal to your remark, but in sage-4.6:

>

> $ sage -coverageall

> ...

> Overall weighted coverage score: 84.3%

> Total number of functions: 26592

> We need 173 more function to get to 85% coverage.

> We need 1503 more function to get to 90% coverage.

> We need 2833 more function to get to 95% coverage.

>

> It's only 2,833 tests!

That figure assumes that there are no duplicate tests in sage. They

are unfortunately a fact of life, like the two described here while

testing sage-on-gentoo:

https://github.com/cschwan/sage-on-gentoo/issues/closed#issue/8

Dec 1, 2010, 6:22:13 PM

to sage-devel

On Dec 1, 11:25 pm, David Kirkby <david.kir...@onetel.net> wrote:

> I rather suspect the input, which shows how to use the taylor

> function, could be any of numerous inputs. The one chosen

>

> sage: taylor(gamma(1/3+x),x,0,3)

>

> gives a huge output which is going to be next to impossible to verify

> analytically.

For the record, this particular doctest was added to check that trac

#9217 was fixed, and the surrounding text clearly states that. It was

not chosen because it's a particularly illuminating example.

Volker

Dec 1, 2010, 6:31:08 PM

to sage-...@googlegroups.com

On 1 December 2010 22:36, William Stein <wst...@gmail.com> wrote:

> On Wed, Dec 1, 2010 at 2:25 PM, David Kirkby <david....@onetel.net> wrote:

>>> I do think it would be good to start using nosetest

>>> (http://somethingaboutorange.com/mrl/projects/nose/0.11.2/) to

>>> automatically run all functions that start with "test_" in all files,

>>

>> I suggested 'nose' was added a long time ago

>>

>> http://groups.google.com/group/sage-devel/browse_thread/thread/928632557a8a041c/f8bc25a249ea4483?hl=en&lnk=gst&q=nose#f8bc25a249ea4483

>>

>> the only person to reply (Robert Bradshaw) disagreed.

>

> Well now that I know nose better, I agree with you. It's a really

> awesome testing framework. I use it all the time for my own work now.


It would seem sensible to make it standard in that case. Making it

optional seems a bit less useful to me.

>> Yes, but using examples like

>>

>> sage: taylor(gamma(1/3+x),x,0,3)

>>

>> makes it almost impossible for a referee to check it, as the output is huge.

>

> I totally agree, and I think that's a very valid criticism for you to

> make as a referee.

The code I was refereeing did *not* add

sage: taylor(gamma(1/3+x),x,0,3)

That was there before.

What I queried was the doctest which converted the huge symbolic

result to a much simpler numerical result, which was added in the

ticket in question. (It was added, as the format of Maxima had

changed, so a test was added to see that the numerical values

were the same, even if the symbolic ones were not).

sage: map(lambda f:f[0].n(), _.coeffs()) # numerical coefficients to make comparison easier; Maple 12 gives same answer

[2.6789385347..., -8.3905259853..., 26.662447494..., -80.683148377...]

After I asked, the author verified it in Maple 12, as the doctest

notes. So that probably means Maxima has it right.

> But let's not make a new policy out of this.

>

>

>> In any case, you stated only a week or so ago that Magma 2.13 is now

>> installed on sage.math

>>

>> http://groups.google.com/group/sage-devel/msg/8e473e24b0e48772?hl=en

>

> That is a post from 2006?!?

Em, I thought I saw the post recently and Googled for it.

>> It's a shame the license of Wolfram Alpha does not allow for testing

>> software like Sage. (This was debated some time ago on sage-devel).

>> Otherwise that would give a nice easy way to verify *some* results.

>>

>> "is 100001 prime"

>>

>> http://www.wolframalpha.com/input/?i=is+100001+prime

>

> I'm not sure what you're talking about exactly at this point.

> Referees can use wolfram alpha if they want to independently check

> stuff...

Yes, verifying results is OK.

But storing comments in the source code of Sage containing a large

number of comparisons with Wolfram Alpha may not be. See

http://groups.google.com/group/sage-devel/msg/1f8af294fbf40ccc?hl=en

where Alex Ghitza pointed out this might breach the terms of use.

http://www.wolframalpha.com/termsofuse.html

> Do you mean adding doctests that call wolframalpha? That

> would be weird.

No, I was not thinking of that.

> -- William

Dave

Dec 1, 2010, 9:17:28 PM12/1/10

to sage-devel

On Dec 1, 5:30 pm, William Stein <wst...@gmail.com> wrote:

> On Wed, Dec 1, 2010 at 12:38 PM, kcrisman <kcris...@gmail.com> wrote:

>

> >> But let's not make Sage too much more bureaucratic. If anything, it's

> >> already too bureaucratic. I personally can hardly stand to submit

> >> anything to Sage anymore because of this.

>

> > :(

>

> >> I do think it would be good to start using nosetest

> >> (http://somethingaboutorange.com/mrl/projects/nose/0.11.2/) to

> >> automatically run all functions that start with "test_" in all files,

> >> in addition to doctests. This is how I've been testing the purple-sage

> >> library (http://code.google.com/p/purplesage/), and for many cases it

> >> does result in me writing much more comprehensive test suites.

>

> > You mean http://trac.sagemath.org/sage_trac/ticket/9921?
>


> Yes. I especially agree with David Kirkby's remark: "IMHO it would be

> sensible to have nose as a standard package.".

>

Oh, great! Then I may put that on my to-do list. I know Jason is

also interested in this.

> > One interesting point coming out of this is that the onus is put on

> > the author, not the reviewer, for testing. I assume that means

> > "running doctests with ./sage -t or something", not "trying edge/

> > corner cases the author might not have thought of and making sure

> > those work", which I think does properly belong with the reviewer.

>

> I disagree. The author *and* the reviewer should both do as much as they can

> reasonably do.

- kcrisman

Dec 2, 2010, 1:20:54 PM12/2/10

to sage-...@googlegroups.com

On Wed, Dec 1, 2010 at 2:36 PM, William Stein <wst...@gmail.com> wrote:

> On Wed, Dec 1, 2010 at 2:25 PM, David Kirkby <david....@onetel.net> wrote:

>>> I do think it would be good to start using nosetest

>>> (http://somethingaboutorange.com/mrl/projects/nose/0.11.2/) to

>>> automatically run all functions that start with "test_" in all files,

>>

>> I suggested 'nose' was added a long time ago

>>

>> http://groups.google.com/group/sage-devel/browse_thread/thread/928632557a8a041c/f8bc25a249ea4483?hl=en&lnk=gst&q=nose#f8bc25a249ea4483

>>

>> the only person to reply (Robert Bradshaw) disagreed.


I think there's a distinction between an spkg that people might find

useful to use with Sage, and an spkg that's actually used in Sage.

For the former, if easy_install "just works," then it's not worth us

creating and maintaining a separate spkg, but for the latter, we

should ship it.

The fact that an upstream package uses nose in its tests did not seem

like enough of a justification to create a whole new spkg, but if we

want to write Sage tests with nose then I have no objection. I

certainly think that there's a diminishing return on doctests once you

reach a certain point (which we're probably not at yet).

>>> But asking referees to check claimed examples --

>>> that makes sense! In particular, if I referee some code, and it

>>> turns out somebody finds that the examples were just wrong, then I as

>>> the referee will be pretty embarrassed.

>>

>> Yes, but using examples like

>>

>> sage: taylor(gamma(1/3+x),x,0,3)

>>

>> makes it almost impossible for a referee to check it, as the output is huge.

What would make a better test in this case would be taking the

resulting power series, perhaps to a higher degree of precision, and

evaluating at 0.1, 0.5, and showing that the result is close to

gamma(1/3 + 0.1), gamma(1/3 + 0.5). Or perhaps verifying that the 3rd

coefficient is equal to the 3rd derivative / 6.
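The check described above can be sketched outside Sage in plain Python (an editorial sketch, not from the thread: `math.gamma` stands in for Sage's gamma, and the coefficients are the numerical values quoted from the doctest):

```python
import math

# Numerical coefficients c0..c3 quoted from the doctest
# (Taylor expansion of gamma(1/3 + x) about x = 0).
coeffs = [2.6789385347, -8.3905259853, 26.662447494, -80.683148377]

def taylor_poly(x):
    """Evaluate the degree-3 Taylor polynomial at x."""
    return sum(c * x**k for k, c in enumerate(coeffs))

def f(x):
    return math.gamma(1/3 + x)

# Check 1: the polynomial tracks gamma(1/3 + x) for small x.
for x in (0.01, -0.01):
    assert abs(taylor_poly(x) - f(x)) < 1e-4

# Check 2: the cubic coefficient equals f'''(0)/6, with f'''(0)
# estimated by a central finite difference (error O(h**2)).
h = 1e-3
third_deriv = (f(2*h) - 2*f(h) + 2*f(-h) - f(-2*h)) / (2 * h**3)
assert abs(third_deriv / 6 - coeffs[3]) < 1e-2 * abs(coeffs[3])
```

Either check fails loudly if a coefficient is wrong, with no need to eyeball the symbolic expression.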

> I totally agree, and I think that's a very valid criticism for you to

> make as a referee.

>

> But let's not make a new policy out of this.

+1. As more time is spent reading the code and tests rather than

applying patches, we should be more critical of good vs. bad tests.

This also goes with the ideal of making it really easy to edit a

patch, perhaps even online. (Imagine if you could run some code and

press a button to add that doctest to the library, pending refereeing

of course...)

>> In any case, you stated only a week or so ago that Magma 2.13 is now

>> installed on sage.math

>>

>> http://groups.google.com/group/sage-devel/msg/8e473e24b0e48772?hl=en

>

> That is a post from 2006?!?

>

>> It's a shame the license of Wolfram Alpha does not allow for testing

>> software like Sage. (This was debated some time ago on sage-devel).

>> Otherwise that would give a nice easy way to verify *some* results.

>>

>> "is 100001 prime"

>>

>> http://www.wolframalpha.com/input/?i=is+100001+prime

>

> I'm not sure what you're talking about exactly at this point.

> Referees can use wolfram alpha if they want to independently check

> stuff... Do you mean adding doctests that call wolframalpha? That

> would be weird.

>

>> I appreciate in many cases it's not going to be possible to verify by

>> other means. One has to be extra careful about the code then.

On the topic of verifying tests, I think internal consistency checks

are much better, both pedagogically and for verifiability, than

external checks against other (perhaps inaccessible) systems. For

example, the statement above that checks a power series against its

definition and properties, or (since you brought up the idea of

factorial) factorial(10) == prod([1..10]), or taking the derivative to

verify an integral. Especially in more advanced math there are so many

wonderful connections, both theorems and conjectures, that can be

verified with a good test. For example, computing all the BSD

invariants of an elliptic curve and verifying that the BSD formula

holds is a strong indicator that the invariants were computed

correctly via their various algorithms.

- Robert
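The factorial example above can be written directly in plain Python (editorial sketch; `math.factorial` and `functools.reduce` stand in for Sage's `factorial` and `prod`):

```python
import math
from functools import reduce

# factorial(10) checked against its defining product; the two sides
# go through different code paths.
assert math.factorial(10) == reduce(lambda a, b: a * b, range(1, 11))

# A second, independent route to the same number: n! == gamma(n + 1).
assert math.factorial(10) == round(math.gamma(11))
```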

Dec 2, 2010, 1:42:37 PM12/2/10

to sage-devel

> >> I suggested 'nose' was added a long time ago

>

>

> >> the only person to reply (Robert Bradshaw) disagreed.

>

> I think there's a distinction between an spkg that people might find

> useful to use with Sage, and an spkg that's actually used in in Sage.

> For the former, if easy_install "just works," than it's not worth us

> creating and maintaining a separate spkg, but for the latter, we

> should ship it.

>

> The fact that an upstream package use nose in its tests did not seem

> like enough of a justification to create a whole new spkg, but if we

> want to write Sage tests with nose than I have no objection. I

> certainly think that there's a diminishing return on doctests once you

> reach a certain point (which we're probably not at yet).

I think the reason for this is to make it really easy to run spkg-

check on a number of spkgs. Like Numpy and Scipy. So if nose were

available in the spkg framework, this would be nice.

That said, maybe 'easy_install' is really as easy as ./sage -i nose

from the internet, in which case I suppose one could have an spkg-

check that relied on the internet... but that wouldn't be ideal, I

think.

- kcrisman

Dec 2, 2010, 1:46:47 PM12/2/10

to sage-...@googlegroups.com

On 12/2/10 12:42 PM, kcrisman wrote:

> That said, maybe 'easy_install' is really as easy as ./sage -i nose

> from the internet, in which case I suppose one could have an spkg-

> check that relied on the internet... but that wouldn't be ideal, I

> think.

But that would also avoid yet another spkg to maintain. We have a

hard enough time keeping up with spkg updates as it is.

As Robert says, if we're using nose in Sage, that's a different story.

Thanks,

Jason

Dec 2, 2010, 2:03:34 PM12/2/10

to sage-devel

On Dec 2, 10:20 am, Robert Bradshaw <rober...@math.washington.edu>

wrote:


> On the topic of verifying tests, I think internal consistency checks

> are much better, both pedagogically and for verifiability, than

> external checks against other (perhaps inaccessible) systems. For

> example, the statement above that checks a power series against its

> definition and properties, or (since you brought up the idea of

> factorial) factorial(10) == prod([1..10]), or taking the derivative to

> verify an integral. Especially in more advanced math there are so many

> wonderful connections, both theorems and conjectures, that can be

> verified with a good test. For example, computing all the BSD

> invariants of an elliptic curve and verifying that the BSD formula

> holds is a strong indicator that the invariants were computed

> correctly via their various algorithms.

A huge +1 to this. Couldn't have said it better. I sometimes become

a devious doctest writer (close cousin to the devious reviewer) and

try to write a doctest that links seemingly disparate parts of Sage in

complicated ways expressed by a theorem. For example, automorphism

groups of graphs sometimes have connections with eigenvalues of the

adjacency matrices of the graph. If something breaks in either part

of Sage, then such a test may expose it. And sometimes these tests

are very succinct since they can be constructed so the output is

simply "True." And properly written, they make for interesting

reading.

Rob
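A miniature of the kind of theorem-linking test described above, in plain Python (editorial sketch, using the 4-cycle as a toy graph): a rotation is checked to be an automorphism, and trace(A^2), the number of closed walks of length 2 (and the sum of the squared adjacency eigenvalues), is checked against twice the edge count.

```python
# Adjacency matrix of the 4-cycle C4 (edges 0-1, 1-2, 2-3, 3-0).
A = [[0, 1, 0, 1],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [1, 0, 1, 0]]

sigma = [1, 2, 3, 0]  # the rotation i -> i+1 (mod 4)

# sigma is an automorphism: relabelling vertices leaves A unchanged.
assert all(A[sigma[i]][sigma[j]] == A[i][j]
           for i in range(4) for j in range(4))

# trace(A^2) counts closed walks of length 2, which equals twice the
# number of edges (each edge gives one there-and-back walk per end).
trace_A2 = sum(A[i][k] * A[k][i] for i in range(4) for k in range(4))
num_edges = sum(A[i][j] for i in range(4) for j in range(4)) // 2
assert trace_A2 == 2 * num_edges
```

The doctest form of such a check would simply print True.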

Dec 2, 2010, 3:46:37 PM12/2/10

to sage-devel

anything for those packages if the user doesn't already have nose

installed? I just don't know what the consensus is on whether the

"batteries included" philosophy extends to something (SAGE_CHECK) that

the average user and even developer may not use.

- kcrisman

Dec 2, 2010, 3:47:34 PM12/2/10

to sage-devel

To follow up my own thing, maybe it would be possible to write an spkg-

check that tries to detect nose, exits gracefully if it's not there,

and otherwise uses a system nose... though of course then one would be

using the system Python... wouldn't one?

- kcrisman


Dec 2, 2010, 9:40:46 PM12/2/10

to sage-...@googlegroups.com

On 2 December 2010 18:20, Robert Bradshaw <robe...@math.washington.edu> wrote:

> On the topic of verifying tests, I think internal consistency checks

> are much better, both pedagogically and for verifiability, than

> external checks against other (perhaps inaccessible) systems. For

> example, the statement above that checks a power series against its

> definition and properties, or (since you brought up the idea of

> factorial) factorial(10) == prod([1..10]), or taking the derivative to

> verify an integral.

Of course I can see logic in this, especially when the software may

not be available. Even though it has limitations, and those

limitations might increase with time, Wolfram Alpha is currently

available to everyone. (It helps if you know Mathematica, as you can

input Mathematica syntax directly).

* The person writing the mathematical code is usually the same person

who writes the test for that code. Any assumptions they make which are

incorrect may exist in both the algorithm and the test code. Of

course one hopes the referee picks this up, but the referee process,

while useful, is not perfect.

* The example you give with 10 factorial and prod([1..10]) would

probably use a fair amount of common code - such as MPIR.

* Differentiate(Integrate(f)) = f: in practice, for many functions

doing this in Sage does not lead back to the same expression, although

they are mathematically equivalent. Converting to a numerical form

can sometimes be used to show results are equal, though even then one

often gets two equivalent, but non-identical, numerical results.

(I wrote some Sage code which generated "random" functions and

applied the integrate/differentiate method. If you get a complex

result back after the differentiation step, it is not easy to

determine if it's the same as you started with.).

Some, though not all of the above can be eliminated by using software

that is developed totally independently. Of course, even using

Wolfram Alpha will use some code common to Sage since:

a) Wolfram Alpha uses Mathematica

b) Mathematica uses GMP & ATLAS

c) Sage uses MPIR (derived from GMP) and ATLAS.

I suspect there is other common code too, but they are two I'm aware of.

> Especially in more advanced math there are so many

> wonderful connections, both theorems and conjectures, that can be

> verified with a good test. For example, computing all the BSD

> invariants of an elliptic curve and verifying that the BSD formula

> holds is a strong indicator that the invariants were computed

> correctly via their various algorithms.

I'll accept what you say!

It's clear you have the ability to write decent tests, but I think it's

fair to say there are a lot of Sage developers who have less knowledge

of this subject than you.

As such, I believe independent verification using other software is

useful. Someone remarked earlier it is common in the commercial world

to compare your results to that of competitive products.

> - Robert

Dave

Dec 3, 2010, 3:15:21 AM12/3/10

to sage-devel

> On the topic of verifying tests, I think internal consistency checks

> are much better, both pedagogically and for verifiability, than

> external checks against other (perhaps inaccessible) systems. For

> example, the statement above that checks a power series against its

> definition and properties, or (since you brought up the idea of

> factorial) factorial(10) == prod([1..10]), or taking the derivative to

> verify an integral. Especially in more advanced math there are so many

> wonderful connections, both theorems and conjectures, that can be

> verified with a good test. For example, computing all the BSD

> invariants of an elliptic curve and verifying that the BSD formula

> holds is a strong indicator that the invariants were computed

> correctly via their various algorithms.

>

> - Robert

Also a huge +1 from me. This is something I have been thinking a lot

about how to utilise this most elegantly, and I think one could take it a

step further than doctests. I myself often write "parameterised

tests": tests for properties of the output of functions based on

"random" input. For example, say I have a library of polynomials over

fields. Then a useful property to test is for any polynomials a,b to

satisfy

a*b == b*a

I could write a test to randomly generate 100 different pairs of

polynomials a,b to check with, over "random" fields. I know that some

people sometimes write such tests, and it is also suggested in the

Developer's Guide somewhere.
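A parameterised test of that a*b == b*a property can be sketched in plain Python (editorial sketch: polynomials are coefficient lists over prime fields GF(p), standing in for "random fields"):

```python
import random

def poly_mul(a, b, p):
    """Multiply coefficient lists a, b modulo the prime p."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % p
    return out

def random_poly(p, max_deg=10):
    """A random nonempty coefficient list over GF(p)."""
    return [random.randrange(p) for _ in range(random.randint(1, max_deg + 1))]

random.seed(0)
for _ in range(100):
    p = random.choice([2, 3, 5, 7, 101])   # a "random" field
    a, b = random_poly(p), random_poly(p)
    assert poly_mul(a, b, p) == poly_mul(b, a, p)
```

Seeding keeps the "random" inputs reproducible when the test is rerun.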

I love the Haskell test-suite QuickCheck, which allows one to write

such tests extremely declaratively and succinctly. Haskell is way cool

when it comes to types, so it provides an elegant way of specifying

how to randomly generate your input. Transferring this directly to Python

or Sage can't be as elegant, but I have been working on a small python-

script -- basically an extension to unittest -- which could make it at

least easier to write these kinds of tests. It's not done yet and can

be improved in many ways, but I use it all the time on my code; it's

quite reassuring to have written a set of involved functions over

bivariate polynomials over fields and then check their internal

consistency with 100-degree polynomials over +1000 cardinality

fields :-D

My thought is that doctests are nice for educational purposes and

basic testing, but I myself like to test my code better while writing

it. I don't want to introduce more bureaucracy, so I don't suggest

that we should _require_ such tests, but it would be nice to have a

usual/standard way of writing such tests, if an author or reviewer

felt like it. More importantly, if it could be done in a systematic

way, all such tests could share the random generating functions: for

example, all functions working over any field would need a "generate a

random field"-function, and if there was a central place for these in

Sage, the most common structures would quickly be available, making

parameterised test writing even easier.

- Johan

Dec 3, 2010, 9:03:04 PM12/3/10

to sage-...@googlegroups.com

On Friday, December 3, 2010, Johan S. R. Nielsen wrote:

I think nosetest is a superb framework for writing such unittests,

which really do encourage a completely different kind of testing than

doctests.

> More importantly, if it could be done in a systematic

> way, all such tests could share the random generating functions: for

> example, all functions working over any field would need a "generate a

> random field"-function, and if there was a central place for these in

I wrote such a thing. See rings.tests or test or rando_ring (I am

sending from a cell phone).

> Sage, the most common structures would quickly be available, making

> parameterised test writing even easier.

>

> - Johan

>

Dec 4, 2010, 12:32:16 AM12/4/10

to sage-...@googlegroups.com

On Thu, Dec 2, 2010 at 6:40 PM, David Kirkby <david....@onetel.net> wrote:

> On 2 December 2010 18:20, Robert Bradshaw <robe...@math.washington.edu> wrote:

>

>> On the topic of verifying tests, I think internal consistency checks

>> are much better, both pedagogically and for verifiability, than

>> external checks against other (perhaps inaccessible) systems. For

>> example, the statement above that checks a power series against its

>> definition and properties, or (since you brought up the idea of

>> factorial) factorial(10) == prod([1..10]), or taking the derivative to

>> verify an integral.

>

> Of course I can see logic in this, especially when the software may

> not be available. Even though it has limitations, and those

> limitations might increase with time, Wolfram Alpha is currently

> available to everyone. (It helps if you know Mathematica, as you can

> input Mathematica syntax directly).

>

> * The person writing the mathematical code is usually the same person

> who writes the test for that code. Any assumptions they make which are

> incorrect may exist in both the algorithm and the test code. Of

> course one hopes the referee picks this up, but the referee process,

> while useful, is not perfect.

>

> * The example you give with 10 factorial and prod([1..10], would

> probably use a fair amount of common code - such as MPIR.


If you do

prod(range(1,11))

and compare that to "factorial(10)", I think it uses absolutely no

common code at all.

prod(range(1,11)) -- uses arithmetic with Python ints and the Sage

prod command (which Robert Bradshaw wrote from scratch in Cython).

factorial(10) -- calls a GMP function that is written in C, and

shares no code at all with Python.

-- William

>

> * Differentiate(Integrate(f)) = f, in practice for many functions

> doing this in Sage does not lead back to the same expression, although

> they are mathematically equivalent. Converting to a numerical form

> can sometimes be used to show results are equal, but even two

> equivalent, but non-identical numerical results often exist.

They have to be the same up to rounding errors, right, or it is a bug?

So numerically the absolute value of the difference must be small.
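In other words, Differentiate(Integrate(f)) == f can be checked purely numerically, with no symbolic comparison at all; a plain-Python sketch for f = cos (trapezoid rule for the integral, central difference for the derivative, both editorial stand-ins for Sage's integrate and diff):

```python
import math

def trapezoid(f, a, b, n=2000):
    """Approximate the integral of f over [a, b] by the trapezoid rule."""
    h = (b - a) / n
    s = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return s * h

def F(x):
    """Numeric antiderivative of cos, integrated from 0."""
    return trapezoid(math.cos, 0.0, x)

# Differentiating the numeric antiderivative must recover cos, up to
# a small numerical error.
h = 1e-4
for x in (0.3, 1.0, 2.0):
    deriv = (F(x + h) - F(x - h)) / (2 * h)
    assert abs(deriv - math.cos(x)) < 1e-5
```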

>

> (I wrote some Sage code which generated "random" functions and

> applied the integrate/differentiate method. If you get a complex

> result back after the differentiation step, it is not easy to

> determine if it's the same as you started with.).

>

> Some, though not all of the above can be eliminated by using software

> that is developed totally independently.. Of course, even using

I don't see how checking differentiation or integration with

Mathematica would be any easier than doing the above. You still have

the problem of comparing two different symbolic expressions.

> Wolfram Alpha will use some code common to Sage since:

>

> a) Wolfram Alpha uses Mathematica

> b) Mathematica uses GMP & ATLAS

> c) Sage uses MPIR (derrived from GMP) and ATLAS.

>

> I suspect there is other common code too, but they are two I'm aware of.

I know of no code in common between Mathematica and Sage except GMP and

ATLAS. It would be very interesting to find out if there is any other

code in common. Does Mathematica use any other open source code at

all?

Note that as you point out above Sage uses MPIR whereas mathematica

uses GMP. These two libraries are _massively_ different at this point

-- probably sharing way less than 50% of their code, if that.

>> Especially in more advanced math there are so many

>> wonderful connections, both theorems and conjectures, that can be

>> verified with a good test. For example, computing all the BSD

>> invariants of an elliptic curve and verifying that the BSD formula

>> holds is a strong indicator that the invariants were computed

>> correctly via their various algorithms.

>

> I'll accept what you say!

>

> It's clear you have the ability to write decent tests, but I think its

> fair to say there are a lot of Sage developers who have less knowledge

> of this subject than you [=Bradshaw].

True. However, I think the general mathematical background of the

average Sage developer is fairly high. If you look down the second

column of

http://sagemath.org/development-map.html

you'll see many have Ph.D.'s in mathematics, and most of those who

don't are currently getting Ph.D.'s in math.

> As such, I believe independant verification using other software is

> useful. Someone remarked earlier it is common in the commercial world

> to compare your results to that of competitive products.

+1 -- it's definitely useful. Everyone should use it when possible

in some ways.

But consistency comparisons using all open source software when

possible are very useful indeed, since they are more maintainable

longterm.

-- William

>

>> - Robert

Dec 6, 2010, 11:01:09 AM12/6/10

to sage-...@googlegroups.com

On 4 December 2010 05:32, William Stein <wst...@gmail.com> wrote:

> On Thu, Dec 2, 2010 at 6:40 PM, David Kirkby <david....@onetel.net> wrote:


>> It's clear you have the ability to write decent tests, but I think its

>> fair to say there are a lot of Sage developers who have less knowledge

>> of this subject than you [=Bradshaw].

>

> True. However, I think the general mathematical background of the

> average Sage developer is fairly high. If you look down the second

> column of

> http://sagemath.org/development-map.html

>

> you'll see many have Ph.D.'s in mathematics, and most of those who

> don't are currently getting Ph.D.'s in math.

This presupposes that people of fairly high mathematical knowledge are

good at writing software.

I'm yet to be convinced that having a PhD in maths, or studying for

one, makes you good at writing software tests. Unless those people

have studied the different sorts of testing techniques available -

white box, black box, fuzz, etc. - I fail to see how they can be in

a good position to write the tests.

It's fairly clear that, in the past, the "Expected" result from a test

is what someone happened to get on their computer, and they did not

appear to be aware that the same would not be true of other

processors.

Vladimir Bondarenko has been very effective at finding bugs in

commercial maths software by use of various testing techniques, yet I

think I'm correct in saying Vladimir does not have a maths degree of

any sort.

>> As such, I believe independent verification using other software is

>> useful. Someone remarked earlier it is common in the commercial world

>> to compare your results to that of competitive products.

>

> +1 -- it's definitely useful. Everyone should use it when possible

> in some ways.

I'm still waiting to hear from Wolfram Research on the use of Wolfram

Alpha for this. Personally I don't think there's anything in the terms

of use of Wolfram Alpha stopping use of the software for this, but

someone (I forget who), did question whether it is within the terms of

use or not.

> But consistency comparisons using all open source software when

> possible are very useful indeed, since they are more maintainable

> longterm.

Yes.

Especially if Wolfram Research thought it would hurt their revenue

from Mathematica sales, they could very easily re-write the terms of use

to disallow the use of Wolfram Alpha to check other software.

> -- William

Dave

Dec 6, 2010, 2:15:36 PM

to sage-...@googlegroups.com

On Mon, Dec 6, 2010 at 8:01 AM, David Kirkby <david....@onetel.net> wrote:

> On 4 December 2010 05:32, William Stein <wst...@gmail.com> wrote:

>> On Thu, Dec 2, 2010 at 6:40 PM, David Kirkby <david....@onetel.net> wrote:

>

>>> It's clear you have the ability to write decent tests, but I think it's

>>> fair to say there are a lot of Sage developers who have less knowledge

>>> of this subject than you [=Bradshaw].

>>

>> True. However, I think the general mathematical background of the

>> average Sage developer is fairly high. If you look down the second

>> column of

>> http://sagemath.org/development-map.html

>>

>> you'll see many have Ph.D.'s in mathematics, and most of those who

>> don't are currently getting Ph.D.'s in math.

>

> This presupposes that people of fairly high mathematical knowledge are

> good at writing software.


No, it's an observation that people of fairly high mathematical

knowledge are the ones actually writing software.

> I'm yet to be convinced that having a PhD in maths, or studying for

> one, makes you good at writing software tests. Unless those people

> have studied the different sort of testing techniques available -

> white box, black box, fuzz etc, then I fail to see how they can be in

> a good position to write the tests.

Because they understand what the code is trying to do, what results

should be expected, etc. If I told someone who was an expert in all

these (admittedly valuable) testing techniques to write some tests

that computed special values of L-functions of elliptic curves, how

would they do it? It's not like there's just a command in Mathematica

that can do this, and even if there were, who knows if they'd be able

to understand how to use it.

If I gave it to anyone with an understanding of elliptic curves,

they'd immediately pick a positive rank curve or two, and make sure

the value is very close to zero, then probably look up some special

values in the literature, etc. Or, say, the algorithm was to compute

heights of points. To someone without background, it would look like a

random function point -> floating point number, but to anyone in the

know they'd instantly write some tests to verify bi-linearity,

vanishing at torsion points, etc.
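The height-pairing example can be made concrete as a property-based check. The sketch below is hypothetical, not Sage code: a plain dot product stands in for the height pairing, and `check_bilinearity` is an invented helper. The point is the testing style a mathematically informed author reaches for: verify the algebraic invariant on many random inputs (with a fixed seed, so the test stays deterministic) rather than pin down one machine-dependent decimal string.

```python
import math
import random

def pairing(u, v):
    # Stand-in bilinear pairing: the dot product on R^3. In Sage, the
    # height pairing on an elliptic curve would play this role.
    return sum(a * b for a, b in zip(u, v))

def check_bilinearity(f, trials=100, tol=1e-9):
    """Numerically verify f(a*u + b*w, v) == a*f(u, v) + b*f(w, v)."""
    rng = random.Random(42)  # fixed seed keeps the test deterministic
    for _ in range(trials):
        u, v, w = ([rng.uniform(-1, 1) for _ in range(3)] for _ in range(3))
        a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
        lhs = f([a * x + b * y for x, y in zip(u, w)], v)
        rhs = a * f(u, v) + b * f(w, v)
        if not math.isclose(lhs, rhs, rel_tol=tol, abs_tol=tol):
            return False
    return True
```

A genuinely bilinear map passes, while a perturbed one such as `lambda u, v: pairing(u, v) + 1` fails immediately, so it is the invariant itself being tested, not one platform's floating-point output.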

Of course, to achieve the ideal solution, you'd have someone with the

math and testing background and lots of time on their hands, or at

least have several different people with those skills involved.

> It's fairly clear in the past that the "Expected" result from a test

> is what someone happened to get on their computer, and they did not

> appear to be aware that the same would not be true of other

> processors.

Most of the time that's due to floating point irregularities, and then

there's an even smaller percentage of the time that it's due to an

actual bug that didn't show up in the formerly-used environments. In

both of these cases the test, as written, wasn't (IMHO) wrong. Not

that there haven't been a couple of really bad cases where bad results

have been encoded into doctests, which is the fault of both the author

and referee, but I'm glad that these are rare enough to be quite

notable when discovered.
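The floating-point irregularities mentioned above can be made concrete. A minimal sketch in plain Python (not Sage; `agrees` is an invented helper) showing why verbatim decimal expectations in doctests are fragile, and how a tolerance-based comparison avoids the problem:

```python
import math

# 0.1 and 0.2 have no exact binary representation, so the sum picks up
# representation error; a doctest expecting the literal string "0.3"
# would fail even on a single machine.
total = 0.1 + 0.2
print(total == 0.3)        # False
print(repr(total))         # 0.30000000000000004

# Comparing with a tolerance is robust to representation error and to
# small cross-platform differences in rounding:
def agrees(computed, expected, tol=1e-12):
    return math.isclose(computed, expected, rel_tol=tol, abs_tol=tol)

print(agrees(total, 0.3))  # True
```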

> Vladimir Bondarenko.has been very effective at finding bugs in

> commercial maths software by use of various testing techniques, yet I

> think I'm correct in saying Vladimir does not have a maths degree of

> any soft.

I agree, people of all backgrounds can make significant contributions.

>>> As such, I believe independent verification using other software is

>>> useful. Someone remarked earlier it is common in the commercial world

>>> to compare your results to that of competitive products.

>>

>> +1 -- it's definitely useful. Everyone should use it when possible

>> in some ways.

>

> I'm still waiting to hear from Wolfram Research on the use of Wolfram

> Alpha for this. Personally I don't think there's anything in the terms

> of use of Wolfram Alpha stopping use of the software for this, but

> someone (I forget who), did question whether it is within the terms of

> use or not.

>

>> But consistency comparisons using all open source software when

>> possible are very useful indeed, since they are more maintainable

>> longterm.

>

> Yes.

>

> Especially if Wolfram Research thought it would hurt their revenue

> from Mathematica sales, they could very easily re-write the terms of use

> to disallow the use of Wolfram Alpha to check other software.

That would be a chilling statement indeed. "You're not allowed to

compare these results to those computed with open source software..."

Imagine the absurd consequences this would have on, e.g. results that

appear in publications.

- Robert

Dec 6, 2010, 9:52:47 PM

to sage-devel

On Dec 6, 11:15 am, Robert Bradshaw <rober...@math.washington.edu>

wrote:

>[*snip*]

If the "numerical noise" issue in sage testing has been in controversy

for so long, why not replace all such failing doctests with a warning

(if triggered) promising to convert it to a sensible test (not

dependent on floating point order of operations in hardware or said

base conversion); and dispense with all the vitriol? (Not referring to

any particular persons' vitriol -- I'm an equal opportunity observer

of circumlocution.)

> there's an even smaller percentage of the time that it's due to an

> actual bug that didn't show up in the formerly-used environments. In

How does this help your side of the argument?

> both of these cases the test, as written, wasn't (IMHO) wrong. Not

Why? If the test is non-deterministic, then you can have false-

positives and false-negatives. What's a good argument for that if you

can avoid it? If there are counter-examples (test case scenarios)

that prove you must take a statistical approach, then that would be an

entirely different testing framework.

> I agree, people of all backgrounds can make significant contributions.

http://trac.sagemath.org/sage_trac/ticket/8336

Robert and I had a long off-list discussion on this round() bug. The

problem IMHO, is not sticking to an interface; the requested invariant

(that the same precision type be returned) is not possible in some

cases. In other words, the interface/invariants are wrong, not the

test.

Speaking in maths terms, if the relation fails the vertical line test

(and is therefore not a function); why on earth would you call it a

function?
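Plain Python's built-in round() (not the Sage round() that ticket #8336 concerns, so this is only an analogy) illustrates the same tension between an intuitive contract and what floating point can actually deliver:

```python
# 2.675 is stored as roughly 2.67499999999999982..., so the "round half
# up" intuition fails: the stored value is genuinely below 2.675.
print(round(2.675, 2))   # 2.67, not 2.68

# Python 3 also rounds exact halves to the nearest even integer
# (banker's rounding), another documented departure from schoolbook
# intuition:
print(round(0.5))        # 0
print(round(1.5))        # 2
print(round(2.5))        # 2

# A doctest encoding round(2.675, 2) == 2.68 would be asserting an
# invariant the interface never promised.
```

In both cases the documented interface, not the implementation, is what the test must reflect.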

> Most of the time that's due to floating point irregularities, and then

http://docs.python.org/release/2.6.6/tutorial/floatingpoint.html#representation-error


Dec 8, 2010, 1:28:38 PM

to sage-devel

On Dec 6, 8:01 am, David Kirkby <david.kir...@onetel.net> wrote:

> This presupposes that people of fairly high mathematical knowledge are

> good at writing software.

>

> I'm yet to be convinced that having a PhD in maths, or studying for

> one, makes you good at writing software tests. Unless those people

> have studied the different sort of testing techniques available -

> white box, black box, fuzz etc, then I fail to see how they can be in

> a good position to write the tests.

courses

in software engineering and much of what is conveyed has almost no

applicability for scientific software testing.

>

> Vladimir Bondarenko has been very effective at finding bugs in

> commercial maths software by use of various testing techniques, yet I

> think I'm correct in saying Vladimir does not have a maths degree of

> any sort.

bugs, and not just presenting gibberish and looking to see what comes

out.

VB's problems are in having a degree or not.

>

> I'm still waiting to hear from Wolfram Research on the use of Wolfram

> Alpha for this.

> Personally I don't think there's anything in the terms

> of use of Wolfram Alpha stopping use of the software for this, but

> someone (I forget who), did question whether it is within the terms of

> use or not.

OK.

You can just run Mathematica.

>

> > But consistency comparisons using all open source software when

> > possible are very useful indeed, since they are more maintainable

> > longterm.

>

> Yes.

really doubtful. Comparing 2 results that are equivalent but not identical

(simplified differently) is difficult sometimes. Using someone's open

source software is potentially a big waste of time. You have to

debug that too! Your statement is kind of ambiguous... do you mean

"all open source software" or "only open source software"? or

maybe "selected open source software" or ... excluding closed

source...

The claim that open source is "more maintainable"

is also naive. If someone (else) maintains the open source, and it

changes,

what then? If the open source no longer runs, and you have to

"maintain" it,

what then? Compare this to "does Mathematica [say] give an answer

that

is consistent? " {still problematical, but at least you don't have

to "maintain" it.}

>

> Especially if Wolfram Research thought it would hurt their revenue

> from Mathematica sales, they could very easy re-write the terms of use

> to disallow the use of Wolfram Alpha to check other software.

2. Why would they waste their (lawyer's) time.

Dec 8, 2010, 1:35:15 PM

to sage-devel

> I agree, people of all backgrounds can make significant contributions.

Logically, nothing to argue with:
"There may be a person X of {no particular specified background} who

can make a significant contribution"

I think we agree that we have higher expectations for people with

particular backgrounds.

As for WRI's lawyers making "chilling statements".

Again.. why would they bother?

And why should anyone care? Do you think that Wolfram Alpha will last

longer than Mathematica?

Dec 8, 2010, 1:45:21 PM

to sage-devel

> And why should anyone care? Do you think that Wolfram Alpha will last

> longer than Mathematica?

I think the point was that not everyone would have access to Mma, but that (for now) they would all have

access to W|A. Just to clarify - I don't really have a horse in this

race.

- kcrisman

Dec 8, 2010, 6:31:44 PM