http://trac.sagemath.org/sage_trac/ticket/10187
where I raised an issue.
There was this test:
sage: taylor(gamma(1/3+x),x,0,3)
-1/432*((36*(pi*sqrt(3) + 9*log(3))*euler_gamma^2 + 27*pi^2*log(3) +
72*euler_gamma^3 + 243*log(3)^3 + 18*(6*pi*sqrt(3)*log(3) + pi^2 +
27*log(3)^2 + 12*psi(1, 1/3))*euler_gamma + 324*psi(1, 1/3)*log(3) +
(pi^3 + 9*(9*log(3)^2 + 4*psi(1, 1/3))*pi)*sqrt(3))*gamma(1/3) -
72*gamma(1/3)*psi(2, 1/3))*x^3 + 1/24*(6*pi*sqrt(3)*log(3) +
4*(pi*sqrt(3) + 9*log(3))*euler_gamma + pi^2 + 12*euler_gamma^2 +
27*log(3)^2 + 12*psi(1, 1/3))*x^2*gamma(1/3) - 1/6*(6*euler_gamma +
pi*sqrt(3) + 9*log(3))*x*gamma(1/3) + gamma(1/3)
sage: map(lambda f:f[0].n(), _.coeffs())
[2.6789385347..., -8.3905259853..., 26.662447494..., -80.683148377...]
I asked the author on the ticket who added the numerical coefficients
([2.6789385347..., -8.3905259853..., 26.662447494...,
-80.683148377...]) to justify them, since I wanted to know they were
right before giving this a positive review. The author remarked that he was
not the original author of the long analytic expression, but doubted
it had ever been checked. However, he did agree to check the numerical
results he had added. He did this using Maple 12 and got the same
answer as Sage.
In this case I'm satisfied the bit of code added to get the numerical
results is probably OK, as it has been independently verified by
another package. The probability of both being wrong is very
small, since they should have been developed largely independently of
each other. The analytic expression is probably OK too.
I really feel people should use doctests where the analytic results
can be verified, or at least justified in some way. If the results are
then expressed as numerical results, whenever possible those
numerical results should be independently verified, as was done on
this ticket after I requested verification.
Methods of verification could include:
* Results given in a decent book.
* Results computed by programs like Mathematica and Maple.
* Showing the results agree with an approximate method.
For example, if a bit of code claims to compute prime_pi(n) exactly
with n=10000000000000000000000000000000000000000000000000000000000000000000000000000000000
then that would be difficult to verify by other means. Mathematica, for
example, can't do it, and I doubt any computer could do it in
my lifetime. [1]
But there are numerical approximations for prime_pi, so computing a
numerical approximation and showing it is close to what was computed
would be a reasonable verification that the function is correct.
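For a more modest n, this kind of sanity check can even be done within
Sage itself. A rough sketch (the value of n and the tolerance are just
illustrative; the Prime Number Theorem estimate n/log(n) is only
accurate to a few per cent at this size):
sage: n = 10^8
sage: exact = prime_pi(n)
sage: estimate = (n / log(n)).n()   # Prime Number Theorem approximation
sage: abs(exact - estimate) / exact < 0.06
True
Using the logarithmic integral li(n) instead would give far closer agreement.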
It seems to me that many of the doctests have as their expected values
basically whatever someone got on their computer. Sometimes the author
has the sense to realise that different floating-point processors will
give different results, so they add a few dots so that not every digit
is expected to be the same.
To me at least, tests where the results are totally unjustified are
very poor tests, yet they seem to be quite common.
I was reading the other day about how one of the large Mersenne primes
was verified. I can't be exact, but it was something like:
* Found by one person on his computer using an AMD or Intel CPU
* Checked by another person using a different program on an Intel or AMD CPU
* Checked by a third person, on a Sun M9000 using a SPARC processor.
I'm not expecting us to go to such lengths, but I feel expected values
should be justified.
Whenever we run tests on the Python package we get failures. If we run
the Maxima test suite, we get failures, which appear with ECL as the
Lisp interpreter, but not with some other interpreters. This indicates
to me that we should not put too much trust in tests which are not
justified.
Comments?
Dave
[1] An interesting experiment would be to find a proof that such a
number could not be computed before the Sun runs out of energy and so
all life on earth would be terminated. The designers of the 128-bit
file system used on Solaris have verified that the energy required to
fill the file system would be more than the energy required to boil
all the water in the oceans. I suspect similar arguments could be used
to prove one can't compute prime_pi(n) for sufficiently large n.
I think we will have to agree to differ then.
> Of course, if we had an arbitrarily large amount of time to write doctests,
> then it would be a laudable goal. Even now, I think there are situations
> where it would be reasonable to ask this of the author of a patch: if there
> was some indication of inconsistency for example. And if someone wants to
> go through the Sage library adding such consistency checks, I think that's a
> great way to improve Sage.
So you admit it would improve Sage to check the tests.
> But it's already difficult enough to get code
> refereed without adding a requirement that code have such consistency
> checks.
It would probably be a bit easier to convince reviewers if your
doctests can be verified.
> The doctests that you object to fill two important roles:
> 1) they provide an example to a reader of the documentation how to use the
> function.
Yes, perhaps a confusing one if the answer is wrong. An embarrassing
one if the examples are wrong.
> 2) they provide a check so that if some change to the Sage library breaks
> something, we find out when testing.
> Until we have 100% doctest coverage, I think that's plenty.
100% coverage with unverified tests is not worth a lot to me. What do you
propose we do when we get 100% coverage - go back and check whether the
tests are valid or not? What a waste of time that would be. It would
be less overall effort to do the tests correctly the first time.
If you are going to give an example, how much longer does it take to
check that it is consistent with Mathematica or similar software? Or
choose an integral from a book?
Dave
I agree with David Roe.
I also would like to encourage David Kirkby (or anybody else) to
independently test as many examples as they can, and if they uncover
any issues, open a ticket and post a patch. Also, if you are
refereeing new patches, do some testing of your own. I always do!
If anything, this independent checking should be the referee's job --
even if the author claimed to check things independently, the referee
would do well to double check some tests.
So David K., I hope you'll continue to "put your money where your
mouth is" and referee a lot of patches. You've done a massive amount
already. Keep up the good work.
But let's not make Sage too much more bureaucratic. If anything, it's
already too bureaucratic. I personally can hardly stand to submit
anything to Sage anymore because of this.
I do think it would be good to start using nosetest
(http://somethingaboutorange.com/mrl/projects/nose/0.11.2/) to
automatically run all functions that start with "test_" in all files,
in addition to doctests. This is how I've been testing the purple-sage
library (http://code.google.com/p/purplesage/), and for many cases it
does result in me writing much more comprehensive test suites.
Nosetest is also very nice because it can run all the tests in a given
file in parallel. Also, when a test in a file fails, it can drop you
into a debugging shell right there with the failed test. This is
all something that we should start doing in addition to aiming for
100% doctest coverage for the sage library...
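For example, a minimal nose-style test function might look something
like this (just a sketch, not actual purple-sage code):
def test_factorial_matches_prod():
    # nose collects and runs any function whose name starts with test_
    from sage.all import factorial, prod
    assert factorial(10) == prod(range(1, 11))
Running "nosetests" (optionally with --processes=N to parallelise)
picks it up automatically.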
>> Of course, if we had an arbitrarily large amount of time to write doctests,
>> then it would be a laudable goal. Even now, I think there are situations
>> where it would be reasonable to ask this of the author of a patch: if there
>> was some indication of inconsistency for example. And if someone wants to
>> go through the Sage library adding such consistency checks, I think that's a
>> great way to improve Sage.
>
> So you admit it would improve sage to check the tests.
It's hard to deny.
>> But it's already difficult enough to get code
>> refereed without adding a requirement that code have such consistency
>> checks.
>
> It would probably be a bit easier to convince reviewers if your
> doctests can be verified.
When people review, they should try to verify tests however they want.
>> The doctests that you object to fill two important roles:
>> 1) they provide an example to a reader of the documentation how to use the
>> function.
>
> Yes, perhaps a confusing one if the answer is wrong. An embarrassing
> one if the examples are wrong.
>
>> 2) they provide a check so that if some change to the Sage library breaks
>> something, we find out when testing.
>
>> Until we have 100% doctest coverage, I think that's plenty.
>
> 100% coverage with unverified tests is not worth a lot to me. What do you
> propose we do when we get 100% coverage - go back and check whether the
> tests are valid or not? What a waste of time that would be.
Verifying correctness of tests is not a waste of time.
> It would be less overall effort to do the tests correctly the first time.
People presumably *think* they are doing tests correctly. The point
is that you're wanting authors to submit "proofs" that they did
independent verification of results, and I think that is too much
bureaucracy. But asking referees to check claimed examples --
that makes sense! In particular, if I referee some code, and it
turns out somebody finds that the examples were just wrong, then I as
the referee will be pretty embarrassed.
> If you are going to give an example, how much longer does it take to
> check that it is consistent with Mathematica or similar software? Or
> choose an integral from a book?
That does raise an issue: one problem is that most of Sage isn't
calculus. Most code I write these days isn't available in any other
software...
A lot of what Sage does is available only in Magma say, which many
people don't even have access to.
--
William Stein
Professor of Mathematics
University of Washington
http://wstein.org
I thought you would.
> I also would like to encourage David Kirkby (or anybody else) to
> independently test as many examples as they can, and if they uncover
> any issues, open a ticket and post a patch.
For me personally, as a non-mathematician I'd have a problem with just
accepting a doctest which I probably can't verify myself. In some
cases I can verify it using Mathematica, and have done so on some occasions. But in
the case of
http://trac.sagemath.org/sage_trac/ticket/10187
I could not. But in this case I'll trust the author when he says he
has verified this with Maple. I think Sage is better for that change.
> Also, if you are
> refereeing new patches, do some testing of your own. I always do!
> If anything, this independent checking should be the referee's job --
> even if the author claimed to check things independently, the referee
> would do well to double check some tests.
You have an advantage over me. Of course, I could decline to give a
positive review until a mathematician has said the patch is OK. That
would delay the ticket more of course.
> So David K., I hope you'll continue to "put your money where your
> mouth is" and referee a lot of patches. You've done a massive amount
> already. Keep up the good work.
But as I say, I'm restricted somewhat when people add tests I'm not
convinced of.
> But let's not make Sage too much more bureaucratic. If anything, it's
> already too bureaucratic. I personally can hardly stand to submit
> anything to Sage anymore because of this.
I realise you now have PSage. I feel it's a shame you did not complete
the Cygwin port first, but that's your choice. I can understand your
reasons.
> I do think it would be good to start using nosetest
> (http://somethingaboutorange.com/mrl/projects/nose/0.11.2/) to
> automatically run all functions that start with "test_" in all files,
I suggested 'nose' be added a long time ago; the only person to reply
(Robert Bradshaw) disagreed.
>> It would probably be a bit easier to convince reviewers if your
>> doctests can be verified.
>
> When people review, they should try to verify tests however they want.
But one could make life a lot easier for a reviewer by picking
something where the results can be verified easily. If one writes a
test to show how to use function X, then the input probably does not
matter too much. So choose an input where the output can be verified,
rather than some input where it can't be.
>>> Until we have 100% doctest coverage, I think that's plenty.
>>
>> 100% coverage with unverified tests is not worth a lot to me. What do you
>> propose we do when we get 100% coverage - go back and check whether the
>> tests are valid or not? What a waste of time that would be.
>
> Verifying correctness of tests is not a waste of time.
I don't know what the current coverage is, but let's say for argument's sake
it needs another 1000 tests to get 100% coverage. It's better to
verify those 1000 tests now, rather than wait until we get 100% coverage,
then go back and verify them.
>> It would be less overall effort to do the tests correctly the first time.
>
> People presumably *think* they are doing tests correctly. The point
> is that you're wanting authors to submit "proofs" that they did
> independent verification of results, and I think that is too much
> bureaucracy.
No, I'm not suggesting a formal proof. In the case of the patch here
http://trac.sagemath.org/sage_trac/attachment/ticket/10187/trac_10187_fix_easy_doctests.patch
lines 345 & 346 was added, as a test, with nothing to say why. The
author has now said Maple 12 gives the same answer - I believe him in
this case.
I rather suspect the input, which shows how to use the taylor
function, could be any of numerous inputs. The one chosen
sage: taylor(gamma(1/3+x),x,0,3)
gives a huge output which is going to be next to impossible to verify
analytically. I rather suspect using a different series, where the
output was well known, would have been more logical.
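For instance, a series whose expansion anyone can check against a
calculus textbook (just an illustration, not the ticket's test):
sage: bool(taylor(exp(x), x, 0, 3) == 1 + x + x^2/2 + x^3/6)
True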
> But asking referees to check claimed examples --
> that makes sense! In particular, if I referee some code, and it
> turns out somebody finds that the examples were just wrong, then I as
> the referee will be pretty embarrassed.
Yes, but using examples like
sage: taylor(gamma(1/3+x),x,0,3)
makes it almost impossible for a referee to check it, as the output is huge.
>
>> If you are going to give an example, how much longer does it take to
>> check that it is consistent with Mathematica or similar software? Or
>> choose an integral from a book?
>
> That does raise an issue: one problem is that most of Sage isn't
> calculus. Most code I write these days isn't available in any other
> software...
> A lot of what Sage does is available only in Magma say, which many
> people don't even have access to.
Fair enough. But you can at least state the Sage output is consistent
with that from Magma.
In any case, you stated only a week or so ago that Magma 2.13 is now
installed on sage.math
http://groups.google.com/group/sage-devel/msg/8e473e24b0e48772?hl=en
It's a shame the license of Wolfram Alpha does not allow for testing
software like Sage. (This was debated some time ago on sage-devel).
Otherwise that would give a nice easy way to verify *some* results.
"is 100001 prime"
http://www.wolframalpha.com/input/?i=is+100001+prime
I appreciate in many cases it's not going to be possible to verify by
other means. One has to be extra careful about the code then.
Dave
Yes. I especially agree with David Kirkby's remark: "IMHO it would be
sensible to have nose as a standard package."
>
>> > If you are going to give an example, how much longer does it take to
>> > check that it is consistent with Mathematica or similar software? Or
>> > choose an integral from a book?
>>
>> That does raise an issue: one problem is that most of Sage isn't
>> calculus. Most code I write these days isn't available in any other
>> software...
>> A lot of what Sage does is available only in Magma say, which many
>> people don't even have access to.
>
> And plots need to be tested 'by hand' by looking at them - which I do
> a lot of when reviewing those tickets.
>
> One interesting point coming out of this is that the onus is put on
> the author, not the reviewer, for testing. I assume that means
> "running doctests with ./sage -t or something", not "trying edge/
> corner cases the author might not have thought of and making sure
> those work", which I think does properly belong with the reviewer.
I disagree. The author *and* the reviewer should both do as much as they can
reasonably do.
William
Well now that I know nose better, I agree with you. It's a really
awesome testing framework. I use it all the time for my own work now.
>> Verifying correctness of tests is not a waste of time.
>
> I don't know what the current coverage is, but let's say for argument's sake
> it needs another 1000 tests to get 100% coverage. It's better to
> verify those 1000 tests now, rather than wait until we get 100% coverage,
> then go back and verify them.
Orthogonal to your remark, but in sage-4.6:
$ sage -coverageall
...
Overall weighted coverage score: 84.3%
Total number of functions: 26592
We need 173 more function to get to 85% coverage.
We need 1503 more function to get to 90% coverage.
We need 2833 more function to get to 95% coverage.
It's only 2,833 tests!
>> But asking referees to check claimed examples --
>> that makes sense! In particular, if I referee some code, and it
>> turns out somebody finds that the examples were just wrong, then I as
>> the referee will be pretty embarrassed.
>
> Yes, but using examples like
>
> sage: taylor(gamma(1/3+x),x,0,3)
>
> makes it almost impossible for a referee to check it, as the output is huge.
I totally agree, and I think that's a very valid criticism for you to
make as a referee.
But let's not make a new policy out of this.
> In any case, you stated only a week or so ago that Magma 2.13 is now
> installed on sage.math
>
> http://groups.google.com/group/sage-devel/msg/8e473e24b0e48772?hl=en
That is a post from 2006?!?
> It's a shame the license of Wolfram Alpha does not allow for testing
> software like Sage. (This was debated some time ago on sage-devel).
> Otherwise that would give a nice easy way to verify *some* results.
>
> "is 100001 prime"
>
> http://www.wolframalpha.com/input/?i=is+100001+prime
I'm not sure what you're talking about exactly at this point.
Referees can use wolfram alpha if they want to independently check
stuff... Do you mean adding doctests that call wolframalpha? That
would be weird.
-- William
It would seem sensible to make it standard in that case. Making it
optional seems a bit less useful to me.
>> Yes, but using examples like
>>
>> sage: taylor(gamma(1/3+x),x,0,3)
>>
>> makes it almost impossible for a referee to check it, as the output is huge.
>
> I totally agree, and I think that's a very valid criticism for you to
> make as a referee.
The code I was refereeing did *not* add
sage: taylor(gamma(1/3+x),x,0,3)
That was there before.
What I queried was the doctest which converted the huge symbolic
result to a much simpler numerical result, which was added in the
ticket in question. (It was added because the format of the Maxima
output had changed, so a test was added to check that the numerical
values were the same, even if the symbolic ones were not.)
sage: map(lambda f:f[0].n(), _.coeffs())  # numerical coefficients to make comparison easier; Maple 12 gives same answer
[2.6789385347..., -8.3905259853..., 26.662447494..., -80.683148377...]
After I asked, the author verified it in Maple 12, as the doctest
notes. So that probably means Maxima has it right.
> But let's not make a new policy out of this.
>
>
>> In any case, you stated only a week or so ago that Magma 2.13 is now
>> installed on sage.math
>>
>> http://groups.google.com/group/sage-devel/msg/8e473e24b0e48772?hl=en
>
> That is a post from 2006?!?
Em, I thought I'd seen the post recently and Googled for it.
>> It's a shame the license of Wolfram Alpha does not allow for testing
>> software like Sage. (This was debated some time ago on sage-devel).
>> Otherwise that would give a nice easy way to verify *some* results.
>>
>> "is 100001 prime"
>>
>> http://www.wolframalpha.com/input/?i=is+100001+prime
>
> I'm not sure what you're talking about exactly at this point.
> Referees can use wolfram alpha if they want to independently check
> stuff...
Yes, verifying results is OK.
But storing comments in the source code of Sage containing a large
number of comparisons with Wolfram Alpha may not be. See
http://groups.google.com/group/sage-devel/msg/1f8af294fbf40ccc?hl=en
where Alex Ghitza pointed out this might breach the terms of use.
http://www.wolframalpha.com/termsofuse.html
> Do you mean adding doctests that call wolframalpha? That
> would be weird.
No, I was not thinking of that.
> -- William
Dave
I think there's a distinction between an spkg that people might find
useful to use with Sage, and an spkg that's actually used in Sage.
For the former, if easy_install "just works," then it's not worth us
creating and maintaining a separate spkg, but for the latter, we
should ship it.
The fact that an upstream package uses nose in its tests did not seem
like enough of a justification to create a whole new spkg, but if we
want to write Sage tests with nose then I have no objection. I
certainly think that there's a diminishing return on doctests once you
reach a certain point (which we're probably not at yet).
>>> But asking referees to check claimed examples --
>>> that makes sense! In particular, if I referee some code, and it
>>> turns out somebody finds that the examples were just wrong, then I as
>>> the referee will be pretty embarrassed.
>>
>> Yes, but using examples like
>>
>> sage: taylor(gamma(1/3+x),x,0,3)
>>
>> makes it almost impossible for a referee to check it, as the output is huge.
What would make a better test in this case would be taking the
resulting power series, perhaps to a higher degree of precision, and
evaluating at 0.1, 0.5, and showing that the result is close to
gamma(1/3 + 0.1), gamma(1/3 + 0.5). Or perhaps verifying that the 3rd
coefficient is equal to the 3rd derivative / 6.
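Something along these lines, say (just a sketch; the degree, points and
tolerance are arbitrary):
sage: s = taylor(gamma(1/3 + x), x, 0, 10)
sage: all(abs((s.subs(x=t) - gamma(1/3 + t)).n()) < 1e-4 for t in [0.05, 0.1])
True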
> I totally agree, and I think that's a very valid criticism for you to
> make as a referee.
>
> But let's not make a new policy out of this.
+1. As more time is spent reading the code and tests rather than
applying patches, we should be more critical of good vs. bad tests.
This also goes with the ideal of making it really easy to edit a
patch, perhaps even online. (Imagine if you could run some code and
press a button to add that doctest to the library, pending refereeing
of course...)
>> In any case, you stated only a week or so ago that Magma 2.13 is now
>> installed on sage.math
>>
>> http://groups.google.com/group/sage-devel/msg/8e473e24b0e48772?hl=en
>
> That is a post from 2006?!?
>
>> It's a shame the license of Wolfram Alpha does not allow for testing
>> software like Sage. (This was debated some time ago on sage-devel).
>> Otherwise that would give a nice easy way to verify *some* results.
>>
>> "is 100001 prime"
>>
>> http://www.wolframalpha.com/input/?i=is+100001+prime
>
> I'm not sure what you're talking about exactly at this point.
> Referees can use wolfram alpha if they want to independently check
> stuff... Do you mean adding doctests that call wolframalpha? That
> would be weird.
>
>> I appreciate in many cases it's not going to be possible to verify by
>> other means. One has to be extra careful about the code then.
On the topic of verifying tests, I think internal consistency checks
are much better, both pedagogically and for verifiability, than
external checks against other (perhaps inaccessible) systems. For
example, the statement above that checks a power series against its
definition and properties, or (since you brought up the idea of
factorial) factorial(10) == prod([1..10]), or taking the derivative to
verify an integral. Especially in more advanced math there are so many
wonderful connections, both theorems and conjectures, that can be
verified with a good test. For example, computing all the BSD
invariants of an elliptic curve and verifying that the BSD formula
holds is a strong indicator that the invariants were computed
correctly via their various algorithms.
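To make that concrete, checks of this sort are usually one-liners (a
sketch, using an integrand simple enough that the derivative simplifies
back automatically):
sage: factorial(10) == prod([1..10])
True
sage: F = integrate(x*sin(x), x)
sage: bool(F.diff(x) == x*sin(x))
True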
- Robert
> That said, maybe 'easy_install' is really as easy as ./sage -i nose
> from the internet, in which case I suppose one could have an spkg-
> check that relied on the internet... but that wouldn't be ideal, I
> think.
But that would also save us maintaining yet another spkg. We have a
hard enough time keeping up with spkg updates as it is.
As Robert says, if we're using nose in Sage, that's a different story.
Thanks,
Jason
> On the topic of verifying tests, I think internal consistency checks
> are much better, both pedagogically and for verifiability, than
> external checks against other (perhaps inaccessible) systems. For
> example, the statement above that checks a power series against its
> definition and properties, or (since you brought up the idea of
> factorial) factorial(10) == prod([1..10]), or taking the derivative to
> verify an integral.
Of course I can see logic in this, especially when the software may
not be available. Even though it has limitations, and those
limitations might increase with time, Wolfram Alpha is currently
available to everyone. (It helps if you know Mathematica, as you can
input Mathematica syntax directly).
* The person writing the mathematical code is usually the same person
who writes the test for that code. Any assumptions they make which are
incorrect may exist in both the algorithm and the test code. Of
course one hopes the referee picks this up, but the referee process,
while useful, is not perfect.
* The example you give with 10 factorial and prod([1..10]) would
probably use a fair amount of common code - such as MPIR.
* Differentiate(Integrate(f)) = f: in practice, for many functions
doing this in Sage does not lead back to the same expression, although
the two are mathematically equivalent. Converting to a numerical form
can sometimes be used to show the results are equal, but even then two
equivalent but non-identical numerical results often exist.
(I wrote some Sage code which generated "random" functions and
applied the integrate/differentiate method. If you get a complex
result back after the differentiation step, it is not easy to
determine if it's the same as you started with.)
Some, though not all, of the above can be eliminated by using software
that is developed totally independently. Of course, even using
Wolfram Alpha will use some code common to Sage since:
a) Wolfram Alpha uses Mathematica
b) Mathematica uses GMP & ATLAS
c) Sage uses MPIR (derived from GMP) and ATLAS.
I suspect there is other common code too, but those are the two I'm aware of.
> Especially in more advanced math there are so many
> wonderful connections, both theorems and conjectures, that can be
> verified with a good test. For example, computing all the BSD
> invariants of an elliptic curve and verifying that the BSD formula
> holds is a strong indicator that the invariants were computed
> correctly via their various algorithms.
I'll accept what you say!
It's clear you have the ability to write decent tests, but I think it's
fair to say there are a lot of Sage developers who have less knowledge
of this subject than you.
As such, I believe independent verification using other software is
useful. Someone remarked earlier it is common in the commercial world
to compare your results to that of competitive products.
> - Robert
Dave
I think nosetest is a superb framework for writing such unittests,
which really do encourage a completely different kind of testing than
doctests.
> More importantly, if it could be done in a systematic
> way, all such tests could share the random generating functions: for
> example, all functions working over any field would need a "generate a
> random field"-function, and if there was a central place for these in
I wrote such a thing. See rings.tests or test or rando_ring (I am
sending from a cell phone).
> Sage, the most common structures would quickly be available, making
> parameterised test writing even easier.
>
> - Johan
>
If you do
prod(range(1,11))
and compare that to "factorial(10)", I think it uses absolutely no
common code at all.
prod(range(1,11)) -- uses arithmetic with Python ints and the Sage
prod command (which Robert Bradshaw wrote from scratch in Cython).
factorial(10) -- calls a GMP function that is written in C, and
shares no code at all with Python.
-- William
>
> * Differentiate(Integrate(f)) = f, in practice for many functions
> doing this in Sage does not lead back to the same expression, although
> they are mathematically equivalent. Converting to a numerical form
> can sometimes be used to show results are equal, but even two
> equivalent, but non-identical numerical results often exist.
They have to be the same up to rounding errors, right, or it is a bug?
So numerically the absolute value of the difference must be small.
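For example, something like this, where the round trip does not come
back in the same form but the difference vanishes numerically (a
sketch; the sample points and tolerance are arbitrary):
sage: f = sin(x)^2
sage: g = integrate(f, x).diff(x)   # typically comes back in a different form, e.g. involving cos(2*x)
sage: all(abs((g - f).subs(x=t).n()) < 1e-12 for t in [0.3, 1.1, 2.7])
True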
>
> (I wrote some Sage code which generated "random" functions and
> applied the integrate/differentiate method. If you get a complex
> result back after the differentiation step, it is not easy to
> determine if it's the same as you started with.).
>
> Some, though not all, of the above can be eliminated by using software
> that is developed totally independently. Of course, even using
I don't see how checking differentiation or integration with
Mathematica would be any easier than doing the above. You still have
the problem of comparing two different symbolic expressions.
> Wolfram Alpha will use some code common to Sage since:
>
> a) Wolfram Alpha uses Mathematica
> b) Mathematica uses GMP & ATLAS
> c) Sage uses MPIR (derived from GMP) and ATLAS.
>
> I suspect there is other common code too, but they are two I'm aware of.
I know of no code in common between Mathematica and Sage except GMP and
ATLAS. It would be very interesting to find out if there is any other
code in common. Does Mathematica use any other open source code at
all?
Note that as you point out above Sage uses MPIR whereas mathematica
uses GMP. These two libraries are _massively_ different at this point
-- probably sharing way less than 50% of their code, if that.
>> Especially in more advanced math there are so many
>> wonderful connections, both theorems and conjectures, that can be
>> verified with a good test. For example, computing all the BSD
>> invariants of an elliptic curve and verifying that the BSD formula
>> holds is a strong indicator that the invariants were computed
>> correctly via their various algorithms.
>
> I'll accept what you say!
>
> It's clear you have the ability to write decent tests, but I think it's
> fair to say there are a lot of Sage developers who have less knowledge
> of this subject than you [=Bradshaw].
True. However, I think the general mathematical background of the
average Sage developer is fairly high. If you look down the second
column of
http://sagemath.org/development-map.html
you'll see many have Ph.D.'s in mathematics, and most of those who
don't are currently getting Ph.D.'s in math.
> As such, I believe independent verification using other software is
> useful. Someone remarked earlier it is common in the commercial world
> to compare your results to that of competitive products.
+1 -- it's definitely useful. Everyone should use it when possible
in some ways.
But consistency comparisons using all open source software when
possible are very useful indeed, since they are more maintainable
longterm.
-- William
>
>> - Robert
>> It's clear you have the ability to write decent tests, but I think it's
>> fair to say there are a lot of Sage developers who have less knowledge
>> of this subject than you [=Bradshaw].
>
> True. However, I think the general mathematical background of the
> average Sage developer is fairly high. If you look down the second
> column of
> http://sagemath.org/development-map.html
>
> you'll see many have Ph.D.'s in mathematics, and most of those who
> don't are currently getting Ph.D.'s in math.
This presupposes that people of fairly high mathematical knowledge are
good at writing software.
I'm yet to be convinced that having a PhD in maths, or studying for
one, makes you good at writing software tests. Unless those people
have studied the different sorts of testing techniques available -
white box, black box, fuzz testing, etc. - then I fail to see how they
can be in a good position to write the tests.
It's fairly clear in the past that the "Expected" result from a test
is what someone happened to get on their computer, and they did not
appear to be aware that the same would not be true of other
processors.
Vladimir Bondarenko has been very effective at finding bugs in
commercial maths software by use of various testing techniques, yet I
think I'm correct in saying Vladimir does not have a maths degree of
any sort.
>> As such, I believe independent verification using other software is
>> useful. Someone remarked earlier it is common in the commercial world
>> to compare your results to that of competitive products.
>
> +1 -- it's definitely useful. Everyone should use it when possible
> in some ways.
I'm still waiting to hear from Wolfram Research on the use of Wolfram
Alpha for this. Personally I don't think there's anything in the terms
of use of Wolfram Alpha stopping use of the software for this, but
someone (I forget who) did question whether it is within the terms of
use or not.
> But consistency comparisons using all open source software when
> possible are very useful indeed, since they are more maintainable
> longterm.
Yes.
Especially if Wolfram Research thought it would hurt their revenue
from Mathematica sales, they could very easily re-write the terms of use
to disallow the use of Wolfram Alpha to check other software.
> -- William
Dave
No, it's an observation that people of fairly high mathematical
knowledge are the ones actually writing software.
> I'm yet to be convinced that having a PhD in maths, or studying for
> one, makes you good at writing software tests. Unless those people
> have studied the different sort of testing techniques available -
> white box, black box, fuzz etc, then I fail to see how they can be in
> a good position to write the tests.
Because they understand what the code is trying to do, what results
should be expected, etc. If I told someone who was an expert in all
these (admittedly valuable) testing techniques to write some tests
that computed special values of L-functions of elliptic curves, how
would they do it? It's not like there's just a command in Mathematica
that can do this, and even if there were, who knows if they'd be able
to understand how to use it.
If I gave it to anyone with an understanding of elliptic curves,
they'd immediately pick a positive rank curve or two, and make sure
the value is very close to zero, then probably look up some special
values in the literature, etc. Or, say, the algorithm was to compute
heights of points. To someone without background, it would look like a
random function point -> floating point number, but anyone in the
know would instantly write some tests to verify bi-linearity,
vanishing at torsion points, etc.
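For instance, a quadraticity check on the canonical height, which is a
special case of bi-linearity (a sketch; the curve and tolerance are
arbitrary):
sage: E = EllipticCurve('37a')     # a rank one curve
sage: P = E.gens()[0]
sage: abs((2*P).height() - 4*P.height()) < 1e-10
True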
Of course, to achieve the ideal solution, you'd have someone with the
math and testing background and lots of time on their hands, or at
least have several different people with those skills involved.
> It's fairly clear in the past that the "Expected" result from a test
> is what someone happened to get on their computer, and they did not
> appear to be aware that the same would not be true of other
> processors.
Most of the time that's due to floating point irregularities, and then
there's an even smaller percentage of the time that it's due to an
actual bug that didn't show up in the formerly-used environments. In
both of these cases the test, as written, wasn't (IMHO) wrong. Not
that there haven't been a couple of really bad cases where bad results
have been encoded into doctests, which is the fault of both the author
and referee, but I'm glad that these are rare enough to be quite
notable when discovered.
> Vladimir Bondarenko has been very effective at finding bugs in
> commercial maths software by use of various testing techniques, yet I
> think I'm correct in saying Vladimir does not have a maths degree of
> any sort.
I agree, people of all backgrounds can make significant contributions.
>>> As such, I believe independent verification using other software is
>>> useful. Someone remarked earlier it is common in the commercial world
>>> to compare your results to that of competitive products.
>>
>> +1 -- it's definitely useful. Everyone should use it when possible
>> in some ways.
>
> I'm still waiting to hear from Wolfram Research on the use of Wolfram
> Alpha for this. Personally I don't think there's anything in the terms
> of use of Wolfram Alpha stopping use of the software for this, but
> someone (I forget who), did question whether it is within the terms of
> use or not.
>
>> But consistency comparisons using all open source software when
>> possible are very useful indeed, since they are more maintainable
>> longterm.
>
> Yes.
>
> Especially if Wolfram Research thought it would hurt their revenue
> from Mathematica sales, they could very easily re-write the terms of use
> to disallow the use of Wolfram Alpha to check other software.
That would be a chilling statement indeed. "You're not allowed to
compare these results to those computed with open source software..."
Imagine the absurd consequences this would have on, e.g. results that
appear in publications.
- Robert