I haven't done any testing yet, but in order to get './sage -ba' to
finish, I applied a seventh patch (attached), which I discovered with
hg diff -R /scratch/craigcitro/cy-work/fcubed/devel/sage
Thanks! Yes, you'd probably need that too.
- Robert
Below are some results from parallel doctests on sage.math. In each of
/mnt/usb1/scratch/mpatel/tmp/sage-4.4.4-cython
/mnt/usb1/scratch/mpatel/tmp/sage-4.5.1-cython
I have run (or am still running)
./tester | tee -a ztester &
where 'tester' contains
#!/bin/bash
ulimit -c unlimited
RUNS=20
for I in `seq 1 $RUNS`;
do
    LOG="ptestlong-j20-$I.log"
    if [ ! -f "$LOG" ]; then
        echo "Run $I of $RUNS"
        nice ./sage -tp 20 -long -sagenb devel/sage > "$LOG" 2>&1
        # grep -A2 -B1 dumped "$LOG"
        ls -lsFtr `find -type f -name core` | grep core | tee -a "$LOG"
        # Rename each core to core_cy.$I
        rm -f _ren
        find -name core -type f | awk '{print "mv "$0" "$0"_cy.'${I}'"}' > _ren
        . _ren
    fi
done
The log files and cores (renamed to core_cy.1, etc.) are still in/under
SAGE_ROOT.
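
For comparison, the core-renaming step could also be sketched in Python. This is only an illustration, assuming the dumps are regular files literally named `core`; the function name `rename_cores` and its `suffix` parameter are made up here and are not part of Sage.

```python
import os


def rename_cores(root, run, suffix="_cy"):
    """Rename each regular file named 'core' under root to 'core<suffix>.<run>'.

    Returns the sorted list of new paths.
    """
    renamed = []
    for dirpath, dirnames, filenames in os.walk(root):
        if "core" not in filenames:
            continue
        old = os.path.join(dirpath, "core")
        # Roughly mimic `find -type f`: skip symlinks and non-regular files.
        if os.path.islink(old) or not os.path.isfile(old):
            continue
        new = "%s%s.%d" % (old, suffix, run)
        os.rename(old, new)
        renamed.append(new)
    return sorted(renamed)
```

Calling `rename_cores(".", 3)` from SAGE_ROOT would turn `./data/extcode/genus2reduction/core` into `./data/extcode/genus2reduction/core_cy.3`.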
I don't know if the results tell you more than you already know. For
example,
sage-4.5.1-cython$ for x in `\ls ptestlong-j20-*`; do grep "doctests failed" $x; done | grep -v "0 doctests failed" | sort | uniq -c
1 sage -t -long devel/sage/sage/graphs/graph.py # 2 doctests failed
19 sage -t -long devel/sage/sage/tests/startup.py # 1 doctests failed
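
A similar tally could be made in Python. The sketch below assumes the log lines have the shape shown in the output above (`sage -t -long <path> # <N> doctests failed`); the name `tally_failures` is hypothetical. Unlike `uniq -c`, it counts matching lines per file, regardless of how many doctests failed in each.

```python
import re
from collections import Counter

# Assumed line shape, taken from the quoted output:
#   sage -t -long <path> # <N> doctests failed
FAILED = re.compile(
    r"^\s*sage -t\s+(?:-\S+\s+)*(\S+)\s+#\s+(\d+) doctests failed"
)


def tally_failures(log_texts):
    """Count, per tested file, the log lines reporting at least one failure."""
    counts = Counter()
    for text in log_texts:
        for line in text.splitlines():
            m = FAILED.match(line)
            if m and int(m.group(2)) > 0:
                counts[m.group(1)] += 1
    return counts
```

Parsing the count as an integer also keeps lines such as "10 doctests failed", which a substring filter on "0 doctests failed" would drop.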
But
sage-4.5.1-cython$ find -name core_cy\* | sort
./data/extcode/genus2reduction/core_cy.1
./data/extcode/genus2reduction/core_cy.10
./data/extcode/genus2reduction/core_cy.11
./data/extcode/genus2reduction/core_cy.12
./data/extcode/genus2reduction/core_cy.13
./data/extcode/genus2reduction/core_cy.14
./data/extcode/genus2reduction/core_cy.15
./data/extcode/genus2reduction/core_cy.16
./data/extcode/genus2reduction/core_cy.17
./data/extcode/genus2reduction/core_cy.18
./data/extcode/genus2reduction/core_cy.19
./data/extcode/genus2reduction/core_cy.2
./data/extcode/genus2reduction/core_cy.20
./data/extcode/genus2reduction/core_cy.3
./data/extcode/genus2reduction/core_cy.4
./data/extcode/genus2reduction/core_cy.5
./data/extcode/genus2reduction/core_cy.6
./data/extcode/genus2reduction/core_cy.7
./data/extcode/genus2reduction/core_cy.8
./data/extcode/genus2reduction/core_cy.9
./devel/sage-main/doc/fr/tutorial/core_cy.17
./devel/sage-main/sage/algebras/core_cy.17
./devel/sage-main/sage/categories/core_cy.4
./devel/sage-main/sage/categories/core_cy.6
./devel/sage-main/sage/combinat/root_system/core_cy.12
./devel/sage-main/sage/databases/core_cy.1
./devel/sage-main/sage/databases/core_cy.18
./devel/sage-main/sage/ext/core_cy.18
./devel/sage-main/sage/groups/matrix_gps/core_cy.5
./devel/sage-main/sage/gsl/core_cy.4
./devel/sage-main/sage/misc/core_cy.10
./devel/sage-main/sage/misc/core_cy.17
./devel/sage-main/sage/misc/core_cy.2
./devel/sage-main/sage/modular/abvar/core_cy.7
./devel/sage-main/sage/plot/plot3d/core_cy.19
./devel/sage-main/sage/rings/core_cy.20
./local/lib/python2.6/site-packages/sagenb-0.8.1-py2.6.egg/sagenb/testing/tests/core_cy.19
Should I test differently?
So it looks like you're getting segfaults all over the place as
well... Hmm... Could you test with
https://sage.math.washington.edu:8091/hudson/job/sage-build/163/artifact/cython-devel.spkg
?
- Robert
With the new package, I get similar results, i.e., apparently random
segfaults. The core score is about the same:
$ find -name core_cy2.\* | sort
./data/extcode/genus2reduction/core_cy2.1
./data/extcode/genus2reduction/core_cy2.10
./data/extcode/genus2reduction/core_cy2.11
./data/extcode/genus2reduction/core_cy2.12
./data/extcode/genus2reduction/core_cy2.13
./data/extcode/genus2reduction/core_cy2.14
./data/extcode/genus2reduction/core_cy2.15
./data/extcode/genus2reduction/core_cy2.16
./data/extcode/genus2reduction/core_cy2.17
./data/extcode/genus2reduction/core_cy2.18
./data/extcode/genus2reduction/core_cy2.19
./data/extcode/genus2reduction/core_cy2.2
./data/extcode/genus2reduction/core_cy2.20
./data/extcode/genus2reduction/core_cy2.3
./data/extcode/genus2reduction/core_cy2.4
./data/extcode/genus2reduction/core_cy2.5
./data/extcode/genus2reduction/core_cy2.6
./data/extcode/genus2reduction/core_cy2.7
./data/extcode/genus2reduction/core_cy2.8
./data/extcode/genus2reduction/core_cy2.9
./devel/sage-main/doc/en/reference/core_cy2.20
./devel/sage-main/doc/en/reference/sagenb/misc/core_cy2.9
./devel/sage-main/sage/categories/core_cy2.3
./devel/sage-main/sage/combinat/core_cy2.13
./devel/sage-main/sage/combinat/matrices/core_cy2.16
./devel/sage-main/sage/crypto/core_cy2.17
./devel/sage-main/sage/interfaces/core_cy2.12
./devel/sage-main/sage/interfaces/core_cy2.9
./devel/sage-main/sage/matrix/core_cy2.16
./devel/sage-main/sage/misc/core_cy2.17
./devel/sage-main/sage/misc/core_cy2.2
./devel/sage-main/sage/misc/core_cy2.5
./devel/sage-main/sage/modular/abvar/core_cy2.20
./devel/sage-main/sage/rings/core_cy2.6
./devel/sage-main/sage/rings/polynomial/core_cy2.10
./local/lib/python2.6/site-packages/sagenb-0.8.1-py2.6.egg/sagenb/interfaces/core_cy2.5
(I've moved the earlier logs to SAGE_ROOT/oldlogs but left the
corresponding core_cy.* in place.)
By the way, in order to get 'sage -ba' to succeed for the 4.5.2 series,
I added an 8th patch (attached). I don't know if the changes are OK,
but in limited testing, the doctests pass, modulo the heisenbug.
Well, it's clear there's something going on. What about testing on a
plain-vanilla sage (with the old Cython)?
Yeah, it looks fine (and similar to some of the other stuff we had to do).
- Robert
I get similar results on sage.math with the released 4.5.3.alpha0:
$ cd /scratch/mpatel/tmp/cython/sage-4.5.3.alpha0-segs
$ find -name core_x\* -type f | wc
30 30 1315
$ grep egmentation ptestlong-j20-*log | wc
11 70 939
So the problem could indeed lie elsewhere, e.g., in the doctesting system.
I'll try to run some experiments with the attached make-based parallel
doctester.
With the alternate tester and vanilla 4.5.3.alpha0, I get "only" the 20
cores in
data/extcode/genus2reduction/
(I don't know yet how many times and with which file(s) this fault
happens during each long doctest run nor whether certain files
reproducibly trigger the fault.)
Moreover, the only failed doctests are in startup.py (19 times) and
decorate.py (once). The logs maketestlong-j20-* don't explicitly
mention segmentation faults.
So sage-ptest may be responsible, somehow, for the faults that leave
cores in apparently random directories and perhaps also for random test
failures.
The Cython beta, at least, may be off the hook. But I'll check with the
alternate tester.
Thanks for looking into this, another data point is really helpful. I
put a vanilla Sage in hudson and for a while it was passing all of its
tests every time, then all of a sudden it started failing too. Very
strange... For now I've resorted to starting up Sage in a loop (as the
segfault always happened during startup) and am seeing about a 0.5%
failure rate (which is the same that I see with a vanilla Sage).
Hopefully we can get the parallel testing to work much more reliably
so we can use it as a good indicator in our Cython build farm to keep
people from breaking Sage (and I'm honestly really surprised we
haven't run into these issues during release management as well...)
- Robert
I made 20 copies of 4.5.3.alpha0. In each copy, I ran the long doctest
suite serially with the alternate tester, which I modified to rename any
new cores after each test. All copies end up with a "stealth" core in
data/extcode/genus2reduction/
and point to
sage/interfaces/genus2reduction.py
as the only source of this core. I'll open a ticket.
>> Moreover, the only failed doctests are in startup.py (19 times) and
>> decorate.py (once). The logs maketestlong-j20-* don't explicitly
>> mention segmentation faults.
>>
>> So sage-ptest may be responsible, somehow, for the faults that leave
>> cores in apparently random directories and perhaps also for random test
>> failures.
Here's one problem: When we test
/path/to/foo.py
sage-doctest writes
SAGE_TESTDIR/.doctest_foo.py
runs the new file through 'python', and deletes it. This can cause
collisions when we test in parallel multiple files with the same
basename, e.g., __init__, all, misc, conf, constructor, morphism, index,
tests, homset, element, twist, tutorial, sagetex, crystals,
cartesian_product, template, ring, etc. (There's a similar problem with
testing non-library files, which sage-doctest first effectively copies
to SAGE_TESTDIR.) We could instead use
.doctest_path_to_foo.py
or
.doctest_path_to_foo_ABC123.py
where ABC123 is unique. With the latter we could run multiple
simultaneous tests of the same file. I'll open a ticket or maybe use an
existing one.
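
A minimal sketch of the two naming schemes, assuming nothing about how sage-doctest is actually organized: the helper `doctest_filename` and the use of `uuid` for the unique tag are illustrative choices only. Flattening the path does not rule out every collision (a/b_c.py and a_b/c.py map to the same name), and a six-character random tag is very unlikely, but not guaranteed, to be unique.

```python
import os
import re
import uuid


def doctest_filename(path, unique=False):
    """Build a scratch file name that encodes the directories of `path`."""
    root, ext = os.path.splitext(os.path.normpath(path))
    # Collapse path separators and other non-alphanumerics into underscores.
    stem = re.sub(r"[^A-Za-z0-9]+", "_", root).strip("_")
    if unique:
        # Short random tag playing the role of "ABC123".
        stem += "_" + uuid.uuid4().hex[:6].upper()
    return ".doctest_" + stem + ext
```

With this, `doctest_filename("/path/to/foo.py")` gives `.doctest_path_to_foo.py`, and `sage/rings/all.py` and `sage/misc/all.py` no longer share a scratch name.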
>> The Cython beta, at least, may be off the hook. But I'll check with the
>> alternate tester.
I get the same results with 4.5.3.alpha0 + the Cython beta (rev 3629).
> Thanks for looking into this, another data point is really helpful. I
> put a vanilla Sage in hudson and for a while it was passing all of its
> tests every time, then all of a sudden it started failing too. Very
> strange... For now I've resorted to starting up Sage in a loop (as the
> segfault always happened during startup) and am seeing about a 0.5%
> failure rate (which is the same that I see with a vanilla Sage).
> Hopefully we can get the parallel testing to work much more reliably
> so we can use it as a good indicator in our Cython build farm to keep
> people from breaking Sage (and I'm honestly really surprised we
> haven't run into these issues during release management as well...)
Strange, indeed.
Could the startup segfault be distinct from and not present among the
doctest faults? Testing about 2500 files with a 0.5% startup failure
rate would give us about 13 extra, random(?) faults per run, which we
don't see.
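
The estimate is just the number of files times the per-startup failure rate. As a sketch, assuming each tested file starts Sage exactly once and failures are independent (the function name is made up for illustration):

```python
def expected_faults(n_files, startup_failure_rate):
    """Expected number of startup faults in one run of the test suite."""
    return n_files * startup_failure_rate


# About 2500 files at a 0.5% failure rate, as in the text above.
print(expected_faults(2500, 0.005))
```

The result scales linearly, so a rate ten times smaller would predict ten times fewer faults per run.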
Could we use the tempfile module instead of using SAGE_TESTDIR? The
tempfile module makes files and directories by default that are unique
and are *designed* to live on a fast filesystem, which gets cleaned
regularly.
sage: import tempfile
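
For instance, a rough sketch using `tempfile.mkstemp`; the helper name `write_scratch_doctest` and the `.doctest_` prefix are only assumptions for illustration, not how sage-doctest works today:

```python
import os
import tempfile


def write_scratch_doctest(source_text, basename):
    """Write source_text to a uniquely named scratch file; return its path."""
    # mkstemp creates a new file whose name is guaranteed not to exist yet,
    # so two parallel tests of files called all.py get different scratch files.
    fd, path = tempfile.mkstemp(prefix=".doctest_%s_" % basename, suffix=".py")
    with os.fdopen(fd, "w") as f:
        f.write(source_text)
    return path
```

The caller is responsible for removing the file afterwards, e.g. with `os.remove(path)`.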
William
This is now
I've opened
http://trac.sagemath.org/sage_trac/ticket/9739
specifically for this problem.
> Could we use the tempfile module instead of using SAGE_TESTDIR? The
> tempfile module makes files and directories by default that are unique
> and are *designed* to live on a fast filesystem, which gets cleaned
> regularly.
>
> sage: import tempfile
Sure. I've added a comment about this to #9739.
Or should that be 0.05%?
Some more data: I batch-tested 'sage -c "quit"' about 1.2e6 times with
vanilla 4.5.3.alpha1 on sage.math. Each run ended with exit status 0
and no cores or discernible errors.
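
A batch test of this kind can be sketched as follows. This is not the script I used; `count_failures` is a hypothetical helper, and the stand-in command below just starts Python so that the sketch runs anywhere. For the real experiment the command would be something like `["./sage", "-c", "quit"]`.

```python
import subprocess
import sys


def count_failures(cmd, runs):
    """Run cmd `runs` times; return how many runs had a nonzero exit status."""
    failures = 0
    for _ in range(runs):
        # On POSIX, a process killed by a signal (e.g. SIGSEGV) yields a
        # negative return code, so it is counted as a failure too.
        if subprocess.call(cmd) != 0:
            failures += 1
    return failures


if __name__ == "__main__":
    cmd = [sys.executable, "-c", "pass"]  # stand-in for the Sage command
    runs = 20
    failed = count_failures(cmd, runs)
    print("%d of %d runs failed" % (failed, runs))
```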
Sorry, yes, 0.05%.
> Some more data: I batch-tested 'sage -c "quit"' about 1.2e6 times with
> vanilla 4.5.3.alpha1 on sage.math. Each run ended with exit status 0
> and no cores or discernible errors.
Nice. Not sure what changed, but sounds like something did.
- Robert
I don't know, but I've opened
http://trac.sagemath.org/sage_trac/ticket/9828
Soon. It could probably make it in the next release if the release
manager is willing to wait a week (probably less, don't have time to
clean up my patches and make an spkg tonight, but maybe tomorrow).
- Robert
Would it be wise to change Cython at the same time as Pari? I would
think that would add a whole new set of unknowns if there are
problems. The Pari ticket seems to be complex enough.
Dave