Help us test Cython?


Craig Citro

Jul 30, 2010, 2:54:43 AM
to sage-devel
Hi all,

So we're currently working on a long-overdue release of Cython with
all kinds of snazzy new features. However, our automated testing
system seems to keep turning up sporadic segfaults when running the
sage doctest suite. This is obviously bad, but we're having a hard
time reproducing this -- they seem to be *very* occasional failures
while starting up sage, and thus far the only consistent appearance
has been *within* our automated testing system (hudson). We've got a
pile of dumped cores, which have mostly led us to the conclusions that
(1) the problem occurs at a seemingly random point, so we should
suspect some sort of memory corruption, and (2) sage does a *whole*
lot of stuff when it starts up. ;)

So we'd love to see if other people see these same failures. Anyone
want to try out the new cython? You can grab all the files you need
here:

http://sage.math.washington.edu/home/craigcitro/cython-0.13-beta/

There's a new spkg and 6 patches against the sage library. You can add
the patches, sage -i the spkg, and then do a sage -ba, and voila! you
should have a sage running the bleeding edge cython. (If that doesn't
build, it means I forgot some patch somewhere -- there's a working
sage-4.4.4 with the new cython in /scratch/craigcitro/cy-work/fcubed
on sage.math if anyone wants to root around.)

After that, run the full test suite as many times as you're willing,
hopefully with and without parallel doctesting (i.e. sage -tp). Then
let us know what you turn up -- lots of random failures, or does
everything pass? Points for machines we can ssh into and generated
core files (ulimit -c unlimited), and even more points for anyone
seeing consistent/repeatable failures. I'd also be very interested in
reports that you've run the test suite N times with no failures.
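For anyone who hasn't generated cores before, the setup is just a shell limit; a quick sketch (the gdb lines are illustrative, not a transcript):

```shell
# Allow core dumps in the current shell (the default limit is often 0).
ulimit -c unlimited
# Confirm the new limit; prints "unlimited".
ulimit -c
# After a crash, a `core` file appears in the working directory.
# Get a backtrace from it with something like:
#   gdb $SAGE_ROOT/local/bin/python core
#   (gdb) bt
```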

-cc

Mitesh Patel

Jul 30, 2010, 4:44:44 AM
to sage-...@googlegroups.com
On 07/30/2010 01:54 AM, Craig Citro wrote:
> So we're currently working on a long-overdue release of Cython with
> all kinds of snazzy new features. However, our automated testing
> system seems to keep turning up sporadic segfaults when running the
> sage doctest suite. This is obviously bad, but we're having a hard
> time reproducing this -- they seem to be *very* occasional failures
> while starting up sage, and thus far the only consistent appearance
> has been *within* our automated testing system (hudson). We've got a
> pile of dumped cores, which have mostly led us to the conclusions that
> (1) the problem occurs at a seemingly random point, so we should
> suspect some sort of memory corruption, and (2) sage does a *whole*
> lot of stuff when it starts up. ;)
>
> So we'd love to see if other people see these same failures. Anyone
> want to try out the new cython? You can grab all the files you need
> here:
>
> http://sage.math.washington.edu/home/craigcitro/cython-0.13-beta/
>
> There's a new spkg and 6 patches against the sage library. You can add
> the patches, sage -i the spkg, and then do a sage -ba, and voila! you
> should have a sage running the bleeding edge cython. (If that doesn't
> build, it means I forgot some patch somewhere -- there's a working
> sage-4.4.4 with the new cython in /scratch/craigcitro/cy-work/fcubed
> on sage.math if anyone wants to root around.)

I haven't done any testing yet, but in order to get './sage -ba' to
finish, I applied a seventh patch (attached), which I discovered with

hg diff -R /scratch/craigcitro/cy-work/fcubed/devel/sage

7_setup.patch

Robert Bradshaw

Jul 30, 2010, 12:33:37 PM
to sage-...@googlegroups.com

Thanks! Yes, you'd probably need that too.

- Robert

Mitesh Patel

Jul 31, 2010, 5:51:28 AM
to sage-...@googlegroups.com
On 07/30/2010 01:54 AM, Craig Citro wrote:
> So we're currently working on a long-overdue release of Cython with
> all kinds of snazzy new features. However, our automated testing
> system seems to keep turning up sporadic segfaults when running the
> sage doctest suite. This is obviously bad, but we're having a hard
> time reproducing this -- they seem to be *very* occasional failures
> while starting up sage, and thus far the only consistent appearance
> has been *within* our automated testing system (hudson). We've got a
> pile of dumped cores, which have mostly led us to the conclusions that
> (1) the problem occurs at a seemingly random point, so we should
> suspect some sort of memory corruption, and (2) sage does a *whole*
> lot of stuff when it starts up. ;)
> [...]

> After that, run the full test suite as many times as you're willing,
> hopefully with and without parallel doctesting (i.e. sage -tp). Then
> let us know what you turn up -- lots of random failures, or does
> everything pass? Points for machines we can ssh into and generated
> core files (ulimit -c unlimited), and even more points for anyone
> seeing consistent/repeatable failures. I'd also be very interested in
> reports that you've run the test suite N times with no failures.

Below are some results from parallel doctests on sage.math. In each of

/mnt/usb1/scratch/mpatel/tmp/sage-4.4.4-cython
/mnt/usb1/scratch/mpatel/tmp/sage-4.5.1-cython

I have run (or am still running)

./tester | tee -a ztester &

where 'tester' contains

#!/bin/bash
ulimit -c unlimited

RUNS=20
for I in `seq 1 $RUNS`; do
    LOG="ptestlong-j20-$I.log"
    if [ ! -f "$LOG" ]; then
        echo "Run $I of $RUNS"
        nice ./sage -tp 20 -long -sagenb devel/sage > "$LOG" 2>&1

        # grep -A2 -B1 dumped "$LOG"
        ls -lsFtr `find -type f -name core` | grep core | tee -a "$LOG"

        # Rename each core to core_cy.$I
        rm -f _ren
        find -name core -type f | awk '{print "mv "$0" "$0"_cy.'${I}'"}' > _ren
        . _ren
    fi
done

The log files and cores (renamed to core_cy.1, etc.) are still in/under
SAGE_ROOT.

I don't know if the results tell you more than you already know. For
example,

sage-4.5.1-cython$ for x in `\ls ptestlong-j20-*`; do grep "doctests failed" $x; done | grep -v "0 doctests failed" | sort | uniq -c
      1 sage -t -long devel/sage/sage/graphs/graph.py   # 2 doctests failed
     19 sage -t -long devel/sage/sage/tests/startup.py  # 1 doctests failed

But

sage-4.5.1-cython$ find -name core_cy\* | sort
./data/extcode/genus2reduction/core_cy.1
./data/extcode/genus2reduction/core_cy.10
./data/extcode/genus2reduction/core_cy.11
./data/extcode/genus2reduction/core_cy.12
./data/extcode/genus2reduction/core_cy.13
./data/extcode/genus2reduction/core_cy.14
./data/extcode/genus2reduction/core_cy.15
./data/extcode/genus2reduction/core_cy.16
./data/extcode/genus2reduction/core_cy.17
./data/extcode/genus2reduction/core_cy.18
./data/extcode/genus2reduction/core_cy.19
./data/extcode/genus2reduction/core_cy.2
./data/extcode/genus2reduction/core_cy.20
./data/extcode/genus2reduction/core_cy.3
./data/extcode/genus2reduction/core_cy.4
./data/extcode/genus2reduction/core_cy.5
./data/extcode/genus2reduction/core_cy.6
./data/extcode/genus2reduction/core_cy.7
./data/extcode/genus2reduction/core_cy.8
./data/extcode/genus2reduction/core_cy.9
./devel/sage-main/doc/fr/tutorial/core_cy.17
./devel/sage-main/sage/algebras/core_cy.17
./devel/sage-main/sage/categories/core_cy.4
./devel/sage-main/sage/categories/core_cy.6
./devel/sage-main/sage/combinat/root_system/core_cy.12
./devel/sage-main/sage/databases/core_cy.1
./devel/sage-main/sage/databases/core_cy.18
./devel/sage-main/sage/ext/core_cy.18
./devel/sage-main/sage/groups/matrix_gps/core_cy.5
./devel/sage-main/sage/gsl/core_cy.4
./devel/sage-main/sage/misc/core_cy.10
./devel/sage-main/sage/misc/core_cy.17
./devel/sage-main/sage/misc/core_cy.2
./devel/sage-main/sage/modular/abvar/core_cy.7
./devel/sage-main/sage/plot/plot3d/core_cy.19
./devel/sage-main/sage/rings/core_cy.20
./local/lib/python2.6/site-packages/sagenb-0.8.1-py2.6.egg/sagenb/testing/tests/core_cy.19

Should I test differently?

Robert Bradshaw

Aug 4, 2010, 4:10:02 AM
to sage-...@googlegroups.com

So it looks like you're getting segfaults all over the place as
well... Hmm... Could you test with
https://sage.math.washington.edu:8091/hudson/job/sage-build/163/artifact/cython-devel.spkg
?

- Robert

Mitesh Patel

Aug 5, 2010, 7:26:39 AM
to sage-...@googlegroups.com

With the new package, I get similar results, i.e., apparently random
segfaults. The core score is about the same:

$ find -name core_cy2.\* | sort
./data/extcode/genus2reduction/core_cy2.1
./data/extcode/genus2reduction/core_cy2.10
./data/extcode/genus2reduction/core_cy2.11
./data/extcode/genus2reduction/core_cy2.12
./data/extcode/genus2reduction/core_cy2.13
./data/extcode/genus2reduction/core_cy2.14
./data/extcode/genus2reduction/core_cy2.15
./data/extcode/genus2reduction/core_cy2.16
./data/extcode/genus2reduction/core_cy2.17
./data/extcode/genus2reduction/core_cy2.18
./data/extcode/genus2reduction/core_cy2.19
./data/extcode/genus2reduction/core_cy2.2
./data/extcode/genus2reduction/core_cy2.20
./data/extcode/genus2reduction/core_cy2.3
./data/extcode/genus2reduction/core_cy2.4
./data/extcode/genus2reduction/core_cy2.5
./data/extcode/genus2reduction/core_cy2.6
./data/extcode/genus2reduction/core_cy2.7
./data/extcode/genus2reduction/core_cy2.8
./data/extcode/genus2reduction/core_cy2.9
./devel/sage-main/doc/en/reference/core_cy2.20
./devel/sage-main/doc/en/reference/sagenb/misc/core_cy2.9
./devel/sage-main/sage/categories/core_cy2.3
./devel/sage-main/sage/combinat/core_cy2.13
./devel/sage-main/sage/combinat/matrices/core_cy2.16
./devel/sage-main/sage/crypto/core_cy2.17
./devel/sage-main/sage/interfaces/core_cy2.12
./devel/sage-main/sage/interfaces/core_cy2.9
./devel/sage-main/sage/matrix/core_cy2.16
./devel/sage-main/sage/misc/core_cy2.17
./devel/sage-main/sage/misc/core_cy2.2
./devel/sage-main/sage/misc/core_cy2.5
./devel/sage-main/sage/modular/abvar/core_cy2.20
./devel/sage-main/sage/rings/core_cy2.6
./devel/sage-main/sage/rings/polynomial/core_cy2.10
./local/lib/python2.6/site-packages/sagenb-0.8.1-py2.6.egg/sagenb/interfaces/core_cy2.5

(I've moved the earlier logs to SAGE_ROOT/oldlogs but left the
corresponding core_cy.* in place.)

By the way, in order to get 'sage -ba' to succeed for the 4.5.2 series,
I added an eighth patch (attached). I don't know if the changes are OK,
but in limited testing, the doctests pass, modulo the heisenbug.

8_more_typing_issues.patch

Robert Bradshaw

Aug 6, 2010, 12:12:42 AM
to sage-...@googlegroups.com

Well, it's clear there's something going on. What about testing on a
plain-vanilla sage (with the old Cython)?

Yeah, it looks fine (and similar to some of the other stuff we had to do).

- Robert

Mitesh Patel

Aug 11, 2010, 4:25:09 AM
to sage-...@googlegroups.com
[...]

>>>> Should I test differently?
>>>
>>> So it looks like you're getting segfaults all over the place as
>>> well... Hmm... Could you test with
>>> https://sage.math.washington.edu:8091/hudson/job/sage-build/163/artifact/cython-devel.spkg
>>> ?
>>
>> With the new package, I get similar results, i.e., apparently random
>> segfaults. The core score is about the same:
>
> Well, it's clear there's something going on. What about testing on a
> plain-vanilla sage (with the old Cython)?

I get similar results on sage.math with the released 4.5.3.alpha0:

$ cd /scratch/mpatel/tmp/cython/sage-4.5.3.alpha0-segs
$ find -name core_x\* -type f | wc
30 30 1315
$ grep egmentation ptestlong-j20-*log | wc
11 70 939

So the problem could indeed lie elsewhere, e.g., in the doctesting system.

I'll try to run some experiments with the attached make-based parallel
doctester.

Makefile.doctest
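(The attachment itself isn't reproduced in the archive; the idea behind a make-based parallel doctester is roughly the following sketch, with the file layout and flags assumed, driven by `make -j20 -f Makefile.doctest`:)

```make
# Hypothetical sketch of Makefile.doctest: one log target per source file,
# so `make -j20` lets make itself schedule the tests in parallel.
FILES := $(shell find devel/sage/sage -name '*.py')
LOGS  := $(FILES:.py=.dtlog)

all: $(LOGS)

%.dtlog: %.py
	./sage -t -long $< > $@ 2>&1 || true
```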

Mitesh Patel

Aug 11, 2010, 7:15:56 PM
to sage-...@googlegroups.com

With the alternate tester and vanilla 4.5.3.alpha0, I get "only" the 20
cores in

data/extcode/genus2reduction/

(I don't know yet how many times this fault happens during each long
doctest run, with which file(s), or whether certain files reproducibly
trigger it.)

Moreover, the only failed doctests are in startup.py (19 times) and
decorate.py (once). The logs maketestlong-j20-* don't explicitly
mention segmentation faults.

So sage-ptest may be responsible, somehow, for the faults that leave
cores in apparently random directories and perhaps also for random test
failures.

The Cython beta, at least, may be off the hook. But I'll check with the
alternate tester.

Robert Bradshaw

Aug 11, 2010, 8:19:31 PM
to sage-...@googlegroups.com

Thanks for looking into this, another data point is really helpful. I
put a vanilla Sage in hudson and for a while it was passing all of its
tests every time, then all of a sudden it started failing too. Very
strange... For now I've resorted to starting up Sage in a loop (as the
segfault always happened during startup) and am seeing about a 0.5%
failure rate (which is the same that I see with a vanilla Sage).
Hopefully we can get the parallel testing to work much more reliably
so we can use it as a good indicator in our Cython build farm to keep
people from breaking Sage (and I'm honestly really surprised we
haven't run into these issues during release management as well...)

- Robert

Mitesh Patel

Aug 12, 2010, 6:01:48 PM
to sage-...@googlegroups.com


I made 20 copies of 4.5.3.alpha0. In each copy, I ran the long doctest
suite serially with the alternate tester, which I modified to rename any
new cores after each test. All copies end up with a "stealth" core in

data/extcode/genus2reduction/

and point to

sage/interfaces/genus2reduction.py

as the only source of this core. I'll open a ticket.


>> Moreover, the only failed doctests are in startup.py (19 times) and
>> decorate.py (once). The logs maketestlong-j20-* don't explicitly
>> mention segmentation faults.
>>
>> So sage-ptest may be responsible, somehow, for the faults that leave
>> cores in apparently random directories and perhaps also for random test
>> failures.


Here's one problem: When we test

/path/to/foo.py

sage-doctest writes

SAGE_TESTDIR/.doctest_foo.py

runs the new file through 'python', and deletes it. This can cause
collisions when we test in parallel multiple files with the same
basename, e.g., __init__, all, misc, conf, constructor, morphism, index,
tests, homset, element, twist, tutorial, sagetex, crystals,
cartesian_product, template, ring, etc. (There's a similar problem with
testing non-library files, which sage-doctest first effectively copies
to SAGE_TESTDIR.) We could instead use

.doctest_path_to_foo.py

or

.doctest_path_to_foo_ABC123.py

where ABC123 is unique. With the latter we could run multiple
simultaneous tests of the same file. I'll open a ticket or maybe use an
existing one.
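To make the collision concrete, here's a minimal sketch (plain Python; the paths and the proposed scheme are illustrative, not the actual sage-doctest code):

```python
import os
import os.path

def scratch_name(path):
    # Current scheme: only the basename survives, so distinct files collide.
    return ".doctest_" + os.path.basename(path)

a = scratch_name("sage/rings/all.py")
b = scratch_name("sage/matrix/all.py")
assert a == b  # collision: both become '.doctest_all.py'

def scratch_name_unique(path):
    # Proposed scheme: encode the full path, so names stay distinct.
    return ".doctest_" + path.replace(os.sep, "_")

assert scratch_name_unique("sage/rings/all.py") != scratch_name_unique("sage/matrix/all.py")
```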


>> The Cython beta, at least, may be off the hook. But I'll check with the
>> alternate tester.


I get the same results with 4.5.3.alpha0 + the Cython beta (rev 3629).


> Thanks for looking into this, another data point is really helpful. I
> put a vanilla Sage in hudson and for a while it was passing all of its
> tests every time, then all of a sudden it started failing too. Very
> strange... For now I've resorted to starting up Sage in a loop (as the
> segfault always happened during startup) and am seeing about a 0.5%
> failure rate (which is the same that I see with a vanilla Sage).
> Hopefully we can get the parallel testing to work much more reliably
> so we can use it as a good indicator in our Cython build farm to keep
> people from breaking Sage (and I'm honestly really surprised we
> haven't run into these issues during release management as well...)

Strange, indeed.

Could the startup segfault be distinct from and not present among the
doctest faults? Testing about 2500 files with a 0.5% startup failure
rate would give us about 13 extra, random(?) faults per run, which we
don't see.
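For the record, the arithmetic behind that estimate (a quick sanity check, not new data):

```python
files = 2500       # approximate number of files per doctest run
rate_high = 0.005  # a 0.5% per-startup failure rate
rate_low = 0.0005  # 0.05%, for comparison

print(files * rate_high)  # 12.5 -- about 13 extra faults per run
print(files * rate_low)   # 1.25 -- much closer to what the logs show
```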

William Stein

Aug 12, 2010, 6:18:42 PM
to sage-devel

Could we use the tempfile module instead of using SAGE_TESTDIR? The
tempfile module makes files and directories by default that are unique
and are *designed* to live on a fast filesystem, which gets cleaned
regularly.

sage: import tempfile
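For instance (a minimal illustration of the uniqueness guarantee; the actual doctest-harness integration is left out):

```python
import os
import tempfile

# mkstemp returns an open file descriptor and a unique path; two calls can
# never return the same name, so parallel tests of same-basename files are safe.
fd1, p1 = tempfile.mkstemp(prefix=".doctest_all_", suffix=".py")
fd2, p2 = tempfile.mkstemp(prefix=".doctest_all_", suffix=".py")
assert p1 != p2  # unique even though prefix and suffix match

# Clean up the scratch files, as the doctester would after a run.
os.close(fd1); os.close(fd2)
os.remove(p1); os.remove(p2)
```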

William

Mitesh Patel

Aug 12, 2010, 9:08:12 PM
to sage-...@googlegroups.com
On 08/12/2010 05:18 PM, William Stein wrote:
> On Thu, Aug 12, 2010 at 3:01 PM, Mitesh Patel <qed...@gmail.com> wrote:
>> I made 20 copies of 4.5.3.alpha0. In each copy, I ran the long doctest
>> suite serially with the alternate tester, which I modified to rename any
>> new cores after each test. All copies end up with a "stealth" core in
>>
>> data/extcode/genus2reduction/
>>
>> and point to
>>
>> sage/interfaces/genus2reduction.py
>>
>> as the only source of this core. I'll open a ticket.

This is now

http://trac.sagemath.org/sage_trac/ticket/9738

Mitesh Patel

Aug 12, 2010, 9:35:49 PM
to sage-...@googlegroups.com
William Stein or I wrote:
>>>> So sage-ptest may be responsible, somehow, for the faults that leave
>>>> cores in apparently random directories and perhaps also for random test
>>>> failures.
>>
>> Here's one problem: When we test
>>
>> /path/to/foo.py
>>
>> sage-doctest writes
>>
>> SAGE_TESTDIR/.doctest_foo.py
>>
>> runs the new file through 'python', and deletes it. This can cause
>> collisions when we test in parallel multiple files with the same
>> basename, e.g., __init__, all, misc, conf, constructor, morphism, index,
>> tests, homset, element, twist, tutorial, sagetex, crystals,
>> cartesian_product, template, ring, etc. (There's a similar problem with
>> testing non-library files, which sage-doctest first effectively copies
>> to SAGE_TESTDIR.) We could instead use
>>
>> .doctest_path_to_foo.py
>>
>> or
>>
>> .doctest_path_to_foo_ABC123.py
>>
>> where ABC123 is unique. With the latter we could run multiple
>> simultaneous tests of the same file. I'll open a ticket or maybe use an
>> existing one.

I've opened

http://trac.sagemath.org/sage_trac/ticket/9739

specifically for this problem.

> Could we use the tempfile module instead of using SAGE_TESTDIR? The
> tempfile module makes files and directories by default that are unique
> and are *designed* to live on a fast filesystem, which gets cleaned
> regularly.
>
> sage: import tempfile

Sure. I've added a comment about this to #9739.

Mitesh Patel

Aug 21, 2010, 9:17:20 PM
to sage-...@googlegroups.com

Or should that be 0.05%?

Some more data: I batch-tested 'sage -c "quit"' about 1.2e6 times with
vanilla 4.5.3.alpha1 on sage.math. Each run ended with exit status 0
and no cores or discernible errors.
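A loop of roughly this shape does such a batch test (a sketch, not the actual driver; CMD defaults to 'true' so the snippet is self-contained, where for the real test it would be ./sage -c "quit"):

```shell
# Run the startup command repeatedly and count nonzero exit statuses.
CMD=${CMD:-true}
FAILS=0
for i in $(seq 1 50); do
    $CMD > /dev/null 2>&1 || FAILS=$((FAILS + 1))
done
echo "failures: $FAILS"
```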

Robert Bradshaw

Aug 23, 2010, 12:03:19 PM
to sage-...@googlegroups.com
On Sat, Aug 21, 2010 at 6:17 PM, Mitesh Patel <qed...@gmail.com> wrote:
> On 08/12/2010 05:01 PM, Mitesh Patel wrote:
>> On 08/11/2010 07:19 PM, Robert Bradshaw wrote:
>>> Thanks for looking into this, another data point is really helpful. I
>>> put a vanilla Sage in hudson and for a while it was passing all of its
>>> tests every time, then all of a sudden it started failing too. Very
>>> strange... For now I've resorted to starting up Sage in a loop (as the
>>> segfault always happened during startup) and am seeing about a 0.5%
>>> failure rate (which is the same that I see with a vanilla Sage).
>>> Hopefully we can get the parallel testing to work much more reliably
>>> so we can use it as a good indicator in our Cython build farm to keep
>>> people from breaking Sage (and I'm honestly really surprised we
>>> haven't run into these issues during release management as well...)
>>
>> Strange, indeed.
>>
>> Could the startup segfault be distinct from and not present among the
>> doctest faults?  Testing about 2500 files with a 0.5% startup failure
>> rate would give us about 13 extra, random(?) faults per run, which we
>> don't see.
>
> Or should that be 0.05%?

Sorry, yes, 0.05%.

> Some more data:  I batch-tested 'sage -c "quit"' about 1.2e6 times with
> vanilla 4.5.3.alpha1 on sage.math.  Each run ended with exit status 0
> and no cores or discernible errors.

Nice. Not sure what changed, but sounds like something did.

- Robert

Ryan Hinton

Aug 27, 2010, 6:44:40 PM
to sage-devel
Any guesses when the snazzy new Cython 0.13 will end up in Sage? I
currently have some snazzy C++ code that I want to wrap in Cython
(uses namespaces, etc.), and the new features will make it much
easier.

Thanks!

- Ryan

Mitesh Patel

Aug 27, 2010, 8:37:18 PM
to sage-...@googlegroups.com
On 08/27/2010 05:44 PM, Ryan Hinton wrote:
> Any guesses when the snazzy new Cython 0.13 will end up in Sage? I

I don't know, but I've opened

http://trac.sagemath.org/sage_trac/ticket/9828

Robert Bradshaw

Aug 27, 2010, 11:15:07 PM
to sage-...@googlegroups.com

Soon. It could probably make it in the next release if the release
manager is willing to wait a week (probably less, don't have time to
clean up my patches and make an spkg tonight, but maybe tomorrow).

- Robert

David Kirkby

Aug 27, 2010, 11:54:38 PM
to sage-...@googlegroups.com

Would it be wise to change Cython at the same time as Pari? I would
think that would add a whole new set of unknowns if there are
problems. The Pari ticket seems to be complex enough.

Dave
