Consistent failure of zn_poly in latest Sage

Kwankyu Lee

May 6, 2013, 9:59:39 PM
to sage-...@googlegroups.com
Hi,

I am experiencing consistent failure in building the latest Sage including just released 5.9 at zn_poly. My system is MacPro Quad-Core Intel Xeon with OS X 10.7.5 and latest Xcode (I think). Are there others experiencing the same?

Here is the log: 

Now running zn_poly's self-tuning program...
zn_poly tuning program
(use -v flag for verbose output)

Calibrating cycle counter... ok (2.26e+09)
mpn smp kara: done
mpn mulmid fallback: done
   KS1/2/4 mul: ...............................................................
   KS1/2/4 sqr: ...............................................................
KS1/2/4 mulmid: ...............................................................
      nuss mul: ...............................................................
      nuss sqr: ...............................................................
    KS/FFT mul: ...............................................................
    KS/FFT sqr: ...............................................................
 KS/FFT mulmid: ...............................................................

Now building zn_poly with its tuning parameters...
gcc -O3 -g  -fPIC  -I/Users/Kwankyu/sage/sage-5.9/local/include -I./include -DNDEBUG -o src/tuning.o -c src/tuning.c
ar -r libzn_poly.a src/array.o src/invert.o src/ks_support.o src/mulmid.o src/mulmid_ks.o src/misc.o src/mpn_mulmid.o src/mul.o src/mul_fft.o src/mul_fft_dft.o src/mul_ks.o src/nuss.o src/pack.o src/pmf.o src/pmfvec_fft.o src/tuning.o src/zn_mod.o
ar: creating archive libzn_poly.a
ranlib libzn_poly.a

Now building and running zn_poly's quick self-test...
gcc -g -O3 -g  -fPIC  -I/Users/Kwankyu/sage/sage-5.9/local/include -I./include -DDEBUG -o src/array-DEBUG.o -c src/array.c
...
test/test -quick all
mpn_smp_basecase()... ok
mpn_smp_kara()... ok
mpn_smp()... ok
mpn_mulmid()... ok
zn_array_recover_reduce()... ok
zn_array_pack()... ok
zn_array_unpack()... ok
zn_array_mul_KS1()... ok
zn_array_mul_KS2()... ok
zn_array_mul_KS3()... ok
zn_array_mul_KS4()... ok
zn_array_sqr_KS1()... ok
zn_array_sqr_KS2()... ok
zn_array_sqr_KS3()... ok
zn_array_sqr_KS4()... ok
zn_array_mulmid_KS1()... ok
zn_array_mulmid_KS2()... ok
zn_array_mulmid_KS3()... ok
zn_array_mulmid_KS4()... ok
nuss_mul()... FAIL!

At least one test FAILED!
make[3]: *** [check] Error 1
Error running zn_poly's quick test suite ('make check').

real 1m2.800s
user 1m0.384s
sys 0m1.477s

John H Palmieri

May 6, 2013, 10:14:47 PM
to sage-...@googlegroups.com


On Monday, May 6, 2013 6:59:39 PM UTC-7, Kwankyu Lee wrote:
> Hi,
>
> I am experiencing consistent failure in building the latest Sage including just released 5.9 at zn_poly. My system is MacPro Quad-Core Intel Xeon with OS X 10.7.5 and latest Xcode (I think). Are there others experiencing the same?

You already know that there is a trac ticket for this (#13947) since you posted on it 6 days ago, so you know that others are experiencing it.

--
John

William Stein

May 6, 2013, 10:16:04 PM
to sage-...@googlegroups.com
You are such a mathematician :-)

I'm guessing he means others beyond the ones mentioned in the trac ticket?

William




--
William Stein
Professor of Mathematics
University of Washington
http://wstein.org

kcrisman

May 6, 2013, 10:32:29 PM
to sage-...@googlegroups.com


On Monday, May 6, 2013 10:16:04 PM UTC-4, William wrote:
> On Mon, May 6, 2013 at 7:14 PM, John H Palmieri <jhpalm...@gmail.com> wrote:
>> On Monday, May 6, 2013 6:59:39 PM UTC-7, Kwankyu Lee wrote:
>>> Hi,
>>>
>>> I am experiencing consistent failure in building the latest Sage including
>>> just released 5.9 at zn_poly. My system is MacPro Quad-Core Intel Xeon with
>>> OS X 10.7.5 and latest Xcode (I think). Are there others experiencing the
>>> same?
>>
>> You already know that there is a trac ticket for this (#13947) since you
>> posted on it 6 days ago, so you know that others are experiencing it.
>
> You are such a mathematician :-)


Like the joke about the engineer, the physicist, and the mathematician in a fire-prone hotel room... "A solution exists" and goes back to bed.

John H Palmieri

May 6, 2013, 11:24:19 PM
to sage-...@googlegroups.com


On Monday, May 6, 2013 7:16:04 PM UTC-7, William wrote:
> On Mon, May 6, 2013 at 7:14 PM, John H Palmieri <jhpalm...@gmail.com> wrote:
>> On Monday, May 6, 2013 6:59:39 PM UTC-7, Kwankyu Lee wrote:
>>> Hi,
>>>
>>> I am experiencing consistent failure in building the latest Sage including
>>> just released 5.9 at zn_poly. My system is MacPro Quad-Core Intel Xeon with
>>> OS X 10.7.5 and latest Xcode (I think). Are there others experiencing the
>>> same?
>>
>> You already know that there is a trac ticket for this (#13947) since you
>> posted on it 6 days ago, so you know that others are experiencing it.
>
> You are such a mathematician :-)
>
> I'm guessing he means others beyond the ones mentioned in the trac ticket?

Well, I'm not sure what he means. Searching the various sage-* google groups also reveals some people with the same problem. So I don't know what question he's really asking...

--
John

Jeroen Demeyer

May 7, 2013, 2:48:26 AM
to sage-...@googlegroups.com
On 2013-05-07 03:59, Kwankyu Lee wrote:
> Are there others experiencing the same
Sure, lots of people, mostly (or only?) on recent OS X systems.

Kwankyu Lee

May 7, 2013, 9:00:59 PM
to sage-...@googlegroups.com
Hi,

Sorry that I was not clear. Reading the comments for the Trac ticket, I thought that for others, this happens intermittently, so they succeed in building Sage eventually after a couple of failures. For my case, it always fails. I was asking if others experience the same consistent failures.


Kwankyu

kcrisman

May 7, 2013, 9:27:35 PM
to sage-...@googlegroups.com


On Tuesday, May 7, 2013 9:00:59 PM UTC-4, Kwankyu Lee wrote:
> Hi,
>
> Sorry that I was not clear. Reading the comments for the Trac ticket, I thought that for others, this happens intermittently, so they succeed in building Sage eventually after a couple of failures. For my case, it always fails. I was asking if others experience the same consistent failures.


I think it is intermittent in the sense that on computers with the problem, it fails consistently when building Sage in parallel; the fix is to then quickly do that one package in series, then remain doing things in parallel (see the ticket or other discussions for details).  Are you saying that even when doing ./sage -i zn_poly (so, one thread) it fails? 

Kwankyu Lee

May 7, 2013, 9:49:13 PM
to sage-...@googlegroups.com

> I think it is intermittent in the sense that on computers with the problem, it fails consistently when building Sage in parallel; the fix is to then quickly do that one package in series, then remain doing things in parallel (see the ticket or other discussions for details).  Are you saying that even when doing ./sage -i zn_poly (so, one thread) it fails?

I just did "./sage -i zn_poly" three times in a row. They all failed.

kcrisman

May 7, 2013, 10:09:18 PM
to sage-...@googlegroups.com


On Tuesday, May 7, 2013 9:49:13 PM UTC-4, Kwankyu Lee wrote:

>> I think it is intermittent in the sense that on computers with the problem, it fails consistently when building Sage in parallel; the fix is to then quickly do that one package in series, then remain doing things in parallel (see the ticket or other discussions for details).  Are you saying that even when doing ./sage -i zn_poly (so, one thread) it fails?
>
> I just did "./sage -i zn_poly" three times in a row. They all failed.

And there is nothing else particularly heavy running on the computer?  (This appears to be the root cause of the failure, which parallel building happens to expose since then there is heavy stuff running by definition.)
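
One rough way to check that the machine really is idle before retrying is to look at the load average. The sketch below is only an illustration: the helper name is_quiet and the threshold of 0.5 are assumptions made up for this example, not anything taken from Sage or zn_poly. It relies on os.getloadavg(), which is available on Unix-like systems such as OS X.

```python
import os


def is_quiet(threshold=0.5):
    """Return True if the 1-minute load average is below `threshold`.

    `threshold` is an arbitrary, hypothetical cut-off chosen for illustration.
    """
    one_minute, _five, _fifteen = os.getloadavg()
    return one_minute < threshold


if __name__ == "__main__":
    if is_quiet():
        print("System looks idle; a reasonable moment to retry ./sage -i zn_poly")
    else:
        print("System is busy; timing-based tuning may see distorted results")
```

A low load average does not prove that nothing is disturbing the timings, so this is at best a sanity check before a retry.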

John H Palmieri

May 7, 2013, 11:50:39 PM
to sage-...@googlegroups.com

For example, can you stop everything on the computer and remote log in to it from another machine, so you're hardly running anything on it? If not, quit all possible applications and try building again. By the way, how much RAM does this machine have (this may not be relevant, but just in case)?

--
John

Kwankyu Lee

May 8, 2013, 12:54:37 AM
to sage-...@googlegroups.com
> For example, can you stop everything on the computer and remote log in to it from another machine, so you're hardly running anything on it? If not, quit all possible applications and try building again. By the way, how much RAM does this machine have (this may not be relevant, but just in case)?

There were two virtual machines running. So I rebooted the computer, opened just a terminal, and in that pristine state issued the command "./sage -i zn_poly". To my surprise, it still fails. The machine has 16 GB of RAM. For your information,

Host system:
Darwin Athena.local 11.4.2 Darwin Kernel Version 11.4.2: Thu Aug 23 16:25:48 PDT 2012; root:xnu-1699.32.7~1/RELEASE_X86_64 x86_64

and gcc version 4.7.2 (GCC)


Kwankyu

Tom Roby

May 8, 2013, 11:57:29 AM
to sage-...@googlegroups.com
As I communicated a couple of weeks ago, I hit this same error on my desktop OS X 10.6.8 system.  It has now happened consistently, even when I unload my system, with both Sage 5.8 and 5.9rc1.  So far, no luck installing Sage on that box.

On the other hand, 5.8 compiled fine on my laptop last week, another OS X 10.6.8 system with somewhat newer specs.  I attributed it to the presence of Nicolas Thiery & Anne Schilling in the room during most of the compile, but perhaps there are some other places where I should be looking for differences in my systems to explain this? 

Tom

John H Palmieri

May 8, 2013, 8:02:36 PM
to sage-...@googlegroups.com

Can you now compile earlier versions of Sage successfully? (That is, is the problem a recent change in Sage, or a recent change on your system?) If you can compile earlier versions now, what happens if you take one of those and run "./sage -i path/to/latest/zn_poly...spkg"?

--
John

Francois Bissey

May 8, 2013, 8:10:26 PM
to sage-...@googlegroups.com
Apart from what John asks, can you try compiling zn_poly with a
different compiler? I assume your 4.7.2 is the compiler built by
Sage. If you have the option of using/installing a different gcc, it
would be interesting to see the result.

Francois

leif

May 8, 2013, 10:16:14 PM
to sage-...@googlegroups.com
Tom Roby wrote:
> 5.8 compiled fine on my laptop last week, another OS
> X 10.6.8 system with somewhat newer specs. I attributed it to the
> presence of Nicolas Thiery & Anne Schilling in the room during most of
> the compile [...]

Oh, that's something I haven't thought of yet. Is it reproducible?

Although for an according patch to zn_poly's spkg-install, they'd
probably have to /log in/ (to the local machine), as not every MacOS X
user will have a device for detecting their sole presence.

We should also try whether an /image/ of them (in the same room,
alternatively exposed to a webcam connected to the machine zn_poly is to
be built on) suffices.


-leif

--
() The ASCII Ribbon Campaign
/\ Help Cure HTML E-Mail

leif

May 8, 2013, 10:27:05 PM
to sage-...@googlegroups.com
Kwankyu Lee wrote:
> For example, can you stop everything on the computer and remote log
> in to it from another machine, so you're hardly running anything on
> it? If not, quit all possible applications and try building again.
> By the way, how much RAM does this machine have (this may not be
> relevant, but just in case)?
>
>
> There were two virtual machines running. So I rebooted the computer, and
> opened just a terminal, and in the pristine state issued the command
> "./sage -i zn_poly". To my surprise, it still fails. It has 16GB ram.

Well, that still doesn't imply the machine isn't loaded otherwise.


-leif

P.S.: I strongly doubt GCC's version matters here.


> For your information,
>
> Host system:
> Darwin Athena.local 11.4.2 Darwin Kernel Version 11.4.2: Thu Aug 23
> 16:25:48 PDT 2012; root:xnu-1699.32.7~1/RELEASE_X86_64 x86_64
>
> and gcc version 4.7.2 (GCC)
>
>
> Kwankyu

Francois Bissey

May 8, 2013, 10:33:58 PM
to sage-...@googlegroups.com
On 09/05/13 14:27, leif wrote:
> Kwankyu Lee wrote:
>> For example, can you stop everything on the computer and remote log
>> in to it from another machine, so you're hardly running anything on
>> it? If not, quit all possible applications and try building again.
>> By the way, how much RAM does this machine have (this may not be
>> relevant, but just in case)?
>>
>>
>> There were two virtual machines running. So I rebooted the computer, and
>> opened just a terminal, and in the pristine state issued the command
>> "./sage -i zn_poly". To my surprise, it still fails. It has 16GB ram.
>
> Well, that still doesn't imply the machine isn't loaded otherwise.
>
>
> -leif
>
> P.S.: I strongly doubt GCC's version matters here.
>

If I were you I wouldn't be so sure. I became involved in that bug
when I stumbled on a similar one porting to power7
(http://trac.sagemath.org/sage_trac/ticket/14098);
the version of gcc on that platform had an impact. The main
problem here is that zn_poly tuning parameters are completely
wrong. zn_poly is abandoned upstream and the tuning routines
are probably not aging well to support new CPUs, compilers and OSes.

Francois

William Stein

May 8, 2013, 11:46:02 PM
to sage-...@googlegroups.com
It might not exactly be totally abandoned upstream (it's more "done"):
Here's what the author (David Harvey) wrote to me offlist earlier
today about this thread: "Hi, Not obvious to me what the problem is
from reading the thread. Probably a bug in zn_poly. I'm about to
leave on a camping trip.... I will take a look next week."

>
> Francois

Jean-Pierre Flori

May 9, 2013, 4:47:32 AM
to sage-...@googlegroups.com
I think we came to the conclusion that some algorithms (KS or FFT, I do not really remember) can segfault when they are used in ranges where they were not expected to be used (say, the FFT for small-degree polys with small coeffs, although I'm not saying that is the problem).
And it seems that in some cases (a loaded system, for example), the tuning process gets completely wrong results and tries such a combination (i.e. FFT for really small stuff and bang!).
(If KS is the problem, it might also be that such an FFT/small-integer combination is not supported in MPIR, rather than being a buggy combination in zn_poly itself.)

IIRC such problematic tuning parameters, which lead to systematic segfaults during zn_poly's test suite, have been collected on Sage's trac ticket in case David reads this thread.
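
To illustrate how a single disturbed timing could push a crossover threshold far too low, here is a toy sketch. It is not zn_poly's tuning code: the function find_crossover, the cost models ks_cost and fft_cost, and all the constants are hypothetical and chosen only to make the effect visible.

```python
def find_crossover(cost_small, cost_large, sizes):
    """Return the first size at which the 'large-input' algorithm measures faster."""
    for n in sizes:
        if cost_large(n) < cost_small(n):
            return n
    return None


def ks_cost(n):
    # Hypothetical cost model: cheap for small n, grows quadratically.
    return n * n


def fft_cost(n):
    # Hypothetical cost model: large fixed overhead, grows linearly.
    return 40 * n + 2000


def ks_cost_disturbed(n):
    # Pretend another process stole the CPU while size 4 was being timed.
    return ks_cost(n) + (10_000 if n == 4 else 0)


sizes = [2 ** k for k in range(1, 12)]

clean = find_crossover(ks_cost, fft_cost, sizes)
disturbed = find_crossover(ks_cost_disturbed, fft_cost, sizes)

print("crossover on a quiet system:", clean)        # 128
print("crossover with one bad sample:", disturbed)  # 4
```

With the bad sample, the "FFT" model gets selected from size 4 upwards, which is the kind of "FFT for really small stuff" combination described above. Whether zn_poly's tuner actually fails this way is not established in this thread.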

leif

May 9, 2013, 7:14:17 AM
to sage-...@googlegroups.com
Francois Bissey wrote:
> On 09/05/13 14:27, leif wrote:
>> P.S.: I strongly doubt GCC's version matters here.
>>
>
> If I were you I wouldn't be so sure. I became involved in that bug
> when I stumbled on a similar one porting to power7
> http://trac.sagemath.org/sage_trac/ticket/14098
> the version of gcc on that platform had an impact.

Well, I tried a couple of GCC versions and different compiler flags,
always with exactly the same result (see below), on x86_64 that is.


> The main
> problem here is that zn_poly tuning parameters are completely
> wrong.

At least in this thread (as opposed to the ticket's title, or to some
comments there), we're not talking about segfaults.

/Comparisons/ of zn_poly's results to those of a reference
implementation fail -- deterministically -- if you only "tweak" the
tuning parameters appropriately, and btw. only when *squaring* (but not
in all of these cases IIRC) using Nussbaumer multiplication, apparently
just because(?) then test_nuss_mul() lets the inputs alias. [As
mentioned on the ticket, the test failures vanish for me if I only
remove the argument aliasing.]
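
As a purely illustrative sketch of why aliased inputs can make a comparison against a reference implementation fail deterministically, consider a toy routine that uses its first operand as scratch space. This is not zn_poly's Nussbaumer code and not a diagnosis of the actual bug; reference_mul and scratchy_mul are hypothetical names invented for this example.

```python
def reference_mul(a, b):
    """Schoolbook polynomial product; never modifies its inputs."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def scratchy_mul(a, b):
    """Toy product that temporarily negates `a` in place.

    It computes (-a) * b and negates the result, which is correct
    only as long as `b` is a different buffer from `a`.
    """
    for i in range(len(a)):
        a[i] = -a[i]
    out = reference_mul(a, b)  # if b is a, b has been negated too
    for i in range(len(a)):
        a[i] = -a[i]           # restore the caller's data
    return [-c for c in out]


p = [1, 2, 3]
expected = reference_mul(p, p)

separate = scratchy_mul(list(p), list(p))  # squaring with two distinct copies
aliased = scratchy_mul(p, p)               # squaring with the same buffer twice

print(separate == expected)  # True
print(aliased == expected)   # False
```

The aliased call fails every time, independent of system load, which matches the "deterministic" flavour of the failure described above, though the real cause in zn_poly may be something else entirely.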

"Random" values from, or segfaults (I couldn't reproduce) in zn_poly's
tuning are a separate issue, and apparently indeed OS-dependent (MacOS
X, Cygwin).

Anyway, "random" (or unreasonable/unexpected/invalid) tuning parameters
should cause assertions to fail, which are enabled at least when
zn_poly's test suite is built. (The whole library is [re]built with and
without assertions enabled.)


-leif

P.S.: I had put a link to #13947 on a (IIRC hardly related) ticket
David Harvey was involved in, asking for help, so I assumed he was aware
of the former.

leif

May 24, 2013, 3:44:44 AM
to sage-...@googlegroups.com
William Stein wrote:
> On Wed, May 8, 2013 at 7:33 PM, Francois Bissey
> <francoi...@canterbury.ac.nz> wrote:
>> The problem here is that zn_poly tuning parameters are completely
> wrong. zn_poly is abandoned upstream and the tuning routines
>> are probably not aging well to support new cpu, compilers and OS.
>
> It might not exactly be totally abandoned upstream (it's more "done"):
> Here's what the author (David Harvey) wrote to me offlist earlier
> today about this thread: "Hi, Not obvious to me what the problem is
> from reading the thread. Probably a bug in zn_poly. I'm about to
> leave on a camping trip.... I will take a look next week."


David Harvey has come up with a trivial patch to the *test* code, and
I've created a preliminary spkg with that patch just for testing whether
it fixes the issue for you all.

See the ticket [1] for details and for a link to that spkg.


-leif

[1] http://trac.sagemath.org/sage_trac/ticket/13947#comment:32 ff.