Re: Review request (issue with MPN_ZERO on t2)

7 views
Skip to first unread message

Bill Hart

unread,
Jul 17, 2009, 11:37:39 AM7/17/09
to Tom Boothby, Dr. David Kirkby, mpir-dev
This sage ticket 6453 is too hard. It actually looks like a "bug" in MPIR to me.

The only way to test David Kirkby's fix on T2 it is to do sage -sh and
then run the spkg-install, etc.

That means I'd need my own version of sage built on that machine,
otherwise the spkg-install fails due to permissions problems. From
what I recall, Sage takes many hours to build on that machine. Well
that is if it did actually build in the first place with that
compiler.

I'm not the right person to be reviewing this.

I'm also completely unsure whether the patch is valid or not. David
has done things like turn testing on which I've no idea is the right
thing to do or not, from a Sage point of view. That should be bundled
as a separate patch in my view.

Besides that the issue seems to be a compiler bug, which occurs from
gcc 4.3.0 onwards which makes MPN_ZERO miscompile when supplied with a
zero operand. But MPN_ZERO is supposed to work with a zero operand, so
this is actually a bug in MPIR. Well it would be if it weren't a
compiler bug.

But the last time we hit a bug like this (on Cleo, which is an ia64 if
I recall), actually with MPN_ZERO as it happens, I tried rewriting
MPN_ZERO, but after a full day's work I was unable to come up with
*ANY* macro which did not miscompile on that machine. There the gcc
was 4.1.2.

In that situation it was not just the zero operand that was the
problem. It was to do with the specific situation in which MPN_ZERO
was used. I checked that the macro was being expanded correctly by the
compiler, but the expression parser was miscompiling the resulting
expression (which was very complex).

In the end I gave up and the one place in MPIR where the macro got
misexpanded and caused the MPIR test failure, I just used totally
different code, avoiding the use of MPN_ZERO altogether. This was a
temporary fix, and gcc 4.1.2 should not be used to compiler MPIR in
ia64, and I've no idea what other versions of gcc miscompile it
(though I did verify that a later gcc didn't have the bug in that
case).

David Kirkby claims the problem is due to memset being implemented
differently on that architecture with that version of Sun. But I don't
think MPN_ZERO actually uses memset. It is way too slow. We have a
macro which gets expanded for MPN_ZERO.

Basically on T2 we should be patching MPIR so that it uses code which
compiles correctly on T2, otherwise it is a bug in MPIR which may
cause an infinite number of bugs in Sage. (MPN_ZERO is used a *LOT*).
For this reason, I'm declaring the fix incorrect. I hate doing this,
because David has put a huge amount of effort into this one.

Here is the current implementation of MPN_ZERO in MPIR:

#if HAVE_NATIVE_mpn_store
#define mpn_store __MPN(store)
mp_limb_t mpn_store _PROTO ((mp_ptr,mp_size_t,mp_limb_t));
#else
#define mpn_store(dst, n,val) \
do { \
ASSERT ((n) >= 0); \
if ((n) != 0) \
{ \
mp_ptr __dst = (dst); \
mp_size_t __n = (n); \
do \
*__dst++ = val; \
while (--__n); \
} \
} while (0)
#endif

#define MPN_ZERO(dst,n) mpn_store(dst,n,0)

Note there is no native mpn_store on T2.

The *only* correct fix for this is to write a native assembly language
mpn_store for this architecture. We should also do the same for ia64
while we are at it.

Warning: this is not going to happen in the next day, or even week.
We'll open an MPIR ticket for it. In the mean time, the *only* correct
fix for Sage is to not compile Sage on T2 with gcc 4.3.0 or later. It
should bail out with a warning. The compiler bug should also be
reported by the MPIR crew to gcc, once we have verified that it is
indeed a miscompilation. It is clear they haven't fixed this since it
appeared in gcc 4.3.0.

You *cannot trust* Sage built on T2 with gcc 4.3.0 or later regardless
of whether you get the test suite to pass or not. MPN_ZERO is a
critical function used extensively in MPIR itself and many things
which use it, e.g. it is used in the FFT.

Bill.

2009/7/16 Tom Boothby <tomas....@gmail.com>:
> On Wed, Jul 15, 2009 at 9:57 PM, Bill Hart<goodwi...@googlemail.com> wrote:
>> I'll take a look. But these are a bit difficult to review because sage
>> doesn't build the way Kirkby would like on this machine, so it's a bit
>> difficult to test any individual one of his patches, and he's still
>> got patches to go before it does build.
>
> Yikes.
>
>>
>> There should be a sage-t2 branch and they should be being merged into
>> that until the entire patchset is ready. The poor guy is gonna give up
>> in frustration at the rate things are going.
>
> Boy, I hope not.  Not many Solaris gurus around.
>
>>
>> Bill.
>>
>> 2009/7/16 Tom Boothby <tomas....@gmail.com>:
>>> Hi,
>>>
>>> There are 90 tickets needing review, and we'd like to close as many of
>>> them for Sage 4.1.1 as possible.  Please review
>>>
>>> billhart        #6453   MPFR test failures on Solaris 10 update 4 on host 't2'
>>>
>>> If you can't / won't / don't feel like reviewing, please respond and tell me so!
>>>
>>> Thanks,
>>>   --tom
>>>
>>
>

Bill Hart

unread,
Jul 17, 2009, 11:53:58 AM7/17/09
to Tom Boothby, Dr. David Kirkby, mpir-dev
I've made tickets to fix MPN_ZERO on both ia64:

http://trac.mpir.org/mpir_trac/ticket/223

and T2:

http://trac.mpir.org/mpir_trac/ticket/224

Bill.

2009/7/17 Bill Hart <goodwi...@googlemail.com>:

Marc

unread,
Jul 17, 2009, 12:03:01 PM7/17/09
to mpir-dev
On Jul 17, 8:37 am, Bill Hart <goodwillh...@googlemail.com> wrote:
> David Kirkby claims the problem is due to memset being implemented
> differently on that architecture with that version of Sun. But I don't
> think MPN_ZERO actually uses memset. It is way too slow. We have a
> macro which gets expanded for MPN_ZERO.

Assuming we are talking about the same thing, mpfr has an issue, but
that is completely unrelated to gmp/mpir, as mpfr defines its own
MPN_ZERO macro in terms of memset.

Bill Hart

unread,
Jul 17, 2009, 1:07:27 PM7/17/09
to mpir...@googlegroups.com, Tom Boothby, Dr. David Kirkby
Ah, we are talking about the same thing I think.

How silly to pollute the namespace with another macro with the same
name, especially as MPFR and GMP/MPIR are sister projects. But I am
guessing this is not an exported symbol but only used internally.

That means all of my comments regarding the issue with MPN_ZERO are
irrelevant with respect to David Kirkby's workaround in Sage.

For various reasons, the correct fix then is precisely what David
Kirby is doing. That means the sage ticket does need to be reviewed.
I'll close the MPIR ticket associated with this.

As for me reviewing the sage ticket, I give up. Too hard for me.
Someone else can do it please. I don't have time in the foreseeable
future as it is quite involved. Someone with a working Sage install on
t2 should do it, as it would be about a day's less work for them than
it would be for me. Obviously it should be checked that it doesn't
break the build on some non-T2 machine, e.g. sage.math.

Bill.

2009/7/17 Marc <marc....@gmail.com>:
Reply all
Reply to author
Forward
0 new messages