Segfault in mpz_divexact()

30 views
Skip to first unread message

Jeroen Demeyer

unread,
Sep 3, 2010, 5:10:29 PM9/3/10
to mpir-devel
Hello mpir-devel,

I think I found a bug in MPIR 2.1.1 in mpz_divexact() (or I'm doing
something something very stupid). Running the following program gives a
Segmentation Fault:

#include <mpir.h>
int main()
{
mpz_t Z, R;
mpz_init(Z);
mpz_init(R);
mpz_ui_pow_ui(Z, 10, 100000);
mpz_divexact(R, Z, Z);
return 0;
}


gdb backtrace:

Program received signal SIGSEGV, Segmentation fault.
0x00007fa2f2b5245d in mpn_submul_1 () from
/usr/local/src/pari/local/lib/libmpir.so.8
(gdb) bt
#0 0x00007fa2f2b5245d in mpn_submul_1 () from
/usr/local/src/pari/local/lib/libmpir.so.8
#1 0x00007fa2f2b73e72 in __gmpn_sb_divappr_q () from
/usr/local/src/pari/local/lib/libmpir.so.8
#2 0x00007fa2f2b812f2 in __gmpn_inv_divappr_q () from
/usr/local/src/pari/local/lib/libmpir.so.8
#3 0x00007fa2f2b88765 in __gmpn_divexact () from
/usr/local/src/pari/local/lib/libmpir.so.8
#4 0x00007fa2f2b3fb4b in __gmpz_divexact () from
/usr/local/src/pari/local/lib/libmpir.so.8
#5 0x0000000000400741 in main () at mpirtest.c:8


My system is:

$ uname -a
Linux arcanis 2.6.32-gentoo-r7 #5 SMP Thu Jun 10 23:07:26 CEST 2010
x86_64 Intel(R) Core(TM)2 Duo CPU T5870 @ 2.00GHz GenuineIntel GNU/Linux

$ ./config.guess
core2-unknown-linux-gnu

$ gcc --version
gcc-4.4.3 (Gentoo 4.4.3-r2 p1.2) 4.4.3

This is with MPIR 2.1.1 configured with ./configure --enable-gmpcompat


This bug was found thanks to Sage (see
http://trac.sagemath.org/sage_trac/ticket/9837)


Jeroen Demeyer.

Bill Hart

unread,
Sep 3, 2010, 6:05:05 PM9/3/10
to mpir-...@googlegroups.com
Hi Jeroen,

This looks like a very live bug. Thanks for the report and the trace.
I think this looks to be in code I added to MPIR when I worked on it,
so I'll try to track this down.

Regarding the internal undocumented functions used by Pari, if there
are no missing symbol errors when you build against MPIR, then it is
likely these are mpn functions that were formerly undocumented, but
now part of the documented interface. I think it would be safe to
leave that flag set (so long as it actually compiles and links).

Bill.

> --
> You received this message because you are subscribed to the Google Groups "mpir-devel" group.
> To post to this group, send email to mpir-...@googlegroups.com.
> To unsubscribe from this group, send email to mpir-devel+...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/mpir-devel?hl=en.
>
>

Bill Hart

unread,
Sep 3, 2010, 6:44:11 PM9/3/10
to mpir-...@googlegroups.com
Hi Jeroen,

I've tried a variety of things to replicate this bug, including
valgrinding with symbols on in MPIR and various compiler optimisation
flags, but at least on my machine, I can't replicate it.

If you have time, could you possibly help by taking the following steps:

1) Download a fresh copy of MPIR 2.1.1 from our website: http://www.mpir.org/
2) Run ./configure; make; make check to verify that the library
appears to compile OK on your machine (you could also try building
your program against this clean library again if you haven't already
done this)
3) Run ./configure again until about 30 lines are showing. You'll see
something like
CFLAGS="-O2 -m64 -march=k8 -mtune=k8"
which will be indented (the string on your machine will be different).
Copy the string that shows and alter the -O2 (or whatever you have) to
-g, and pass to ./configure, e.g..:
./configure CFLAGS="-g -m64 -march=k8 -mtune=k8"
4) do
make clean
make
5) build your program against MPIR (to prevent it picking up a stray
copy elsewhere, use -static when linking -- the library being in the
.libs directory, the .h file in the top level directory of the MPIR
source tree -- you'll have to set the -I and -L flags to gcc
accordingly), e.g. I put the program in the top level mpir source
directory and used:
gcc -O2 mpirtest.c -o mpirtest -I. -L../.libs -lmpir -static
6) If you have valgrind installed on your machine, type:
valgrind ./mpirtest
and report to us the output. If not, a gdb backtrace, as you gave, may
be sufficient.
7) If you have access to any other version of gcc on the same machine,
it would also be useful to know if the same bug occurs with that
version of gcc.

This *may* be a compiler bug, but at this stage we have to assume it
is more likely a bug in our code within MPIR.

Bill.

On 3 September 2010 22:10, Jeroen Demeyer <jdem...@cage.ugent.be> wrote:

Bill Hart

unread,
Sep 3, 2010, 6:49:14 PM9/3/10
to mpir-...@googlegroups.com

Sorry, that should read:

gcc -O2 mpirtest.c -o mpirtest -I. -L.libs -lmpir -static

Jason

unread,
Sep 4, 2010, 5:18:05 AM9/4/10
to mpir-...@googlegroups.com
Hi

I can reproduce this error , I'm using gcc 4.4.4 , but only if I do a build
for core2 or nehalem , the k8 build is OK , so perhaps it's the CFLAGS=-
march=core2 that causes it?
I'll change the flags and see if that makes a difference

Jason

Jason

unread,
Sep 4, 2010, 6:09:23 AM9/4/10
to mpir-...@googlegroups.com
The problem is that the gmp-mparam.h for core2 and nehalem have some bad
values in , when I replaced it with the k8 version it was OK.
I have to narrow down which values seem to be bad

Jason

Jason

unread,
Sep 4, 2010, 6:36:35 AM9/4/10
to mpir-...@googlegroups.com
On Saturday 04 September 2010 11:09:23 Jason wrote:
> The problem is that the gmp-mparam.h for core2 and nehalem have some bad
> values in , when I replaced it with the k8 version it was OK.
> I have to narrow down which values seem to be bad
>

Actually thats not true , the values could be good , its just that it follows
a different code path. Building with --enabel-assert we get this assertion
failure when we run the code

inv_divappr_q.c:46: GNU MP assertion failed: nn > dn
Aborted

Jason

unread,
Sep 4, 2010, 7:54:33 AM9/4/10
to mpir-...@googlegroups.com
I can reproduce this error on boxen , which has gcc-4.2.4

./configure --enable-assert --build=core2-unknown-linux-gnu

and just run the test case to get an assertion failure , therefore this looks
like a real error , although I could not the failure when I copied the core
param into the K8 directory and changed cflags , so I think this also depends
on which HAVE_NATIVE functions are availible

Jason

unread,
Sep 4, 2010, 8:43:05 AM9/4/10
to mpir-...@googlegroups.com
On Saturday 04 September 2010 12:54:33 Jason wrote:
> I can reproduce this error on boxen , which has gcc-4.2.4
>
> ./configure --enable-assert --build=core2-unknown-linux-gnu
>
> and just run the test case to get an assertion failure , therefore this
> looks like a real error , although I could not the failure when I copied
> the core param into the K8 directory and changed cflags , so I think this
> also depends on which HAVE_NATIVE functions are availible
>

No , if I put the core2 params in the k8 directory and build on a k8 , I get
the error(must of done something daft before).
The error happens when mpn_divexact (which can accept denom limbs dn <= numer
limbs nn) calls mpn_inv_divapprox_q (which requires just <) . With the k8
thresholds it reduces that case properly , but with core2 params it does not.
It's possible that the k8 params could also produce this error as it just have
to find the right input size.
Unfortunately I'm not at all familiar with the code , I only roughly know the
algorithm.

Bill Hart

unread,
Sep 4, 2010, 8:53:03 AM9/4/10
to mpir-...@googlegroups.com
Ah, thanks for making progress with it. I'll see if I can chase it
through given what you've found.

Bill.

Bill Hart

unread,
Sep 4, 2010, 11:47:21 AM9/4/10
to mpir-...@googlegroups.com
OK, I think the following patch to mpn/generic/divexact.c fixes this issue:

wbhart@selmer:~/mpir-2.1.1/mpn/generic$ diff divexact.c divexact.old
131,141c131,138
< if (qn)
< {
< inv = TMP_ALLOC_LIMBS(dn);
< mpn_invert(inv, dp, dn);
< cy = mpn_inv_divappr_q(qp, n2p, nn, dp, dn, inv);
< if (!extend) qp[qn] = cy;
<
< if ((qp[0] & 1) + q_even != 1) /* quotient is out by 1 */
< mpn_sub_1(qp, qp, qn + 1, 1);
< } else
< qp[0] = 1; // as dp is normalised, exact division means qp = 1
---
> inv = TMP_ALLOC_LIMBS(dn);
> mpn_invert(inv, dp, dn);
> cy = mpn_inv_divappr_q(qp, n2p, nn, dp, dn, inv);
> if (!extend) qp[qn] = cy;
>
> if ((qp[0] & 1) + q_even != 1) /* quotient is out by 1 */
> mpn_sub_1(qp, qp, qn + 1, 1);
>

I don't seem to have an mpir svn setup at the moment. If someone could
commit this and test, I'd be grateful.

I don't see a reason to change the requirement that nn > dn, just that
this requirement is respected by divexact, which it wasn't doing.

Bill.

Jason

unread,
Sep 4, 2010, 12:29:25 PM9/4/10
to mpir-...@googlegroups.com
Commited to mpir-2.1 and trunk . I should add a test case , was it when dn==nn
, although I couldn't get the K8 to fail.

Jason

Bill Hart

unread,
Sep 4, 2010, 8:59:41 PM9/4/10
to mpir-...@googlegroups.com
This fix also fixed a segfault Fredrik found coincidentally at the same time.

Can we do a quick 2.1.1 release for this? I think nothing more is
required than to do a version change, tarball and upload to the
website. It passes make check on my machine with this patch, and given
that it is pure C and fixed the problem, and a well understood patch,
I think a fairly good candidate for a quick release.

Bill.

Jason

unread,
Sep 5, 2010, 3:41:48 AM9/5/10
to mpir-...@googlegroups.com
I'll do a release now

Jason

Bill Hart

unread,
Sep 5, 2010, 8:56:19 AM9/5/10
to mpir-...@googlegroups.com
Thanks very much! I'll let the sage guys know. Don't know if they'll
find more segfaults or not....

Bill.

Jeroen Demeyer

unread,
Sep 5, 2010, 8:57:26 AM9/5/10
to mpir-...@googlegroups.com
On 2010-09-05 14:56, Bill Hart wrote:
> Thanks very much! I'll let the sage guys know.

I already did that, see
http://trac.sagemath.org/sage_trac/ticket/8664#comment:16

Bill Hart

unread,
Sep 5, 2010, 9:00:12 AM9/5/10
to mpir-...@googlegroups.com
Yep, just figured it out. Thanks!!

Jeroen Demeyer

unread,
Sep 5, 2010, 11:41:14 AM9/5/10
to mpir-...@googlegroups.com
On 2010-09-04 00:05, Bill Hart wrote:
> Regarding the internal undocumented functions used by Pari, if there
> are no missing symbol errors when you build against MPIR, then it is
> likely these are mpn functions that were formerly undocumented, but
> now part of the documented interface. I think it would be safe to
> leave that flag set (so long as it actually compiles and links).

PARI doesn't use any undocumented functions, it uses the structure of
the mpz_t type (see MPIR documentation, section 16.1).

It would be useful for PARI (and hence Sage) to have a public interface
for mpn_divexact (currently, there is only mpn_divexact_by3 and
mpn_divexact_by3c).

Jeroen.

Jason

unread,
Sep 5, 2010, 1:37:22 PM9/5/10
to mpir-...@googlegroups.com

Hi

There are a few problems with this , GMP would have to have the same interface
or I certain Pari would not use it. There is a second more subtle problem , do
you want divexact (ie division which is only correct when exact ) or
divrem_hensel (ie division which gives the hensel(2-adic) quotient )
If you use mpn_divexact and are relying on it to be hensel division then when
we change the algorithm later for a faster one (ie bidirectional) then your
code will break , or we will have to fudge it.
For example the old mpn_divexact_by3c was hensel division , but the new one
(which is about twice as fast) is not , and we had to fudge the feed-in and
wind-out code to get the same remainders.

Jason

Jeroen Demeyer

unread,
Sep 5, 2010, 2:44:03 PM9/5/10
to mpir-...@googlegroups.com
On 2010-09-05 19:37, Jason wrote:
> Hi
>
> There are a few problems with this , GMP would have to have the same interface
> or I certain Pari would not use it.
Well, this could be a Sage-only patch. As far as I know, exact division
is the only place in PARI/GP where GMP/MPIR internals are being used, so
it would be nice to get rid of this.

> There is a second more subtle problem , do
> you want divexact (ie division which is only correct when exact ) or
> divrem_hensel (ie division which gives the hensel(2-adic) quotient )
> If you use mpn_divexact and are relying on it to be hensel division then when
> we change the algorithm later for a faster one (ie bidirectional) then your
> code will break , or we will have to fudge it.

Well, the point of the public function mpz_divexact() is that you *know*
that one number is divisible an other. Nothing in the mpz_divexact()
documentation mentions 2-adic division. So I think the only necessary
criterion is that the quotient is correct when it is known that the
remainder is zero.

Jeroen.

Bill Hart

unread,
Sep 6, 2010, 3:04:24 PM9/6/10
to mpir-...@googlegroups.com
On 5 September 2010 19:44, Jeroen Demeyer <jdem...@cage.ugent.be> wrote:
> On 2010-09-05 19:37, Jason wrote:
>> Hi
>>
>> There are a few problems with this , GMP would have to have the same interface
>> or I certain Pari would not use it.
> Well, this could be a Sage-only patch.  As far as I know, exact division
> is the only place in PARI/GP where GMP/MPIR internals are being used, so
> it would be nice to get rid of this.

It seems reasonable to me that we could mark the documentation with
"MPIR only" for functions supported by us and not by GMP.

The problem is that suppose MPIR defines mpn_divexact one way, then
GMP come along and define it differently, but with the same name. We
then have to change MPIR to be compatible with GMP, breaking
everyone's code who relied upon the MPIR interface.

This is why Jason is reluctant to document this. If you want to use an
undocumented function, go right ahead, It's:

mpn_divexact (mp_ptr qp,
mp_srcptr np, mp_size_t nn,
mp_srcptr dp, mp_size_t dn)

which does what you think it does.

>
>> There is a second more subtle problem , do
>> you want divexact (ie division which is only correct when exact ) or
>> divrem_hensel (ie division which gives the hensel(2-adic) quotient )
>> If you use mpn_divexact and are relying on it to be hensel division then when
>> we change the algorithm later for a faster one (ie bidirectional) then your
>> code will break , or we will have to fudge it.
> Well, the point of the public function mpz_divexact() is that you *know*
> that one number is divisible an other.  Nothing in the mpz_divexact()
> documentation mentions 2-adic division.  So I think the only necessary
> criterion is that the quotient is correct when it is known that the
> remainder is zero.

The most useful one for me (in flint) is also the divexact interface,
where you know in advance that it is exactly divisible.

Bill.

>
> Jeroen.

Jason

unread,
Sep 22, 2010, 2:40:48 PM9/22/10
to mpir-...@googlegroups.com
On Monday 06 September 2010 20:04:24 Bill Hart wrote:
> On 5 September 2010 19:44, Jeroen Demeyer <jdem...@cage.ugent.be> wrote:
> > On 2010-09-05 19:37, Jason wrote:
> >> Hi
> >>
> >> There are a few problems with this , GMP would have to have the same
> >> interface or I certain Pari would not use it.
> >
> > Well, this could be a Sage-only patch. As far as I know, exact division
> > is the only place in PARI/GP where GMP/MPIR internals are being used, so
> > it would be nice to get rid of this.
>
> It seems reasonable to me that we could mark the documentation with
> "MPIR only" for functions supported by us and not by GMP.
>
> The problem is that suppose MPIR defines mpn_divexact one way, then
> GMP come along and define it differently, but with the same name. We
> then have to change MPIR to be compatible with GMP, breaking
> everyone's code who relied upon the MPIR interface.
>

This can be solved (relatively easily ) by library versioning , basically , if
I understood it correctly , is that one library can "export" many incompatible
version of the same function , ie when you link to the library , you have
specify which version of it you want. I read this a year or two ago , I think
in the libtool docs.

Reply all
Reply to author
Forward
0 new messages