Integer division update

Bill Hart

unread,

Apr 15, 2013, 11:40:49 AM4/15/13

to mpir-devel, flint-devel

Hi all,

I've finally managed to get the divide and conquer integer division code I've been working on into MPIR. The test code is now passing, though I will need to do some more testing.

Below are timings for divapprox code in GMP and the new code. I've included the best timings out of basecase and divide and conquer in each case;

new code DIVAPPR

size = 3: classical = 5.7e-08s,

size = 4: classical = 7.5e-08s,

size = 5: classical = 9.7e-08s,

size = 6: classical = 1.17e-07s,

size = 7: classical = 1.41e-07s,

size = 8: classical = 1.65e-07s,

size = 9: classical = 1.89e-07s,

size = 10: classical = 2.14e-07s,

size = 11: classical = 2.38e-07s,

size = 13: classical = 2.92e-07s,

size = 15: classical = 3.64e-07s,

size = 17: classical = 4.28e-07s,

size = 19: classical = 4.89e-07s,

size = 21: classical = 5.96e-07s,

size = 24: classical = 7.25e-07s,

size = 27: classical = 8.7e-07s,

size = 30: classical = 1.043e-06s,

size = 33: classical = 1.2e-06s,

size = 37: classical = 1.475e-06s,

size = 41: classical = 1.8e-06s,

size = 46: divconquer = 2.14e-06s

size = 51: divconquer = 2.59e-06s

size = 57: divconquer = 3.1e-06s

size = 63: divconquer = 3.77e-06s

size = 70: divconquer = 4.46e-06s

size = 77: divconquer = 5.19e-06s

size = 85: divconquer = 6.06e-06s

size = 94: divconquer = 6.89e-06s

size = 104: divconquer = 8.2e-06s

size = 115: divconquer = 9.6e-06s

size = 127: divconquer = 1.16e-05s

size = 140: divconquer = 1.36e-05s

size = 154: divconquer = 1.57e-05s

size = 170: divconquer = 1.83e-05s

size = 188: divconquer = 2.11e-05s

size = 207: divconquer = 2.5e-05s

size = 228: divconquer = 2.9e-05s

size = 251: divconquer = 3.47e-05s

size = 277: divconquer = 4.09e-05s

size = 305: divconquer = 4.69e-05s

size = 336: divconquer = 5.63e-05s

size = 370: divconquer = 6.24e-05s

size = 407: divconquer = 7.4e-05s

size = 448: divconquer = 8.6e-05s

size = 493: divconquer = 0.000103s

size = 543: divconquer = 0.000121s

size = 598: divconquer = 0.000142s

size = 658: divconquer = 0.000164s

size = 724: divconquer = 0.000185s

size = 797: divconquer = 0.000221s

size = 877: divconquer = 0.000253s

size = 965: divconquer = 0.000294s

gmp-5.1.1 DIVAPPR

size = 3: classical = 6.7e-08s,

size = 4: classical = 8.8e-08s,

size = 5: classical = 1.13e-07s,

size = 6: classical = 1.36e-07s,

size = 7: classical = 1.85e-07s,

size = 8: classical = 1.95e-07s,

size = 9: classical = 2.27e-07s,

size = 10: classical = 2.51e-07s,

size = 11: classical = 2.98e-07s,

size = 13: classical = 3.54e-07s,

size = 15: classical = 4.35e-07s,

size = 17: classical = 5.07e-07s,

size = 19: classical = 6.07e-07s,

size = 21: classical = 7.05e-07s,

size = 24: classical = 8.35e-07s,

size = 27: classical = 1.052e-06s,

size = 30: classical = 1.233e-06s,

size = 33: classical = 1.447e-06s,

size = 37: classical = 1.719e-06s,

size = 41: classical = 2e-06s,

size = 46: classical = 2.4e-06s,

size = 51: classical = 2.81e-06s,

size = 57: classical = 3.38e-06s,

size = 63: classical = 3.97e-06s,

size = 70: classical = 4.67e-06s,

size = 77: classical = 5.45e-06s,

size = 85: classical = 6.38e-06s,

size = 94: classical = 7.48e-06s,

size = 104: classical = 8.9e-06s,

size = 115: classical = 1.04e-05s,

size = 127: classical = 1.25e-05s,

size = 140: classical = 1.46e-05s,

size = 154: classical = 1.72e-05s,

size = 170: classical = 2.05e-05s,

size = 188: classical = 2.43e-05s,

size = 207: classical = 2.9e-05s,

size = 228: classical = 3.45e-05s,

size = 251: divconquer = 4.07e-05s

size = 277: divconquer = 4.8e-05s

size = 305: divconquer = 5.6e-05s

size = 336: divconquer = 6.47e-05s

size = 370: divconquer = 7.63e-05s

size = 407: divconquer = 9e-05s

size = 448: divconquer = 0.000104s

size = 493: divconquer = 0.000123s

size = 543: divconquer = 0.000143s

size = 598: divconquer = 0.000166s

size = 658: divconquer = 0.000193s

size = 724: divconquer = 0.000224s

size = 797: divconquer = 0.000265s

size = 877: divconquer = 0.000305s

size = 965: divconquer = 0.000358s

So it looks like the new code is most often about 20% faster than GMP.

The main things that remain to be done now are:

* further testing

* switch all precomputed inverses in MPIR over to the new ones (actually test code passes either way because the precomputed inverses are rarely different, but we have to change them obviously)

* remove the file mpn/generic/dc_divappr_q_n.c which I believe is no longer needed

* write speed and tuning code for the new cutoffs (they are currently hard coded)

Once I've done the first three I'll merge into trunk. They may or may not take a while to do depending on how the testing goes.

I plan to work on a new release of MPIR after May 12th (I am running a flint workshop from 4-12 May). We have a long list of bugs and tickets which need to be resolved, many of which have been reported by the Sage project.

Bill.

Bill Hart

unread,

Apr 15, 2013, 12:44:20 PM4/15/13

to mpir-devel, flint-devel

All the extended test code I ran passed, so I'm happy that the code is working well now.

I've also switched all the precomputed inverses over to the new ones.

I decided not to remove dc_divappr_q_n.c until Brian has had a chance to catch up with Windows. (I think apart from the add_333 and sub_333 functions there should be no other changes on Windows Brian.)

I have merged everything into master and will wait until Brian has been able to test on Windows before removing the file dc_divappr_q_n.c, as the latter will require quite a lot of build changes on linux and Windows.

Bill.

Brian Gladman

unread,

Apr 16, 2013, 5:05:21 AM4/16/13

to mpir-...@googlegroups.com, Bill Hart, flint-devel

On 15/04/2013 17:44, Bill Hart wrote:
> All the extended test code I ran passed, so I'm happy that the code is
> working well now.
>
> I've also switched all the precomputed inverses over to the new ones.
>
> I decided not to remove dc_divappr_q_n.c until Brian has had a chance to
> catch up with Windows. (I think apart from the add_333 and sub_333
> functions there should be no other changes on Windows Brian.)
>
> I have merged everything into master and will wait until Brian has been
> able to test on Windows before removing the file dc_divappr_q_n.c, as
> the latter will require quite a lot of build changes on linux and Windows.

Congratulations Bill, the the Windows x64 builds work and pass all tests
without needing any changes at all!

The C versions of add_333 and sub_333 have to be used on Windows x64 as
there is no inline assembler capability in the MSVC x64 compiler. I
could provide inline assembler for win32 but, since I am not using win32
any more, this really should be done by someone who needs it and can
hence maintain it.

Inline assembler versions would be useful for the Windows mingw and
mingw64 builds. But since Jason passed away, these builds are sadly no
longer being maintained. Hence we again need a volunteer from the
communities that need these builds to develop and maintain any inline
assembler code involved.

If these routines really matter, I can produce full assembler versions
for x64 with the MSVC compiler but for the present, at least, it seems
that the default C versions will have to do.

Brian

Bill Hart

unread,

Apr 16, 2013, 9:01:58 AM4/16/13

to Brian Gladman, mpir-devel, flint-devel

Hi Brian,

that's great to hear that everything passes on Windows 64.

I don't think there is any benefit to writing full assembly files for these two new functions. The reason is that the function call overhead would be about 9 cycles, which would be more than than actual saving over the C macros that we have. So in fact, assembly files would likely slow it down.

However, I agree that it would be very beneficial if someone were to volunteer to write _inline_ assembly for Windows 32 bit, as this would give a 5% speedup on that platform. Conversely, if no one is actually using this platform then letting support lapse seems to be an option. I will continue to maintain MinGW build support for Windows 32 for a while and that will at least provide an option for people on that platform. But I'm not going to put time into non-essential assembly code for that platform as there are simply too many other things to maintain.

I should have a better indication of my future in the next few weeks, at which point I should know how much time I can put into MPIR in future. Even so, with only two people volunteering we can't hope to accomplish much. So I agree, we do need further volunteers!

Bill.

Brian Gladman

unread,

Apr 16, 2013, 10:48:53 AM4/16/13

to Bill Hart, mpir-devel

On 16/04/2013 14:01, Bill Hart wrote:
> Hi Brian,
>
> that's great to hear that everything passes on Windows 64.
>
> I don't think there is any benefit to writing full assembly files for
> these two new functions. The reason is that the function call overhead
> would be about 9 cycles, which would be more than than actual saving
> over the C macros that we have. So in fact, assembly files would likely
> slow it down.
>
> However, I agree that it would be very beneficial if someone were to
> volunteer to write _inline_ assembly for Windows 32 bit, as this would
> give a 5% speedup on that platform. Conversely, if no one is actually
> using this platform then letting support lapse seems to be an option. I
> will continue to maintain MinGW build support for Windows 32 for a while
> and that will at least provide an option for people on that platform.
> But I'm not going to put time into non-essential assembly code for that
> platform as there are simply too many other things to maintain.
>
> I should have a better indication of my future in the next few weeks, at
> which point I should know how much time I can put into MPIR in future.
> Even so, with only two people volunteering we can't hope to accomplish
> much. So I agree, we do need further volunteers!

Another option on Windows x64 is to use the Intel compiler since this
does support inline x64 assembler. I have just tried this with some
assembler support (a bit of a hack right now) and all tests pass. On the
benchmark this gives a 3% overall speed increase.

Brian

Bill Hart

unread,

Apr 16, 2013, 11:38:29 AM4/16/13

to Brian Gladman, mpir-devel

Hi Brian,

that's great. We can include ICC guards for this.

Bill.

Brian Gladman

unread,

Apr 16, 2013, 11:49:02 AM4/16/13

to mpir-...@googlegroups.com, Bill Hart

On 16/04/2013 16:38, Bill Hart wrote:
> Hi Brian,
>

> that's great. We can include ICC guards for this.

Adding the mingw/mingw64 stuff turns out to be easy since all I have to
do is copy the longlong_inc.h content in the x86 and x64 directories
into the longlong_inc.h files in the x86w and x64w directories.

The Intel compiler guard on Windows is __ICL so I have added this and
tested with the Intel compiler and all is well.

Brian

Bill Hart

unread,

Apr 16, 2013, 11:59:20 AM4/16/13

to Brian Gladman, mpir-devel

Ah yes, we don't have to worry about ABI with inline assembly, so this is good news. Thanks for doing this Brian.

Bill.

Brian Gladman

unread,

Apr 16, 2013, 12:02:45 PM4/16/13

to mpir-...@googlegroups.com, Bill Hart

On 16/04/2013 16:59, Bill Hart wrote:
> Ah yes, we don't have to worry about ABI with inline assembly, so this
> is good news. Thanks for doing this Brian.

I (think I) have committed this now so you can pull it into trunk
whenever you are ready (I am a shaky GIT user).

Brian

Bill Hart

unread,

Apr 16, 2013, 3:32:39 PM4/16/13

to Brian Gladman, mpir-devel

Thanks Brian, this seems to have been successful. I have now merged your changes with my master (you might like to merge my master with yours again just to ensure you got the latest everything -- I made a few additional minor changes after I announced it on the list).

I now intend to take a break until about May 12th at which time I will start work on tuning code and the new release of MPIR. I still have some code to write which we promised to include for the GMP-ECM project too.

Bill.

Bill Hart

unread,

Apr 16, 2013, 3:33:22 PM4/16/13

to mpir-devel

Reply all

Reply to author

Forward