size = 4: classical = 8.9e-08s,
size = 5: classical = 1.13e-07s,
size = 6: classical = 1.37e-07s,
size = 7: classical = 1.85e-07s,
size = 8: classical = 1.96e-07s,
size = 9: classical = 2.27e-07s,
size = 10: classical = 2.51e-07s,
size = 11: classical = 2.99e-07s,
size = 13: classical = 3.54e-07s,
size = 15: classical = 4.35e-07s,
size = 17: classical = 5.07e-07s,
size = 19: classical = 6.1e-07s,
size = 21: classical = 7.05e-07s,
size = 24: classical = 8.36e-07s,
size = 27: classical = 1.059e-06s,
size = 30: classical = 1.234e-06s,
size = 33: classical = 1.449e-06s,
size = 37: classical = 1.722e-06s,
So that's a 17.5% to 30% improvement, roughly 20% on average.
Pretty happy with that! :-)
MPIR was already slightly faster than GMP due to the previously mentioned optimisation, but the new code is uniformly faster up to around size = 30.
I have committed this to the dp1_divrem branch on GitHub.
Speeding up sb_div_q is next!
Bill.