Fwd: [ODK participants] Blog post on fast multivariate arithmetic


William Stein

unread,
Jul 9, 2017, 11:39:29 AM7/9/17
to sage-devel
New blog post from Bill Hart.  It includes a section claiming Sage’s multivariate polynomial arithmetic speed is much worse than expected...

---------- Forwarded message ---------
From: Bill Hart <goodwi...@googlemail.com>
Date: Sun, Jul 9, 2017 at 8:34 AM
Subject: [ODK participants] Blog post on fast multivariate arithmetic
To: Opendreamkit participants <partic...@opendreamkit.org>


Dear all,

I've written a blog post on the fast multivariate arithmetic we've been doing for ODK [1]. This is a deliverable which is due at the end of the four years, so we've got a long way to go, but it is progressing nicely in the interim.

The next stage is to parallelise this, and we've hired Daniel Schultz to work on our ODK deliverable, which will do precisely this. He's an experienced developer from the US who was already writing his own CAS (in assembly language, believe it or not!).

Bill.

--
-- William Stein

Ralf Stephan

unread,
Jul 10, 2017, 2:01:50 AM7/10/17
to sage-devel
It's likely he used Sage's symbolics (and an old version at that) instead of poly rings. I replied there.

Vincent Delecroix

unread,
Jul 10, 2017, 2:55:23 AM7/10/17
to sage-...@googlegroups.com
He was certainly not using the awfully slow symbolic ring

sage: x,y,z,t,u = SR.var('x,y,z,t,u')
sage: f = (1 + x + y + 2*z^2 + 3*t^3 + 5*u^5)^6
sage: g = (1 + u + t + 2*z^2 + 3*y^3 + 5*x^5)^6
sage: %time h = (f*g).expand()
CPU times: user 8.99 s, sys: 25.7 ms, total: 9.01 s
Wall time: 9.01 s

versus

sage: R.<x,y,z,t,u> = QQ[]
sage: f = (1 + x + y + 2*z^2 + 3*t^3 + 5*u^5)^6
sage: g = (1 + u + t + 2*z^2 + 3*y^3 + 5*x^5)^6
sage: %time h = f*g
CPU times: user 191 ms, sys: 0 ns, total: 191 ms
Wall time: 191 ms

This 0.2 sec corresponds perfectly to his multiplication benchmarks.

Ralf Stephan

unread,
Jul 10, 2017, 3:34:39 AM7/10/17
to sage-devel
On Monday, July 10, 2017 at 8:55:23 AM UTC+2, vdelecroix wrote:
He was certainly not using the awfully slow symbolic ring

Then his slow timings for e.g. "Divisibility test with quotient (sparse)" need a different explanation.

Bill Hart

unread,
Jul 10, 2017, 5:28:12 AM7/10/17
to sage-devel
Because it is "divisibility test with quotient", not "divisibility test". It is equivalent to calling Magma's "IsDivisibleBy" with two return arguments.

Vincent Delecroix

unread,
Jul 10, 2017, 5:56:32 AM7/10/17
to sage-...@googlegroups.com
Sage version?

Bill Hart

unread,
Jul 10, 2017, 6:05:28 AM7/10/17
to sage-devel
7.6

Bill Hart

unread,
Jul 10, 2017, 6:13:00 AM7/10/17
to sage-devel
The reason that I required the quotient as well in the divisibility benchmark was that Magma does the n = 20 dense case in 0.15s otherwise, and I don't believe it is possible to do it that fast if you aren't doing it heuristically, as I explained in the blog post. Therefore, all the systems timed must return the quotient if the division is exact (which it is in the benchmark examples, since that is the hard case), so that Magma can't possibly "cheat".

As mentioned, where a system doesn't provide such a function, I substituted divrem, since one can inspect the remainder to see whether the division was exact and then take the quotient as the benchmark requires.
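
For concreteness, here is what that substitution could look like in Sage. This is a minimal sketch of mine (the helper name is illustrative, not from the blog post), not Bill's actual benchmark code:

sage: def is_divisible_with_quotient(f, g):
....:     # emulate Magma's IsDivisibleBy with two return values:
....:     # exactness is read off the remainder; the quotient comes for free
....:     q, r = f.quo_rem(g)
....:     return (r == 0, q)
sage: R.<x,y> = QQ[]
sage: is_divisible_with_quotient(x^2 - y^2, x - y)
(True, x + y)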

It is really hard to benchmark a bunch of systems against one another. When I first timed Giac, I wasn't aware of the special parameters you can pass to Giac functions to make them run much faster. And there aren't many functions that all the systems implement with the same semantics, so I have to simulate what a user would do to get the semantics they want. I didn't do it to make Sage look bad, honest!

mmarco

unread,
Jul 10, 2017, 6:48:55 AM7/10/17
to sage-devel
The difference between Singular and Sage is surprising, considering that Sage mostly relies on Singular for multivariate polynomial arithmetic. In the case of divisions, I suspect it has to do with the fact that Sage treats division of polynomials as an operation in the fraction field: it constructs the fraction, looks for common factors in the numerator and denominator, and cancels them. That is probably not what other systems do.

Which commands/instructions did you use in your benchmarks?

Vincent Delecroix

unread,
Jul 10, 2017, 7:31:26 AM7/10/17
to sage-...@googlegroups.com
On 10/07/2017 12:48, mmarco wrote:
> It is surprising the difference between singular and Sage, considering that
> Sage mostly relies on Singular for multivariate polynomial arithmetic. In
> the case of divisions, I suspect that it has to do with the fact that Sage
> treats division of polynomials as an operation in the fraction field, so it
> would construct the fraction, look for common factors in the numerator and
> denominator, and cancel them. That is probably not what other systems do.

Indeed, for (internal) division one should use // and not /
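
To make the distinction concrete, a quick sketch (the point is where each result lives, not the timings):

sage: R.<x,y> = QQ[]
sage: ((x + y)^2 / (x + y)).parent()     # / goes through the fraction field
Fraction Field of Multivariate Polynomial Ring in x, y over Rational Field
sage: ((x + y)^2 // (x + y)).parent()    # // stays in the polynomial ring
Multivariate Polynomial Ring in x, y over Rational Field

So / constructs a fraction and cancels, even when the division is exact, which matches the explanation above.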

> Which commands/instructions did you use in your benchmarks?

BTW, it would be good to have them in the post!

mmarco

unread,
Jul 10, 2017, 7:34:30 AM7/10/17
to sage-devel
If you used quo_rem, beware that Sage only uses Singular if the coefficient ring is a field. So if you define your polynomials over QQ instead of ZZ, you will get timings similar to those of Singular. In the case of ZZ, it falls back to a generic Python implementation of division.

I guess we could rely on Singular for the integers as well.
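
A minimal way to see the two code paths from the Sage prompt, as a sketch based on the description above (no timings quoted, since they depend on machine and version):

sage: R.<x,y,z,t,u> = QQ[]
sage: f = (1 + x + y + 2*z^2 + 3*t^3 + 5*u^5)^4
sage: F = f^2
sage: %time _ = F.quo_rem(f)   # field coefficients: dispatched to Singular
sage: S.<x,y,z,t,u> = ZZ[]
sage: g = (1 + x + y + 2*z^2 + 3*t^3 + 5*u^5)^4
sage: G = g^2
sage: %time _ = G.quo_rem(g)   # ZZ coefficients: generic Python fallback

On a Sage 7.6 install one would expect the second call to be dramatically slower, per the explanation above.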


mmarco

unread,
Jul 10, 2017, 8:02:51 AM7/10/17
to sage-devel
This is now #23395

Ralf Stephan

unread,
Jul 10, 2017, 8:46:21 AM7/10/17
to sage-devel
On Monday, July 10, 2017 at 8:55:23 AM UTC+2, vdelecroix wrote:
He was certainly not using the awfully slow symbolic ring

sage: x,y,z,t,u = SR.var('x,y,z,t,u')
sage: f = (1 + x + y + 2*z^2 + 3*t^3 + 5*u^5)^6
sage: g = (1 + u + t + 2*z^2 + 3*y^3 + 5*x^5)^6
sage: %time h = (f*g).expand()
CPU times: user 8.99 s, sys: 25.7 ms, total: 9.01 s
Wall time: 9.01 s

This is the wrong way to test symbolic multiplication; instead:

sage: f = ((1 + x + y + 2*z^2 + 3*t^3 + 5*u^5)^6).expand()
sage: g = ((1 + u + t + 2*z^2 + 3*y^3 + 5*x^5)^6).expand()
sage: %timeit _ = f*g
The slowest run took 9.98 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 1.04 ms per loop

Ralf Stephan

unread,
Jul 10, 2017, 8:51:51 AM7/10/17
to sage-devel
Sorry, I meant (f*g).expand(). It's even slower, interesting.

Bill Hart

unread,
Jul 10, 2017, 10:36:38 AM7/10/17
to sage-devel
It looks like I used quo_rem, since you need both the quotient and whether or not it is divisible, for which you need the remainder.

BTW, it would be good to have them in the post!

It already says in the post that for systems that didn't provide the required function, I used quotient with remainder. 

Vincent Delecroix

unread,
Jul 10, 2017, 11:05:10 AM7/10/17
to sage-...@googlegroups.com
On 10/07/2017 16:36, 'Bill Hart' via sage-devel wrote:
>> BTW, it would be good to have them in the post!
>>
>
> It already says in the post that for systems that didn't provide the
> required function, I used quotient with remainder.

I meant code snippets.

Bill Hart

unread,
Jul 10, 2017, 12:31:22 PM7/10/17
to sage-devel
Switching to Singular for working over ZZ seems like a good idea. I timed it over ZZ and QQ in Singular and didn't notice much difference in the timings.

I'm sure the following is obvious, but let me mention it just in case. In the blog post I explain that some systems do not explicitly provide arithmetic over Z (e.g. Giac, which internally optimises for Z but works over Q from the user's point of view). The answer to a general division over Z will not necessarily be the same as the answer over Q. I chose the specific benchmark examples so that the two answers agree, though that isn't true in general; it was the only way to give a meaningful comparison with Giac. But of course this doesn't imply that one can always work over Q instead of Z!
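
A tiny Sage illustration of the mathematical point (my example, not one of the benchmark polynomials):

sage: R.<x,y> = QQ[]
sage: (2*x*y).quo_rem(3*y)   # exact over Q
(2/3*x, 0)
sage: S.<x,y> = ZZ[]
sage: (2*x*y).quo_rem(3*y)   # over Z no exact quotient exists, since 3 does not divide 2

Over Z the remainder is necessarily nonzero, so a system silently working over Q reports an exact quotient precisely where one working over Z must not.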

Apparently, Factory does have code to work over Z instead of Q, which could be called directly (after explicit conversion, sanity checking, etc). But note that the code in Factory for working over Z certainly gets less of a workout than the code over Q, so caveat emptor.

Apparently, one comment in my blog about Singular calling Factory for one of the commands I was issuing is incorrect; I will fix it in the blog. We traced it through, and in fact I was using Singular functions only.

Bill.

Bill Hart

unread,
Jul 10, 2017, 1:02:20 PM7/10/17
to sage-devel
I suppose that could be useful, but it's a lot of work for little clear benefit (5 systems, 6 benchmarks). The code is pretty much exactly what Ralf wrote, except that I put the operation being timed inside a for loop and timed the loop. This was the same in all the systems.

I used quo_rem for division with remainder (for the divisibility test with quotient benchmark) and * for multiplication. For quotient only, I don't recall if I used quo_rem or //. It's clear from the timings I didn't use /, and I do recall checking the documentation very carefully for that.
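
As a rough reconstruction of that setup (my sketch from the description above, not Bill's actual script), the Sage side would look something like:

sage: R.<x,y,z,t,u> = QQ[]
sage: f = (1 + x + y + 2*z^2 + 3*t^3 + 5*u^5)^6
sage: g = (1 + u + t + 2*z^2 + 3*y^3 + 5*x^5)^6
sage: t0 = cputime()
sage: for i in range(10):
....:     h = f*g
sage: cputime(t0) / 10   # average time per product over the loop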

Simon King

unread,
Jul 11, 2017, 6:20:15 AM7/11/17
to sage-...@googlegroups.com
Hi,

On 2017-07-10, mmarco <mma...@unizar.es> wrote:
> It is surprising the difference between singular and Sage, considering that
> Sage mostly relies on Singular for multivariate polynomial arithmetic.

Note that Singular is optimised for Gröbner basis computations, but certainly
not optimised for fast arithmetic. So it is absolutely no surprise that
polynomial arithmetic can be improved by using a different backend. However,
keep in mind that the other backend will probably be much slower than Singular
at Gröbner basis computations.

Best regards,
Simon

Bill Hart

unread,
Jul 11, 2017, 11:00:50 AM7/11/17
to sage-devel
That's absolutely correct, and a point I make in my blog. One heuristic is that GB computations tend to involve a large number of very small polynomials, so one can safely dispatch larger arithmetic operations to a different back end (converting to and from the other format on the fly). This is something Singular already does, I believe.

Of course there are also libraries that do GBs and still have basic arithmetic that is comparable to the timings I've given. This is definitely something we care about and will be working on in future. The main goal of the ODK work we are doing, as we see it, is to speed up Singular (by speeding up Factory).

We are also independently looking at parallelising GBs (in a totally different way, of course). So once this project is done, we may get the best of both worlds! It's a very interesting project, with input from a number of people.

Johan S. H. Rosenkilde

unread,
Jul 11, 2017, 2:26:51 PM7/11/17
to sage-...@googlegroups.com
> That's absolutely correct, and a point I make in my blog. One heuristic is
> that GBs tend to have a large number of very small polynomials and so one
> can dispatch larger arithmetic operations to a different back end safely
> (converting to and from some other format on the fly). This is something
> Singular already does, I believe.

I guess for a system like Sage it makes sense to use a default backend
that optimises for fast arithmetic. A GB computation is something one
expects to be long anyway, so it seems OK to spend a small amount of time
converting to a GB-optimised backend whenever a GB computation is
invoked.

Best,
Johan

Bill Hart

unread,
Jul 11, 2017, 9:06:29 PM7/11/17
to sage-devel, mail...@atuin.dk
I would personally agree with that. It doesn't make sense for Singular, since Singular is really dedicated to a particular problem domain where GBs are almost everything. But Sage is more general purpose.

People probably care more about multivariate gcd and factorisation than about basic arithmetic, except for very small polynomials. Still, having slow arithmetic makes it hard to co-opt the system for other purposes.

Bill. 

Francesco Biscani

unread,
Jul 12, 2017, 6:16:33 AM7/12/17
to sage-...@googlegroups.com
Interesting timings; they give me some motivation to revisit the dense multiplication algorithm in piranha :)

As an aside (and apologies if this is a slight thread hijack), I have been spending some time in the last few weeks decoupling the multiprecision arithmetic bits of piranha into their own project, called mp++:


So far I have extracted the integer and rational classes, and I am currently working on the real class (arbitrary-precision FP).

Cheers,

  Francesco.


Bill Hart

unread,
Jul 12, 2017, 9:13:20 AM7/12/17
to sage-devel
Beware: Bernard Parisse has just helped me track down why the Flint timings for the sparse "quotient only" division benchmark looked so ridiculously low. It turns out that, due to an accident of interfacing between Nemo and Flint, it was using reflected lexicographical ordering instead of true lexicographical ordering. If I had labelled the benchmark "exact division" instead of "quotient only" and not included the x^(n - 3) term, the timings could be considered correct (though Giac would also have been able to do the computation much faster in that case). Unfortunately, this discovery means I had to change the Flint timings for that benchmark; they are now correct on the blog.
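
For Sage users curious how much the monomial ordering matters: invlex (lex with the variable order reversed, which I believe corresponds to the "reflected" order in question) picks a different leading monomial, and division algorithms work by peeling off leading terms. A small sketch with an example polynomial of mine:

sage: R.<x,y,z> = PolynomialRing(QQ, order='lex')
sage: (x + y^2 + z^3).lm()
x
sage: S.<x,y,z> = PolynomialRing(QQ, order='invlex')
sage: (x + y^2 + z^3).lm()
z^3

So the same "quotient only" call can do very different amounts of work under the two orders.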

The timings for mppp are really good. I'm surprised you beat the Flint timings there, since we use pretty sophisticated templating in our C++ interface. But clearly there are tricks we missed!

Bill. 

Francesco Biscani

unread,
Jul 12, 2017, 10:00:16 AM7/12/17
to sage-...@googlegroups.com
In the benchmarks I use the C++ interfaces of FLINT and Boost.Multiprecision only for ease of initialisation/destruction; the bulk of the operations is performed directly via the C API of FLINT and GMP. mp++ itself has some moderate template metaprogramming in place, but it currently lacks expression-template support (unlike fmpzxx), the focus at the moment being on fast low-level primitives (add/sub/mul/addmul etc.).

Cheers,

  Francesco.


Bill Hart

unread,
Jul 13, 2017, 6:25:48 AM7/13/17
to sage-devel
So why is it faster than Flint, say? Except for the overhead in the Flint fmpz type, which uses a single word initially and only upgrades to an mpz_t on overflow, mp++ should currently be doing more allocations than Flint. And Flint should be faster for something like a dot product, especially if the integers are all small, since it never actually allocates mpz_t's in that case. What is the new innovation?

Bill.

Francesco Biscani

unread,
Jul 13, 2017, 2:46:21 PM7/13/17
to sage-...@googlegroups.com
mppp also uses a small-value optimisation. The number of limbs that can be stored without dynamic memory allocation can be selected at compile time, and the benchmarks on the website are done using 1 limb (64 bits) of static storage.

I can think of a few things that might positively influence mppp's performance with respect to FLINT:

- inlining (as mppp is header-only) avoids the overhead of function calls and may allow the compiler to optimise better.
- mppp does not do automatic downsizing: once you are using dynamic storage, it's up to you to demote to a static integer. I imagine this saves a few branches with respect to FLINT.
- I spent a lot of time tinkering with the add/sub/mul code, trying to find the code flow/layout that squeezes out the best performance for small operands. Maybe I just got lucky with an arrangement of the code that is particularly friendly to GCC; I don't really know, as I am not a low-level/assembly type of guy. I just tried many different variations and picked the one that performed best.

Cheers,

  Francesco.


Bill Hart

unread,
Jul 14, 2017, 12:07:42 PM7/14/17
to sage-devel
It seems like you've done a good job with it anyway. Thanks for the description.

Bill.