Status of fmpz_mpoly_addmul

Brent W. Baccala

unread,

Jul 21, 2021, 1:37:45 AM7/21/21

to flint-devel

Hi -

Just wanted to give a quick status update on fmpz_mpoly_addmul_multi (adds up a sum of products of polynomials without storing intermediate values).

It's working pretty well. Passes all of its current test cases, but I know of a slight bug that doesn't fail any test cases because it doesn't break the results, it just don't optimize things the best way. Need to improve the code coverage, too.

I want to write some documentation and I figure that there's little point in making a PR until I do that since parts of the code will make no sense without explanation.

I also want to optimize it a bit more, and I have an optimization that won't just affect it but will also touch about 50 other files, so that will be a separate PR. (removing i and j from mpoly_heap_s)

Been working the last few days in interfacing it to Sage. It kinda works but needs a lot more effort. The FLINT routine is working much better than the Sage interface.

That's all. I'll probably fiddle with the Sage some more because that's my main focus, but I know I've got to circle back to writing some docs and getting a PR in.

agape

brent

Brent W. Baccala

unread,

Jul 30, 2021, 4:09:46 PM7/30/21

to flint-devel

I now have both the FLINT code and the Sage interface working fairly well. The Sage code now does a pretty reasonable job of handling sums of products of rational functions. It distributes over additions when possible, factors everything down to irreducibles, computes the LCM of the denominator, and calls FLINT's fmpz_mpoly_addmul_multi to compute the numerator, passing it a list of irreducible polynomials to be multiplied and summed.

The code is still single-threaded, though, so it can be a bit slow. On my main problem, it runs well with stable memory utilization (about a 65GB working set) , but I estimate that it would take about 10 days to run to completion, producing an output polynomial ~70 billion terms that should take about 1.5 TB of disk to store. I'm working on multi-threaded code and am hoping to get that number down to one day on my 12-core machine.

I'm still not sending in a PR, since I'm still working on the code, plus I doubt that anybody else is very interested in it right now. The current single-theaded code is available on the "trunk" branch of my github repository at git://github.com/BrentBaccala/flint2.git

agape

brent

Bill Hart

unread,

Aug 2, 2021, 6:41:34 AM8/2/21

to flint-devel

Great! Thanks for the update.

By the way, we are moving our Flint Friday's meeting two hours later
from now on, in case you want to drop in occasionally at a more
convenient time.

Bill.

> --
>
> ---
> You received this message because you are subscribed to the Google Groups "flint-devel" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to flint-devel...@googlegroups.com.
> To view this discussion on the web, visit https://groups.google.com/d/msgid/flint-devel/f1853d71-968b-4d65-9baa-4591a76748d5n%40googlegroups.com.

Brent W. Baccala

unread,

Aug 3, 2021, 12:52:30 PM8/3/21

to flint-devel

Bill -

OK, thanks for letting me know about FLINT Fridays!

not sure if I'll be there or not, we'll see...

agape

brent

Brent W. Baccala

unread,

Aug 19, 2021, 1:53:13 PM8/19/21

to flint-devel

I now have the addmul_multi code running multi-threaded, and have introduced an extra cancellation step that significantly reduces the complexity of the problem. All of the test cases pass, and they have pretty good code coverage, and valgrind runs clean.

Last night, a 12-core machine multiplied 440 polynomials spread over 40 terms and generated a 6870314768 term polynomial in 6.5 hours. The working set set was around 50 GB, but most of that was used for buffering. The problem might require as little as 8 GB of RAM, but that would probably slow things down because having lots of buffer space is crucial to running the processors at 100% utilization. The basic algorithm is to let each processor multiple the factors in a single term, writing into a buffer, then add up the terms by merging the buffers together.

I say "generated", because the machine didn't save the output at all. It just checked to make sure the exponents were in descending order, collected some statistics, and then discarded the results. The result needs to be processed further and saved to disk, but I haven't written that code yet.

To facilitate this, I've added an "output_function" field to the fmpz_mpoly structure. If this function pointer is NULL, the subroutine proceeds normally, writing the output into memory and resizing the arrays as needed, just like always. If "output_function" is not NULL, the routine calls it for each output polynomial term, and does nothing else.

I've currently got a single output_function that just checks exponent ordering and collects some statistics. I'm planning to write several more: one that writes out to disk, one that does a variable substitution, one that evaluates the polynomial at a given coordinate vector.

An "input_function" also seems to be needed, to deal with polynomials that were previously written to disk.

I'm not sure what the API should be, but this seems to be moving in the direction of having input and output functions for polynomials that allow a chain of calculations to be configured and then run as a block, stalling as needed to wait for buffers to fill rather than expecting everything to be in memory at once.

Obviously this only works for algorithms that can read or write polynomials sequentially, though this can be fudged a little bit. For example, I'm planning to write an output_function that does variable substitution by writing a series of intermediate files to disk, each one a sorted polynomial produced from a block of the input polynomial by a variable substitution, then adding them up at the end using a merge tree to produce the final result. The idea here is that I can write a portion of the input polynomial to a memory buffer (as it's being produced by addmul_multi), run the variable substitution when either that buffer fills up or when the substituted variables change (put them at the beginning of a lex ordering so they don't change too often), and dump the results straight to disk so no additional memory is required.

I've got the fmpz_mpoly_addmul_multi subroutine, that's probably close to being suitable for a PR, but I'm adding this other feature (output_function) that's going to be a whole new issue, and the code is starting to get a little mixed together, but it's not real bad (yet) because I can just cut out the blocks that start with "if (output_function)".

agape

brent

Bill Hart

unread,

Aug 20, 2021, 5:51:18 AM8/20/21

to flint-devel

It sounds like it is coming along well. It's probably best to wait
until you are completely happy with the interface before merging
anything, though feel free to open a WIP pull request so we can get an
early look at the code.

Sounds like it will be quite a sophisticated engine by the time it is done.

Bill.

> To view this discussion on the web, visit https://groups.google.com/d/msgid/flint-devel/1b7f663f-4147-4844-8521-03788d8a8850n%40googlegroups.com.

Brent W. Baccala

unread,

Aug 20, 2021, 7:15:17 PM8/20/21

to flint-devel

Bill -

OK, I opened a WIP pull request just now.

I am still changing the API around quite a bit, so I agree that it's best to wait.

Now anybody interested in the code can take a look at it.

I actually tuned in for FLINT Friday this morning, but the meeting didn't seem to be running. Had a few questions to run by you, specifically:

1. The addmul_multi_threaded routine will print status messages, which is useful for something that runs for hours. How should we set an option for a function to print status messages?

2. How is threading supposed to work? I currently call flint_set_num_threads(12). Is that necessary, or is the library supposed to detect how many processors are available?

Status of fmpz_mpoly_addmul_multi

Brent W. Baccala

Brent W. Baccala

Bill Hart

Brent W. Baccala

Brent W. Baccala

Bill Hart

Brent W. Baccala