[GSoC] Hermite normal form - Updates


Alex Best

Jun 8, 2014, 6:04:27 PM
to flint...@googlegroups.com
Hi all,

I've just published a blog post explaining a little about what I've been up to this week working on Hermite normal forms.

There is one particular issue I've been having that I was wondering if anyone had any advice on. When I run valgrind on the test program for one of the HNF implementations I've been working on (fmpz_mat_hnf_mod_D) I get a lot of errors; specifically, I get 33 errors regarding lost blocks, starting with:

==30989== 88 (16 direct, 72 indirect) bytes in 1 blocks are definitely lost in loss record 4 of 46
==30989== at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==30989== by 0x4E76C48: flint_malloc (memory_manager.c:54)
==30989== by 0x4E8A299: _fmpz_new_mpz (fmpz.c:49)
==30989== by 0x4E8A484: _fmpz_promote (fmpz.c:95)
==30989== by 0x4E8FD59: fmpz_set (set.c:44)
==30989== by 0x4E99D0A: _fmpz_vec_set (set.c:36)
==30989== by 0x4EF311D: fmpz_mat_set (set.c:38)
==30989== by 0x4EE4907: fmpz_mat_det_bareiss (det_bareiss.c:54)
==30989== by 0x400B5C: main (t-hnf_mod_D.c:121)

I'm having a lot of trouble working out where this is coming from. Line 121 simply computes the determinant of a matrix, which is then doubled, replaced by its absolute value and used to find the Hermite normal form of the matrix (by working modulo that number).
When I've tried to isolate a specific case where this happens (the test code does 1000 iterations) I get no errors (or I'm not isolating the right case), which leaves me very confused. Does anyone have any ideas how I can go about dealing with this? I should note that while the procedure is not yet fully finished, all the tests do pass, so it looks like no wrong answers are produced by this problem.
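
For reference, the flow around line 121 is essentially the following (a simplified sketch, not the actual test code; fmpz_mat_hnf_mod_D is my work-in-progress function, the rest is standard Flint):

#include "fmpz.h"
#include "fmpz_mat.h"

void hnf_via_det(fmpz_mat_t H, const fmpz_mat_t A)
{
    fmpz_t D;
    fmpz_init(D);
    fmpz_mat_det(D, A);      /* the call the backtrace ends in */
    fmpz_abs(D, D);
    fmpz_mul_ui(D, D, 2);    /* work modulo D = 2*|det(A)| */
    fmpz_mat_hnf_mod_D(H, A, D);
    fmpz_clear(D);
}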

Best wishes,
Alex

Fredrik Johansson

Jun 8, 2014, 7:32:53 PM
to flint-devel
Hi Alex,

Does the patch https://github.com/fredrik-johansson/flint2/commit/2934f95cda7af47184c4ed568e25ce157c07c90a
fix the problem for you?

Fredrik

Bill Hart

Jun 8, 2014, 7:36:51 PM
to flint-devel
I think valgrind pretty much always reports a memory leak involving fmpz's as memory definitely lost, since it doesn't understand how we mangle pointers. So it's likely a memory leak internal to flint.

It sounds as if Fredrik has found one which probably affects anything where a modular inverse with an fmpz modulus is involved, which would include CRT and the like.
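
To illustrate the mangling: an fmpz slot either holds a small value inline or an encoded reference to a heap allocated mpz, which is why valgrind can't trace ownership through it. A sketch, just for illustration (COEFF_IS_MPZ is the real flint macro):

#include "fmpz.h"

void inspect(const fmpz_t x)
{
    if (COEFF_IS_MPZ(*x))   /* large: heap mpz behind a mangled pointer */
        flint_printf("mpz, invisible to valgrind's leak tracker\n");
    else                    /* small: the value is stored directly */
        flint_printf("inline value %wd\n", *x);
}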

Bill.



Alex Best

Jun 8, 2014, 9:04:41 PM
to flint...@googlegroups.com
Hi Fredrik,

Yes it does, thanks! I'll try to look out for that sort of thing in future.

Alex

Alex Best

Jun 16, 2014, 6:23:38 PM
to flint...@googlegroups.com
Hi everyone,

I just posted another blog post saying a little about what I've been working on this week; to save clicking, the body of the post is below:

This week I’ve been working more on some of the older approaches to computing the HNF. First I finished off the modulo-determinant method, which works well for square matrices of full rank. One of the issues with this method, however, is that the determinant needs to be calculated first and can be quite large, in which case there may be little to gain by calculating modulo it.

I’ve also implemented the algorithm described by Kannan and Bachem, which takes the principal minors to Hermite form one at a time, as opposed to the other methods I’ve implemented so far, which work column by column. This also provides a significant improvement on the classical method, and indeed performs better than the modular approach in many cases.

For the next week I’ll be trying to optimise the algorithms implemented so far as much as possible, especially the modular approach of Domich, Kannan and Trotter, as this plays a key role in the algorithm of Pernet and Stein. One thing I would like to try is combining the two approaches I worked on this week, making a modulo-determinant version of the Kannan-Bachem algorithm. There is also the possibility of factoring the determinant into relatively prime parts and working modulo each of those, before recombining at the end with the Chinese remainder theorem; of course this involves factoring the determinant, which could make the whole procedure less worthwhile. I also need to make the necessary modifications to ensure that all algorithms work in the non-square case as efficiently as possible.
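
The recombination step itself would be cheap; entry by entry it is just a call to the existing fmpz_CRT (a sketch with illustrative names, assuming the determinant factors as d1*d2 with d1 and d2 coprime):

#include "fmpz.h"

/* Recover an entry modulo d1*d2 from its values r1 mod d1 and r2 mod d2;
   the final 0 requests the non-negative representative. */
void recombine_entry(fmpz_t x, const fmpz_t r1, const fmpz_t d1,
                     fmpz_t r2, fmpz_t d2)
{
    fmpz_CRT(x, r1, d1, r2, d2, 0);
}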

Alex

Bill Hart

Jun 16, 2014, 8:10:55 PM
to flint-devel
Thanks for the update Alex. It looks like things are progressing very well. It's really exciting to have these algorithms implemented in flint. 

Factoring the determinant could be ok if it fits in a word, say. Otherwise it is going to take too long. And we also only have a factor function in flint for word sized numbers.

Now I guess you appreciate Loic Grenie's joke about the "polynomial time factoring algorithm". However, not all is lost. I know roughly how such an algorithm works. There are three stages, two of which I recall: 1) you supply an integer n to be factored (preferably composite), 2) ??, 3) the algorithm returns a factor of n. It's very similar to some of the factoring algorithms implemented in flint.

Bill.

Alex Best

Jun 22, 2014, 9:52:46 PM
to flint...@googlegroups.com, goodwi...@googlemail.com
Hi all,

I wrote a new blog post, text below:

Week 5 of my GSoC project is now over. This week I’ve been trying some variations of the older approaches to computing the Hermite normal form that I implemented in the first few weeks; I’ve had some success extending the Kannan-Bachem approach to non-square matrices and implementing other practical speed-ups. However, my main project for this week, a variant of the modular approach for use when factors of the determinant are known, hasn’t worked out so well yet. I’ve been really struggling to make it work as intended and have decided to set it aside for the moment and come back to it later, as it is not vital to the rest of the project. The main use case would be if the determinant were larger than word size but a factorisation into relatively prime factors was known, with each factor fitting in a word; here there would probably be an advantage to using multiple matrices modulo these factors. If however one of the factors were larger than word size, or indeed the determinant itself fits in a single word, I can’t see how using such a factorisation would help much (if at all). There is also the problem of obtaining the factors in the first place: certainly some methods of computing the determinant (such as the method described by Abbott, Bronstein and Mulders) can give some factors of the determinant for free, but this is certainly not guaranteed to give us useful information in light of the above. Nevertheless I would like to finish implementing this at some point, but I don’t want to slow the whole project down working on it.

Next week I’ll begin working on the algorithm given by Pernet and Stein, which will take some time to do right, so I expect to be working on this in some way or another for the whole week (at least!). Initially I’ll get a simplified version working (which will likely end up looking like the algorithm of Micciancio and Warinschi) and then I’ll start working on the improvements suggested in Pernet and Stein’s paper. Hopefully when this is done it will be extremely fast compared to the algorithms I have worked on so far; however, its run time will depend on what I can squeeze out of the modular approach I worked on a couple of weeks ago.

Alex

Денис Крыськов

Jun 23, 2014, 6:39:28 AM
to flint...@googlegroups.com, goodwi...@googlemail.com

> Next week I’ll begin working on the algorithm given by Pernet and Stein which will take some time to do right and so I expect to be working on this in some way or another for at the whole week (at least!). Initially I’ll get a simplified version working (which will likely end up looking like the algorithm of Micciancio and Warinschi) and then I’ll start working on the improvements suggested in Pernet and Stein’s paper. Hopefully when this is done it will be extremely fast compared to the algorithms I have worked on so far, however it’s run time will depend on what I can squeeze out of the modular approach I worked on a couple of weeks ago.

Good luck rewriting in C the HNF algorithm that is fastest in practice.

I think it will be difficult to outperform Sage's square-matrix HNF calculation, because Sage does some tasks better than FLINT.

Fredrik Johansson

Jun 23, 2014, 9:04:00 AM
to flint-devel
Agreed. It would be very interesting to see how this algorithm works
out, but it shouldn't be the first priority right now.

> Next week I’ll begin working on the algorithm given by Pernet and Stein
> which will take some time to do right and so I expect to be working on this
> in some way or another for at the whole week (at least!). Initially I’ll get
> a simplified version working (which will likely end up looking like the
> algorithm of Micciancio and Warinschi) and then I’ll start working on the
> improvements suggested in Pernet and Stein’s paper. Hopefully when this is
> done it will be extremely fast compared to the algorithms I have worked on
> so far, however it’s run time will depend on what I can squeeze out of the
> modular approach I worked on a couple of weeks ago.

That sounds like a good plan. Just ask if you have any problems with
the solving/p-adic lifting code.

Fredrik

Bill Hart

Jun 23, 2014, 9:25:55 AM
to Денис Крыськов, flint-devel
I guess you are referring to the use of BLAS.

One day we'll have BLAS support in flint.

Bill.

Fredrik Johansson

Jun 23, 2014, 9:29:50 AM
to flint-devel
Sage uses IML and Linbox (plus BLAS). IML is written in C and
GPLv2-licensed, so if we really wanted to, we could steal any code we
wanted from there (but figuring out the exact tricks they use and
reimplementing those is probably not much more work). It's unfortunate
that we didn't get the BLAS GSoC project, as that would allow speeding
up the mod p linear algebra significantly for huge matrices.
Nonetheless, the last time I benchmarked, FLINT computed determinants
over Z faster than Sage for matrices up to at least size 1000. If the
bottleneck in the Pernet-Stein algorithm is the double determinant
calculation, then we can expect similar performance for HNF.

Fredrik

Денис Крыськов

Jun 25, 2014, 2:45:09 AM
to flint...@googlegroups.com

> Nonetheless, the last time I benchmarked, FLINT computed determinants
> over Z faster than Sage for matrices up to at least size 1000. If the
> bottleneck in the Pernet-Stein algorithm is the double determinant
> calculation, then we can expect similar performance for HNF.
>
> Fredrik

Yes, double_det is the main dish when the dimension is big:
********* profile for n=3000 bits=8 tries=5 *********
                      sswdlr 0 4.9926
                     add_row-0 24.9032
                     add_row-1 31.3601
                      sswdlr 1 52.7402
                      sswdlr 2 69.1908
                 small_det_HNF 86.5797
                solve_right-dd 478.8966
                        kernel 749.3706
                dixon-last_row 792.4505
system_with_difficult_last_row 1668.7449
             det_given_divisor 8403.0775
                    double_det 8902.6383
                         total 10742.5848
*****************************************************
 

Alex Best

Jul 2, 2014, 4:32:57 PM
to flint...@googlegroups.com
Hi everyone,

I've been working on the Pernet-Stein algorithm recently and wrote a blog post explaining a little about it. Text below as usual.

This week I’ve been working on implementing Pernet and Stein’s algorithm to compute the HNF. The basic idea of this method (which is in turn due to Micciancio and Warinschi) is to use the given matrix to construct an auxiliary matrix with small determinant (so the HNF of the new matrix can be computed without coefficient explosion) from which the original HNF can be reconstructed. The algorithm splits into a few separate stages: the computation of two determinants, the modular HNF of the smaller matrix, the addition of a column to the HNF, and then the addition of two rows. Each of these steps can be optimised quite a bit using some clever tricks, which I’m currently working on. The adding of rows and columns to the HNF now seems to work well, and exploiting the similarities of the submatrices whose determinants are needed, in order to compute them both faster, should be done tomorrow. At that point it should be interesting to take stock and see how well the Pernet-Stein approach works compared to the previous methods I’ve worked on. I’ll likely spend a couple more days after that improving this approach even more.

This is definitely the most complex algorithm I’ve worked on so far, but when completed it should really blow the others away for large enough random matrices.

Alex Best

Jul 6, 2014, 10:37:09 PM
to flint...@googlegroups.com
Hello,

I've been working on Pernet and Stein's approach some more this week (blog post, text below).

I have one question that I'm hoping someone could shed some light on: I have a function that does some computation using a temporary fmpz and then sets the output fmpz from the temporary value, but when I try to remove the temporary variable from the function, working out my examples suddenly gets vastly slower! The relevant lines read:
fmpz_mul_ui(tmp, m1, s);
fmpz_add(out, tmp, r1normal);
which I replaced with the (to my mind pretty much equivalent) lines
fmpz_mul_ui(out, m1, s);
fmpz_add(out, out, r1normal);
my best guess is that this is some sort of cache effect (or I'm actually doing a different computation?), but I really don't know.
In case anyone is wondering, this is for a slightly modified fmpz_CRT_ui_precomp function which also allows half the product of the two moduli to be precomputed; I've found it speeds up functions like fmpz_mat_CRT_ui a fair amount.
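
For context, the two lines above are the final step of the usual CRT lift; written out in full it looks roughly like this (a sketch with illustrative names, not the actual Flint code; the precomp variant additionally takes quantities like m1^(-1) mod m2 as arguments):

#include "fmpz.h"
#include "ulong_extras.h"

/* Combine x = r1 mod m1 (an fmpz) with x = r2 mod m2 (a word):
   out = r1 + m1*s where s = (r2 - r1) * m1^(-1) mod m2. */
void crt_ui_sketch(fmpz_t out, const fmpz_t r1, const fmpz_t m1,
                   ulong r2, ulong m2)
{
    fmpz_t tmp;
    ulong r1m2 = fmpz_fdiv_ui(r1, m2);  /* r1 mod m2 */
    ulong m1m2 = fmpz_fdiv_ui(m1, m2);  /* m1 mod m2 */
    ulong s = n_mulmod2(n_submod(r2, r1m2, m2), n_invmod(m1m2, m2), m2);

    fmpz_init(tmp);
    fmpz_mul_ui(tmp, m1, s);
    fmpz_add(out, tmp, r1);
    fmpz_clear(tmp);
}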

This week I've been working on improving the algorithm of Pernet-Stein as much as possible. After introducing as many of the ideas given in the original paper as I could, I found the main bottleneck to actually be the computation of the nullspace of a certain submatrix of the given matrix; this is needed in order to efficiently solve a linear system which (likely) has a nasty final row. If we know a basis for the nullspace of the first n-1 rows of the system, we can replace the final row with a random nice row (one having small entries) and then find a solution to the original system by adding on a suitable multiple of the nullspace basis vector (the nullspace should be 1-dimensional for random input).
Flint uses the reduced row echelon form of a matrix to compute nullspaces (the nullspace of a matrix in this form can be easily read off, and the transformation does not alter it), so a natural place to improve nullspace computation is to improve the row echelon form computation. We can use a multimodular approach for this problem (this is described in Stein's book on computing with modular forms) and I've achieved some nice speed-ups with this method in the past couple of days. For example, the multimodular method is around 200x faster for 125x125 matrices with 32-bit entries. While this has made Hermite form computations a lot faster (nullspace computation is over 90% of the work for large matrices), I still want to see if this can be improved upon further; after all, in this case we don't need the whole row echelon form, just a vector in the kernel and the knowledge that the nullspace is 1-dimensional. So I plan to work on this further in the next couple of days, and depending on how I feel about this approach I will either spend the rest of this week making improvements to Pernet-Stein or possibly work on implementing the algorithm of Pauderis and Storjohann.
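
The repair step itself is cheap once a couple of dot products are known; with illustrative names (x0 solves the system with the nice replacement row, k spans the kernel of the first n-1 rows, a_n and b_n are the original last row and its right hand side) it comes down to:

#include "fmpq.h"

/* Given anx0 = <a_n, x0> and ank = <a_n, k>, the original system is
   solved by x = x0 + t*k with t = (b_n - anx0) / ank; adding t*k does
   not disturb the other rows since k lies in their kernel. */
void correction_factor(fmpq_t t, const fmpq_t b_n,
                       const fmpq_t anx0, const fmpq_t ank)
{
    fmpq_sub(t, b_n, anx0);
    fmpq_div(t, t, ank);
}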

Alex

Fredrik Johansson

Jul 6, 2014, 11:36:43 PM
to flint-devel
On Mon, Jul 7, 2014 at 4:37 AM, Alex Best <alex....@gmail.com> wrote:
> Hello,
>
> I've been working on Pernet and Stein's approach some more this week (blog
> post, text below).
>
> I have one question that I'm hoping someone could shed some light on: I have
> a function that does some computation using a temporary fmpz and then sets
> the output fmpz using the temporary value, but when I try and remove the
> temporary variable from the function working out my examples suddenly gets
> vastly slower! The relevant lines read:
> fmpz_mul_ui(tmp, m1, s);
> fmpz_add(out, tmp, r1normal);
> which I replaced with (what in my mind the pretty much equivalent) lines
> fmpz_mul_ui(out, m1, s);
> fmpz_add(out, out, r1normal);
> my best guess is that this is some sort of cache effect (or I'm actually
> doing a different computation?), but I really don't know.

Not sure if this is what happens here, but it's expensive to demote or
promote an fmpz. If one value usually is smaller than 2^62 and the
other usually is larger than 2^62, it might be faster to use two
separate fmpzs. It could be something else entirely though.

> In case anyone is wondering this is for a slightly modified
> fmpz_CRT_ui_precomp function which allows half the product of the two moduli
> to be precomputed also and I've found speeds up functions like
> fmpz_mat_CRT_ui a fair amount.

Nice. This should speed up fmpz_poly_gcd_modular (among others) a bit too.

Can you think of any improvements to the multi-CRT code? (See my
flint-devel post about Taylor shifts.)
Sounds awesome. Non-stupid rref/nullspace over Z will be a huge
addition. Keep up the good work!

Fredrik

Bill Hart

Jul 7, 2014, 6:16:55 AM
to flint-devel
On 7 July 2014 04:37, Alex Best <alex....@gmail.com> wrote:
Hello,

I've been working on Pernet and Stein's approach some more this week (blog post, text below).

I have one question that I'm hoping someone could shed some light on: I have a function that does some computation using a temporary fmpz and then sets the output fmpz using the temporary value, but when I try and remove the temporary variable from the function working out my examples suddenly gets vastly slower! The relevant lines read:
fmpz_mul_ui(tmp, m1, s);
fmpz_add(out, tmp, r1normal);
which I replaced with (what in my mind the pretty much equivalent) lines
fmpz_mul_ui(out, m1, s);
fmpz_add(out, out, r1normal);
my best guess is that this is some sort of cache effect (or I'm actually doing a different computation?), but I really don't know.

This is probably due to memory management. When you allocate the temp yourself, it doesn't have to copy the data into a new temporary that it makes for you. Each function can only see what it is doing, not what else you are going to do, so it can't tell the best temporary strategy.

Of course there could easily be some other issue we are missing. But if the amount of data is very large, then this could be it.
 
In case anyone is wondering this is for a slightly modified fmpz_CRT_ui_precomp function which allows half the product of the two moduli to be precomputed also and I've found speeds up functions like fmpz_mat_CRT_ui a fair amount.

This week I've been working on improving the algorithm of Pernet-Stein as much as possible. After introducing as many of the ideas given in the original paper as I could I found the main bottleneck to actually be the computation of the nullspace of a certain submatrix of the matrix given, this is needed in order to efficiently solve a linear system which (likely) has a nasty final row. If we know a basis for a nullspace of the first n-1 rows of the system we can replace the final row with a random nice (having small entries) row and then find a solution to the original system by adding on a suitable multiple of the nullspace basis vector (the nullspace should be 1 dimensional for random input).
Flint uses the reduced row echelon form of a matrix to compute nullspaces (the nullspace of a matrix in this form can be easily read off and the transformation does not alter it) and so a natural place to improve nullspace computation is to improve the row echelon form computation. We can use a multimodular approach for this problem (this is described in Stein's book on computing with modular forms) and I've achieved some nice speed-ups with this method in the past couple of days. For example the multimodular method is around 200x faster for 125x125 matrices with 32 bit entries. While this has made Hermite form computations a lot faster (nullspace computation is over 90% of the work for large matrices) I still want to try and see if this can be improved upon further, after all, in this case we don't need the whole row echelon form just a vector in the kernel and the knowledge that the nullspace is 1-dimensional.

There are surely algorithms for this. The row echelon implementation we have is pretty simple minded I guess.
 
So I plan to work on this further in the next couple of days, and depending on how I feel about this approach I will either spend the rest of this week making improvements to Pernet-Stein or possibly work on implementing the algorithm of Pauderis and Storjohann.

Alex



Alex Best

Jul 14, 2014, 11:51:19 PM
to flint...@googlegroups.com
Hi everyone,

Thanks for the replies.
I just published my latest blog post, text below.

This week I’ve been working on improving the computation of the nullspace of an integer matrix by implementing a p-adic lifting based method. The algorithm is also described in Stein’s book on computing with modular forms and is closely related to Dixon’s p-adic method for linear system solving. This is pretty much finished now (modulo some weird memory leaks, and determining the best values for parameters such as the p-adic precision and the range of primes used) and does appear to be asymptotically faster than the multimodular algorithm I worked on before, though experiments suggest that currently the matrix needs to be fairly large before the p-adic algorithm becomes better.
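
The core of the method is the standard Dixon iteration; roughly the following (a sketch rather than my actual code — for the nullspace one lifts a kernel vector instead of a solution, and rational reconstruction follows at the end):

#include "fmpz.h"
#include "fmpz_mat.h"
#include "nmod_mat.h"

/* Solve A*x = b modulo p^iters by p-adic lifting, assuming A is square
   and p does not divide det(A). */
void dixon_sketch(fmpz_mat_t x, const fmpz_mat_t A, const fmpz_mat_t b,
                  ulong p, slong iters)
{
    slong n = fmpz_mat_nrows(A), i;
    nmod_mat_t Ap, Cp, bp, xp;
    fmpz_mat_t r, t, xi;
    fmpz_t pk;

    nmod_mat_init(Ap, n, n, p); nmod_mat_init(Cp, n, n, p);
    nmod_mat_init(bp, n, 1, p); nmod_mat_init(xp, n, 1, p);
    fmpz_mat_init(r, n, 1); fmpz_mat_init(t, n, 1); fmpz_mat_init(xi, n, 1);
    fmpz_init_set_ui(pk, 1);

    fmpz_mat_get_nmod_mat(Ap, A);
    nmod_mat_inv(Cp, Ap);                        /* C = A^(-1) mod p */
    fmpz_mat_set(r, b);
    fmpz_mat_zero(x);

    for (i = 0; i < iters; i++)
    {
        fmpz_mat_get_nmod_mat(bp, r);
        nmod_mat_mul(xp, Cp, bp);                /* x_i = C*b_i mod p */
        fmpz_mat_set_nmod_mat_unsigned(xi, xp);
        fmpz_mat_scalar_addmul_fmpz(x, xi, pk);  /* x += p^i * x_i */
        fmpz_mat_mul(t, A, xi);
        fmpz_mat_sub(r, r, t);
        fmpz_mat_scalar_divexact_ui(r, r, p);    /* b_{i+1} = (b_i - A*x_i)/p */
        fmpz_mul_ui(pk, pk, p);
    }

    nmod_mat_clear(Ap); nmod_mat_clear(Cp);
    nmod_mat_clear(bp); nmod_mat_clear(xp);
    fmpz_mat_clear(r); fmpz_mat_clear(t); fmpz_mat_clear(xi);
    fmpz_clear(pk);
}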

I was originally thinking of implementing Pauderis and Storjohann’s algorithm if I had time this week, but I have spent much of the past week looking at efficient nullspace computation (and being plagued a little by unexpected power cuts!), so I haven’t got much further than a skeleton implementation. Maybe I can flesh this out at some point, but for the next week I’m going to move on to algorithms for the Smith normal form, another important normal form for matrices over the integers, where both column and row operations are allowed.

Hopefully, building on what I’ve done so far, I should be able to get some fast Smith normal form algorithms implemented shortly. Indeed, one simple approach is to repeatedly transpose and compute Hermite normal forms until the resulting matrix is diagonal; this won’t necessarily be the Smith form, but the Smith form can then be computed from it with relative ease. Something more sophisticated than this will undoubtedly be best, but many SNF algorithms involve computing the HNF at some point and working from there, so the current HNF algorithms provide a good basis for those computations.
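
The simple approach is only a few lines in code (a sketch for square input, assuming the fmpz_mat_hnf function permits aliasing of input and output; the final diagonal-to-Smith step is omitted):

#include "fmpz_mat.h"

static int is_diagonal(const fmpz_mat_t A)
{
    slong i, j;
    for (i = 0; i < fmpz_mat_nrows(A); i++)
        for (j = 0; j < fmpz_mat_ncols(A); j++)
            if (i != j && !fmpz_is_zero(fmpz_mat_entry(A, i, j)))
                return 0;
    return 1;
}

/* Alternate HNF and transpose until the matrix is diagonal. */
void diagonalise(fmpz_mat_t D, const fmpz_mat_t A)  /* A square */
{
    fmpz_mat_t T;
    fmpz_mat_init(T, fmpz_mat_ncols(A), fmpz_mat_nrows(A));
    fmpz_mat_set(D, A);
    while (!is_diagonal(D))
    {
        fmpz_mat_hnf(D, D);
        fmpz_mat_transpose(T, D);
        fmpz_mat_swap(D, T);
    }
    fmpz_mat_clear(T);
}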

Fredrik Johansson

Jul 15, 2014, 4:55:50 AM
to flint-devel
On Tue, Jul 15, 2014 at 5:51 AM, Alex Best <alex....@gmail.com> wrote:
> Hi everyone,
>
> Thanks for the replies.
> I just published my latest blog post, text below.
>
> This week I’ve been working on improving the computation of the nullspace of
> an integer by implementing a p-adic lifting based method. The algorithm is
> also described in Stein’s book on computing with modular forms and is
> closely related to Dixon’s p-adic method for linear system solving. This is
> pretty much finished now (modulo some weird memory leaks and determining the
> best values for parameters such as p-adic precision and range of primes
> used) and does appear to be asymptotically faster than the multimodular
> algorithm I worked on before, though experiments suggest that currently the
> matrix needs to be fairly large before the p-adic algorithm becomes better.

It's not surprising that the matrix needs to be fairly large. Very
nice that you've gotten it to work.

Consider including an internal function that takes the size of primes
as a parameter. This simplifies profiling, because you can just put in
a runtime loop that times it with primes of all sizes. It could also
be used for testing. With default ~64-bit primes only, it's quite
likely that the test program will go through thousands of iterations
without ever exercising the code that deals with hitting a bad prime.
(We should do this for some of the old functions too.)
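
Something along these lines (a sketch; hnf_with_prime stands in for
whatever the internal function ends up being called):

#include "flint.h"
#include "ulong_extras.h"

/* Exercise the code with primes of every size, not just ~64 bits. */
void try_all_prime_sizes(flint_rand_t state)
{
    ulong bits;
    for (bits = 5; bits <= 62; bits++)
    {
        ulong p = n_randprime(state, bits, 0);
        flint_printf("bits = %wu, p = %wu\n", bits, p);
        /* ... time/test hnf_with_prime(H, A, p) here, retrying
           when p turns out to be a bad prime ... */
    }
}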

> I was originally thinking of implementing Pauderis and Storjohann’s
> algorithm if I had time this week, but have spent much of the past week
> looking at efficient nullspace computation (and being plagued a little by
> unexpected power cuts!) so haven’t got much further than a skeleton
> implementation. Maybe I can flesh this out at some point but for the next
> week I’m going to move on to algorithms for the Smith normal form, another
> important normal form for matrices over the integers, where both column and
> row operations are allowed.

Seems that you're well on schedule -- efficient nullspace computation
alone would have been a respectable GSoC project! I really hope you'll
be able to implement Pauderis and Storjohann’s algorithm too, but
there's no need to rush.

Just remember to push your code to github (better not risk unexpected
power cuts resulting in hard drive malfunction...)

> Hopefully building on what I’ve done so far I should be able to get some
> fast Smith normal form algorithms implemented shortly, indeed one simple
> approach is to repeatedly transpose and compute Hermite normal forms until
> the resulting matrix is diagonal, this won’t necessarily be the Smith form
> but it can then be computed with relative ease. Something more sophisticated
> than this will undoubtedly be best, but many SNF algorithms involve
> computing the HNF at some point and working from there so the current HNF
> algorithms provide a good basis for those computations.

Sounds good.

Fredrik

Alex Best

Jul 28, 2014, 12:03:39 PM
to flint...@googlegroups.com
Hello everyone,

I just noticed I forgot to post an update here last week, sorry about that; the link to last week's post is here.
As for this week the link is here, and the text is below:

This week I worked more on the Smith normal form algorithm I talked about last week. I implemented Iliopoulos’ algorithm for computing the Smith form modulo an arbitrary integer; this procedure is used in a couple of places as part of Saunders and Wan’s “engineered” algorithm. Firstly, a prime power modulus is used to find the Smith normal form locally for small primes (i.e. those less than 100); the modular approach is also used for the rough part (concerning all the other primes) when the largest non-zero invariant factor is small compared to the size of the matrix. This algorithm is now working correctly, though the question of how to test it properly, given its Monte Carlo nature, is one that will require some thought. Currently, whenever I have encountered a matrix for which the output of Saunders and Wan doesn’t match the deterministic algorithms’ outputs, it has turned out to be a code bug rather than a side effect of the algorithm’s randomness. I suppose allowing a reasonable number of failures beyond the expected number in the test code would be one approach, but of course there will still be occasions when the number of failures exceeds this, and allowing a large number of extra failures could let real bugs creep in.
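
Concretely the compromise might look like this (a sketch; the two snf_* function pointers stand in for the randomised and a deterministic implementation, and ALLOWED would be chosen well above the expected number of random failures while still catching genuine bugs):

#include "fmpz_mat.h"
#include "ulong_extras.h"

#define ITERS   10000
#define ALLOWED 3

int snf_agrees(flint_rand_t state,
               void (*snf_rand)(fmpz_mat_t, const fmpz_mat_t),
               void (*snf_det)(fmpz_mat_t, const fmpz_mat_t))
{
    slong i, failures = 0;
    for (i = 0; i < ITERS; i++)
    {
        fmpz_mat_t A, S1, S2;
        slong n = 1 + n_randint(state, 10);
        fmpz_mat_init(A, n, n);
        fmpz_mat_init(S1, n, n);
        fmpz_mat_init(S2, n, n);
        fmpz_mat_randtest(A, state, 32);
        snf_rand(S1, A);
        snf_det(S2, A);
        if (!fmpz_mat_equal(S1, S2))
            failures++;
        fmpz_mat_clear(A); fmpz_mat_clear(S1); fmpz_mat_clear(S2);
    }
    return failures <= ALLOWED;
}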

For the next couple of days I’m going to work a little more on Smith forms, hopefully implementing Storjohann’s banded method for finding Smith forms of upper triangular matrices; this could be coupled with an HNF algorithm to give another (deterministic!) algorithm for the Smith form of a general matrix. I also need to clean up the Saunders and Wan code a bit, as there are still a number of inefficiencies. I have not included a valence method in this algorithm, as it would require implementing a few other things (such as minimal polynomials), but the option is certainly there and it would easily slot into the current code.

Alex

Bill Hart

Jul 28, 2014, 2:12:00 PM
to flint-devel
On 28 July 2014 18:03, Alex Best <alex....@gmail.com> wrote:
Hello everyone,

I just noticed I forgot to post an update here last week, sorry about that, the link to last week's post is here.
As for this week the link is here, and the text is below:

This week I worked more on the Smith normal form algorithm I talked about last week. I implemented Iliopoulos’ algorithm for computation of the Smith form modulo an arbitrary integer, this procedure is used in a couple of places as part of Saunders and Wan’s “engineered” algorithm. Firstly we use a prime power modulus to find the Smith normal form locally for small primes (i.e. those less than 100), the modular approach is also used for the rough part (concerning all the other primes) when the largest non-zero invariant factor is small compared to the size of the matrix. This algorithm is now working correctly, though the question of how to test it properly given its Monte Carlo nature is one that will require some thought. Currently whenever I have encountered a matrix for which the output of Saunders and Wan doesn’t match the deterministic algorithms’ outputs it has turned out to be a code bug rather than a side effect of the algorithm’s randomness. I suppose allowing for a reasonable number of failures beyond the expected number in test code would be one approach, but of course there will still be occasions when the number of failures exceeds this and allowing a large number of extra failures could allow real bugs to creep in.

It's not a big issue, since the test code uses only a pseudorandom generator, which only produces different values between 32-bit and 64-bit machines. So in theory, if it passes on one of each, it will always pass.

We do have a few cases where a small percentage of failures is allowed, but this is just paranoia on our behalf.

What is the expected failure rate of the algorithm, by the way? Many of these sorts of algorithms are not expected to fail on uniformly random data in any universe in anyone's lifetime; e.g. a failure rate of 1 in 10^80 is not out of the question. Which means that if you have a failure, it might actually be a bug.

I'm not familiar with this algorithm, so I don't personally know.

Bill Hart

Jul 28, 2014, 2:15:58 PM
to flint-devel
By the way, was there a Week 8 update?

Did you find any additional speedups to the nullspace computation, or to Pernet-Stein in general, that you mentioned?

Do you have an idea how it compares now, say to the implementation in Sage? (I'm not expecting to beat that because our arithmetic is quite slow, but I am just generally interested in how far off we are in some cases now.)

Alex Best

Jul 29, 2014, 2:38:52 PM
to flint...@googlegroups.com, goodwi...@googlemail.com
>By the way, was there a Week 8 update?
Yes there was; it should be a few posts above this, but here is a link anyway.


>Did you find any additional speedups to the nullspace computation or to Pernet-Stein in general, that you mentioned.
I did a few things (see the week 8 post), Fredrik thinks (and I agree) that the p-adic nullspace can definitely be sped up somehow, but I haven't found out exactly how yet.


>Do you have an idea how it compares now, say to the implementation in Sage? (I'm not expecting to beat that because our arithmetic is quite slow, but am just generally interested how far we are off in some cases now.)
I haven't checked recently (as I don't have an automatic comparison method, just saving a few examples to a file and then trying them out); last time I checked I think mine took twice as long as Sage for large random matrices, but that was a few weeks ago, so it should be a bit more competitive now.
Later this week I plan to do a lot of profiling etc. to write general hnf/snf functions that choose an implementation based on input size, so I'll compare against Sage (& LinBox, Pari?) then and report back.

>What is the expected failure rate of the algorithm by the way? Many of those sorts of algorithms are not expected to fail on uniformly random data in any universe in anyone's lifetime. E.g. a failure rate of 1 in 10^80 is not out of the question.
The failure rate is, unfortunately, not galactically small, but it is controllable; it just takes longer to be more sure of the answer. So it's probably a matter of letting the user decide, with some sensible default.

Bill Hart

Jul 30, 2014, 7:21:33 AM
to Alex Best, flint-devel
On 29 July 2014 20:38, Alex Best <alex....@gmail.com> wrote:
>By the way, was there a Week 8 update?
Yes there was, it should be a few posts above this, here is a link though.


Thanks. I went searching for it and couldn't find it, and yet, there it is!
 

>Did you find any additional speedups to the nullspace computation or to Pernet-Stein in general, that you mentioned.
I did a few things (see the week 8 post), Fredrik thinks (and I agree) that the p-adic nullspace can definitely be sped up somehow, but I haven't found out exactly how yet.

Cool. 

>Do you have an idea how it compares now, say to the implementation in Sage? (I'm not expecting to beat that because our arithmetic is quite slow, but am just generally interested how far we are off in some cases now.)
I haven't checked recently (as I don't have an automatic comparison method, just saving a few examples to a file and then trying it out), last time I did check I think mine took twice as long as Sage does for large random matrices, that was a few weeks ago though so it should be a bit more competitive now.
Later this week I plan to do a lot of profiling etc. to write general hnf/snf functions that choose implementation based on input size so I'll compare against Sage (&LinBox, Pari?) then and report back.

That will be very interesting, and that comparison already sounds pretty good. I'd have expected a greater difference than a mere factor of 2.
 

>What is the expected failure rate of the algorithm by the way? Many of those sorts of algorithms are not expected to fail on uniformly random data in any universe in anyone's lifetime. E.g. a failure rate of 1 in 10^80 is not out of the question.
The failure rate is, unfortunately, not galactically small, but is instead controllable, it just takes longer to be more sure of the answer. So it's probably a matter of letting the user decide, with some sensible default.

I see. Still sounds quite useful though.

Bill.

Alex Best

Aug 5, 2014, 5:59:07 PM
to flint...@googlegroups.com
Hi everyone,

My latest blog post. Text below, but first a subtle bug I thought it might be worth quickly reporting here in case it turns out to be causing anyone else problems: I noticed that fmpz_xgcd(g, a, b, 0, -1) gives g = -1 as the result. A fix is now on my GitHub (someone may want to ensure this is the correct fix! I also reindented that file as per the code conventions while I was at it, and the commit after corrects the number of tests, as I had left the wrong number in the file).
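
For the record, a minimal reproduction (the gcd should come out as 1, i.e. non-negative, with cofactors satisfying a*0 + b*(-1) = 1):

#include "fmpz.h"

int main(void)
{
    fmpz_t d, a, b, f, g;
    fmpz_init(d); fmpz_init(a); fmpz_init(b);
    fmpz_init(f); fmpz_init(g);

    fmpz_zero(f);
    fmpz_set_si(g, -1);
    fmpz_xgcd(d, a, b, f, g);
    fmpz_print(d); flint_printf("\n");  /* printed -1 before the fix */

    fmpz_clear(d); fmpz_clear(a); fmpz_clear(b);
    fmpz_clear(f); fmpz_clear(g);
    return 0;
}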

This week I’ve worked on a variety of different things, unfortunately getting hung up on most of them!

First I found some bugs in my Saunders-Wan Smith normal form code; most of them ended up being fixable, however one still eludes me. It seems the bug is fairly rare (it occurs for roughly 1 in 10000 test cases) and it is certainly related to the random nature of the algorithm, but my current thinking is that the current behaviour should not be happening even when we get unlucky with the randomness. I’ve left this issue alone for now after spending a couple of days making no progress on it.

I then moved on to writing some general HNF/SNF methods that pick between the available implementations based on matrix size, norm and anything else that I could work out as being relevant. While doing this I found that the Kannan-Bachem Hermite form method worked better than the modular method some of the time, so I decided to try and combine them into a modular method that works by taking minors into HNF rather than rows. I had some problems making this work correctly when the input was not of full rank, and all the fixes I tried ended up having some corner case that made them fail. It just occurred to me, however, that when the modular method is used as part of the Pernet-Stein algorithm the matrix given is always square, so this was perhaps not a completely wasted effort.

I then moved back to working on Pernet-Stein; specifically, I added support for matrices without full row rank, and fully general matrices should be done very soon.

This week I’m going to be doing some profiling and comparisons between my implementations and existing ones, and trying to work out the reasons why certain results are as they are and how the HNF/SNF methods I now have can be made faster in the future. It should be helpful to have a note of the barriers to faster HNF/SNF computation in Flint and how they could be overcome.
Of course I’ll also be tidying up and documenting several bits as I go along to fill in any gaps in functionality I have left along the way.

Alex



Bill Hart

Aug 5, 2014, 7:00:27 PM
to flint-devel
On 5 August 2014 23:59, Alex Best <alex....@gmail.com> wrote:
Hi everyone,

My latest blog post. Test below, but first a subtle bug I thought it might be worth quickly reporting here in case it turned out to be causing anyone else problems. I noticed that fmpz_xgcd(g, a, b, 0, -1) gives g = -1 as the result, a fix is now on my Github (someone may want to ensure this is the correct fix! I also reindented that file as per the code conventions while I was at it. Oh and the commit after corrects the number of tests as I left the wrong number in the file).

Thanks. This is presumably a corner case when one of the inputs is 0. The only thing to get right here is the sign of the cofactor I guess.

It's a shame our test code didn't pick this up. It looks like a case that should be hit multiple times if the test code was working as expected.
 

This week I’ve worked on a variety of different things, unfortunately getting hung up on most of them!

First I found some bugs in my Saunders-Wan Smith normal form code, most of them ended up being fixable, however one still eludes me. It seems the bug is fairly rare (occurs for roughly 1 in 10000 test cases) and it is certainly related to the random nature of the algorithm but my current thinking is that the current behaviour should not be happening even when we get unlucky with the randomness. I’ve left this issue alone for now after spending a couple of days making no progress on it.

Sounds very frustrating. I hope you manage to sort it out.
 

I then moved on to writing some general HNF/SNF methods that pick between available implementations based on matrix size, norm and anything else that I could work out as being relevant. While doing this I found that the Kannan-Bachem Hermite form method worked better than the modular method some of the time and so I decided to try and combine them into a modular method that works by taking minors into HNF rather than rows. I had some problems making this work correctly when the input was not of full rank and all the fixes I tried ended up having some corner case that made them fail. It just occurred to me however that when the modular method is used as part of the Pernet-Stein algorithm the matrix given is always square and so this was not perhaps a completely wasted effort.

I then moved back to working on Pernet-Stein, specifically I added capacity for non-full row rank matrices, and fully general matrices should be done very soon.

Great.
 

This week I’m going to be doing some profiling and comparisons between my implementations and existing ones and try and work out the reasons why certain results are as they are and how the HNF/SNF methods I now have can be made faster in the future. It should be helpful to have a note of the barriers to having faster HNF/SNF computation in Flint and how they could be overcome.

I'm very much looking forward to that. Please make use of the todo.txt file in the top level directory to add any ideas for future improvements and projects.
 
Of course I’ll also be tidying up and documenting several bits as I go along to fill in any gaps in functionality I have left along the way.

Sounds excellent. Let me know when you have something ready for review and merging and we'll try to get it into flint early. Google likes that, and it's obviously the best way for your code to get used and not bit rot.

You might have to remove the algorithms that are not working for now. It's better to get the stuff in that is working and leave the stuff that isn't working for a rainy day than to delay getting everything else in. We want to maximise usage of your code and efforts, obviously.

Thanks for your hard work.

Bill.
 


Alex



Денис Крыськов

Aug 8, 2014, 10:49:47 AM
to flint...@googlegroups.com, goodwi...@googlemail.com

>Do you have an idea how it compares now, say to the implementation in Sage? (I'm not expecting to beat that because our arithmetic is quite slow, but am just generally interested how far we are off in some cases now.)
>I haven't checked recently (as I don't have an automatic comparison method, just saving a few examples to a file and then trying it out), last time I did check I think mine took twice as long as Sage does for large random matrices, that was a few weeks ago though so it should be a bit more competitive now.

Sorry for interfering; I tested/benchmarked fmpz_mat_hnf_pernet_stein() against Sage's hnf_square(), on square non-singular matrices of varying dimension and entry size. I took 3 files out of commit 35dc83a09be087a8de58e4df6e99b4ef4430734f.

The same matrices were used as input. Results match --- no errors.

Script here, its result here. Did I benchmark a wrong subroutine? Is there any faster HNF subroutine in https://github.com/alexjbest/flint2?

Bill Hart

Aug 8, 2014, 11:46:10 AM
to Денис Крыськов, flint-devel
Thanks for the benchmark, but try to remember this is code still in development.

Alex Best

Aug 18, 2014, 12:34:30 PM
to flint...@googlegroups.com, crazz...@gmail.com
Hi everyone, I just wrote a blog post summing up what I've worked on as part of GSoC; the text is below, but it might be better read (the tables at least) on the blog itself.

First of all, apologies for not keeping everyone up to date recently. I’ve just been cleaning up code and deciding what is in a good enough state to include in my final evaluation for GSoC, so there wasn’t much new of interest to report. This is all now done, and my GitHub repository now contains only code that appears to be working well.

The main project goals of implementing quick methods to compute Hermite and Smith normal forms have been completed. Below are a table and graph comparing the timings to those obtained with Sage (I spent about half a week being very sad about my work until I realised I needed to clear the cache to get the right timings for Sage!). The class of matrices used was random square matrices, with dimension and entry bits as in the table.

Flint timings (s):
n\bits       4          8          16         32         64         128
10     0.0001498  0.000181   0.0003962  0.0003608  0.0011182  0.0011266
20     0.0007708  0.0010738  0.0014056  0.0023238  0.0040728  0.009321
30     0.0021142  0.003127   0.0042294  0.007065   0.015792   0.0427272
40     0.0050182  0.0066494  0.0101572  0.0169392  0.038086   0.0985452
50     0.0105184  0.0138692  0.0183774  0.0319924  0.0812636  0.2245758
60     0.0181876  0.0243814  0.0342512  0.0625658  0.1578844  0.4360994
70     0.028565   0.0373348  0.0546572  0.1110402  0.290543   0.8147328
80     0.0417594  0.0595228  0.0882594  0.1842448  0.4881932  1.3889464
90     0.0694218  0.08668    0.1405782  0.2854802  0.7817248  2.2501918
100    0.0880376  0.1192832  0.2052142  0.4448414  1.245277   3.5487596


Flint to Sage timing ratios (< 1 is best for us):

n\bits   4       8       16      32      64      128
10     0.6965  1.0258  1.9950  1.5602  3.8941  2.8422
20     0.7261  0.8606  0.9396  1.1124  1.0772  0.9704
30     0.6531  0.7794  0.8381  0.8015  0.7669  0.7449
40     0.6492  0.7048  0.7380  0.5891  0.4896  0.4245
50     0.6595  0.6637  0.5511  0.3997  0.3666  0.3543
60     0.6354  0.6325  0.4950  0.3586  0.3128  0.2958
70     0.6016  0.5616  0.3836  0.3126  0.2764  0.2768
80     0.1957  0.2762  0.3801  0.6697  1.2174  1.6715
90     0.2509  0.3145  0.4730  0.8187  1.5402  2.1104
100    0.2527  0.3257  0.5340  0.9906  1.8450  2.5461


For simplicity’s sake I simply compared fmpz_mat_hnf_pernet_stein to Sage’s hermite_form, so it’s worth noting that Flint is faster still for small matrices if another method is used (which the fmpz_mat_hnf function should choose for you in practice). We can see that although Flint does well for many matrices in this range, it gets worse as the matrix and bit size grow; indeed this trend continues, and my functions are far worse for very big matrices (Denis Kryskov’s benchmark here gives a good indication of the scale of the problem). The run time of the asymptotically most efficient HNF method I have implemented (Pernet-Stein) depends heavily on the computation of nullspaces, so this is definitely an area that can be improved. Both approaches I’ve tried for speeding up nullspace computation (multimodular and p-adic lifting) haven’t worked out any better (asymptotically) than the pre-existing code based on the row echelon form. The remaining barrier here seems to be modular rref, which I’ve looked at improving over the past week; this is certainly possible and I plan to carry on working on it (I have a complete but buggy implementation of a method described in Storjohann’s thesis, and a working implementation of classical Gaussian elimination which is fast for small matrices at the moment). Finishing this will, I hope, bring the timings for Pernet-Stein down to something more like the ones in Sage. Sage uses IML for these computations, but even without using BLAS I think we could get a fair amount closer to IML’s performance for large matrices.
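
For reference, the classical mod p elimination I mention is nothing more exotic than the following (a sketch rather than my actual code; Flint's nmod_mat_rref is the production routine):

#include "nmod_mat.h"

/* Plain row echelon form mod p, no blocking or delayed reduction;
   returns the rank. */
slong row_echelon(nmod_mat_t A)
{
    slong m = nmod_mat_nrows(A), n = nmod_mat_ncols(A);
    slong rank = 0, i, j, k;

    for (j = 0; j < n && rank < m; j++)
    {
        for (i = rank; i < m && nmod_mat_entry(A, i, j) == 0; i++) ;
        if (i == m)
            continue;                    /* no pivot in this column */
        nmod_mat_swap_rows(A, NULL, rank, i);

        for (i = rank + 1; i < m; i++)   /* eliminate below the pivot */
        {
            ulong c = nmod_div(nmod_mat_entry(A, i, j),
                               nmod_mat_entry(A, rank, j), A->mod);
            for (k = j; k < n; k++)
                nmod_mat_entry(A, i, k) =
                    nmod_sub(nmod_mat_entry(A, i, k),
                             nmod_mul(c, nmod_mat_entry(A, rank, k),
                                      A->mod),
                             A->mod);
        }
        rank++;
    }
    return rank;
}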

As for Smith forms, the randomised algorithm I worked on remains problematic, so I have left that out for now and stuck with the two slightly older approaches: a general one due to Kannan and Bachem, and a modular one due to Iliopoulos for full rank square matrices. Below are some timings, again with comparisons to Sage. The class of matrices used is the same as above, which means it is highly likely that Iliopoulos’ algorithm would be appropriate (though it may not have been chosen by fmpz_mat_snf). Sage uses Pari for this, but it was easier for me to change my existing timing code than to set up a direct comparison to Pari.

Flint timings (s):
n\bits      4          8          16         32         64         128
10     0.0001008  0.0001776  0.0002186  0.0005208  0.0016076  0.0042602
20     0.001181   0.0017224  0.0025602  0.0050306  0.0131318  0.038855
30     0.0039768  0.006887   0.013429   0.031941   0.0904368  0.2860226
40     0.0138918  0.0201976  0.0408486  0.1258706  0.386331   1.1849852
50     0.0312358  0.0544678  0.1120322  0.370241   1.155253   3.6941254
60     0.061067   0.116257   0.2779696  0.8377208  2.5126946  7.9345656


Flint to Sage timing ratios (< 1 is best for us):
n\bits    2        4        8        16       32       64       128
10     0.01630  0.02870  0.04939  0.05714  0.10546  0.22552  0.36044
20     0.05063  0.06777  0.08559  0.08868  0.10665  0.14573  0.19519
30     0.08086  0.07276  0.09437  0.10418  0.13330  0.17716  0.23537
40     0.08905  0.10716  0.09976  0.10617  0.16009  0.21715  0.27530
50     0.10550  0.11037  0.11270  0.11762  0.18405  0.24098  0.30266
60     0.12292  0.10537  0.11135  0.12858  0.17996  0.23113  0.29524


(I will update this table on my blog when more timings finish; I wanted to get this post out and they are taking a while.)

These timings look good for Flint but I’m not completely sure yet what the large scale behaviour is.

I still have a number of more experimental bits of code around that I will continue to work on getting into a more stable and usable state, along with some other little bits that I never managed to get around to during the official GSoC period but hope to get to at some point.

Finally I want to say a huge thanks to everyone who commented on what I’ve been doing, and especially to Fredrik for his excellent advice over the course of the project. All the comments were very much appreciated.

Bill Hart

Aug 19, 2014, 3:32:04 PM
to flint-devel
Hi Alex,


On 18 August 2014 18:34, Alex Best <alex....@gmail.com> wrote:
Hi everyone, I just wrote a blog post doing a bit of summing up of what I've worked on as part of GSoC, the text is below but it might be better read (the tables at least) on the blog itself

First of all apologies for not keeping everyone up to date recently. I’ve just been cleaning up code and deciding what is in a good enough state to include in my final evaluation for GSoC so there wasn’t that much new of interest to report. This is all now done and my GitHub repository now contains only code that appears to be working well.

Excellent. Thanks very much for doing the clean up for us. I'll let Fredrik merge your code, and I'll merge from him.

I'm very much looking forward to using the new HNF and SNF code, especially when I get matrices implemented in Nemo when we reach flint power level over 9000+.
 

The main project goals of implementing quick methods to compute Hermite and Smith normal forms have been completed. Below is table and graph comparing the timings to those obtained with Sage (I spent about half a week being very sad about my work until I realised I needed to clear the cache to get the right timings for Sage!).

Ah, well spotted.
That's very decent for flint!
 


For simplicity’s sake I simply compared fmpz_mat_hnf_pernet_stein to Sage’s hermite_form, so it’s worth noting that Flint is faster still for small matrices if another method is used (which the fmpz_mat_hnf function should choose for you in practice). We can see here that although Flint does well for many matrices in this range it gets worse as the matrix and bit size gets larger, indeed this trend continues and my functions are far worse for very big matrices (Denis Kryzkov’s benchmark here gives a good indication of the scale of the problem).

Sure.
 
The run time of the asymptotically most efficient HNF method I have implemented (Pernet-Stein) depends heavily on the computation of nullspaces and so this is definitely an area that can be improved.

Understood. We'll revisit the timings once the nullspace code in flint is improved. Linear algebra is a huge area, and it will be a while before flint is competitive with a number of other projects. Though we are more than competitive in some areas already.
 
Both approaches I’ve tried to speed up nullspace computation (multimodular and p-adic lifting) haven’t worked out being any better (asymptotically) than the pre-existing code based on the row echelon form.

Hmm. That's definitely surprising, especially with respect to the p-adic lifting algorithm. I'll ask around the computer algebra department where I work and see if anyone has heard anything more than gossip about this. I do vaguely recall someone mentioning that this was particularly hard!
 
The remaining barrier here seems to be modular rref which I’ve looked at improving over the past week, this is certainly possible and I plan to carry on working on it (I have a complete but buggy implementation of a method described in Storjohann’s thesis and a working implementation of classical Gaussian elimination which is fast for small matrices at the moment).

A well-tuned implementation of Gaussian elimination can be quite competitive. So don't be afraid to improve it and look for a higher crossover with the other algorithms. It might also help to try and figure out what Sage itself is doing here, especially which algorithm and what crossovers.
 
Finishing this will I hope bring the timings for Pernet-Stein down to something more like the ones in Sage. Sage uses IML for these computations, but even without using BLAS I think we could get a fair amount closer to IML’s performance for large matrices.

BLAS should only explain a factor of 2 or so.
 

As for Smith forms, the randomised algorithm I worked on remains problematic, so I have left that out for now and stuck with the two slightly older approaches, a general one due to Kannan and Bachem and a modular one due to Iliopoulos for full rank square matrices. Below are some timings, again with comparisons to Sage. The class of matrices used is the same as that above, which means that it is highly likely that Iliopoulos’ would be appropriate (though it may not have been chosen by fmpz_mat_snf). Sage uses Pari for this, but it was easier for me to change my existing timing code than set up a direct comparison to Pari.

Flint timings (s); rows give the matrix dimension, columns the entry size in bits:

 n\b           4          8         16         32         64        128
 10    0.0001008  0.0001776  0.0002186  0.0005208  0.0016076  0.0042602
 20    0.001181   0.0017224  0.0025602  0.0050306  0.0131318  0.038855
 30    0.0039768  0.006887   0.013429   0.031941   0.0904368  0.2860226
 40    0.0138918  0.0201976  0.0408486  0.1258706  0.386331   1.1849852
 50    0.0312358  0.0544678  0.1120322  0.370241   1.155253   3.6941254
 60    0.061067   0.116257   0.2779696  0.8377208  2.5126946  7.9345656


Flint to Sage timing ratios (< 1 is best for us); rows give the dimension, columns the entry size in bits:

 n\b         2        4        8       16       32       64      128
 10    0.01630  0.02870  0.04939  0.05714  0.10546  0.22552  0.36044
 20    0.05063  0.06777  0.08559  0.08868  0.10665  0.14573  0.19519
 30    0.08086  0.07276  0.09437  0.10418  0.13330  0.17716  0.23537
 40    0.08905  0.10716  0.09976  0.10617  0.16009  0.21715  0.27530
 50    0.10550  0.11037  0.11270  0.11762  0.18405  0.24098  0.30266
 60    0.12292  0.10537  0.11135  0.12858  0.17996  0.23113  0.29524


Well that is an unmitigated success! And this is a terribly important computation as well. Well done!
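
(For anyone wanting to try the Smith form code, a minimal sketch of how the new routines are meant to be called:)

#include "fmpz_mat.h"

int main(void)
{
    fmpz_mat_t A, S;
    flint_rand_t state;

    flint_randinit(state);
    fmpz_mat_init(A, 30, 30);
    fmpz_mat_init(S, 30, 30);
    fmpz_mat_randtest(A, state, 16);   /* entries of up to 16 bits */

    /* general entry point; fmpz_mat_snf_kannan_bachem and (given a
       suitable modulus) fmpz_mat_snf_iliopoulos can be called directly */
    fmpz_mat_snf(S, A);
    fmpz_mat_print_pretty(S);

    fmpz_mat_clear(A);
    fmpz_mat_clear(S);
    flint_randclear(state);
    return 0;
}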
 

(I will update this table on my blog when more timings finish; I wanted to get this post out and they are taking a while.)

Nothing extra there at the moment, but I'll look back there when I think of it.
 

These timings look good for Flint, but I'm not completely sure yet what the large-scale behaviour is.

It's conceivably similar to what happens with HNF. We'll see I guess.
 

I still have a number of more experimental bits of code around that I will continue to work on getting into a more stable and usable state.

Please do. You're an awesome contributor and we really appreciate your work. I have colleagues here begging me to put HNF in flint on a regular basis, so I will make sure they know Alex Best has done it for us!
 
Along with some other little bits that I never managed to get to during the official GSoC period, which I hope to pick up at some point.

Finally I want to say a huge thanks to everyone who commented on what I’ve been doing, and especially to Fredrik for his excellent advice over the course of the project. All the comments were very much appreciated.

Thank you!

Bill.
 

On Friday, August 8, 2014 4:46:10 PM UTC+1, Bill Hart wrote:
Thanks for the benchmark, but try to remember this is code still in development.


On 8 August 2014 16:49, Денис Крыськов <crazz...@gmail.com> wrote:

>Do you have an idea how it compares now, say to the implementation in Sage? (I'm not expecting to beat that because our arithmetic is quite slow, but am just generally interested how far we are off in some cases now.)
I haven't checked recently (as I don't have an automatic comparison method, just saving a few examples to a file and then trying it out). Last time I did check, I think mine took twice as long as Sage for large random matrices; that was a few weeks ago though, so it should be a bit more competitive now.

Sorry for interfering, I tested/benchmarked fmpz_mat_hnf_pernet_stein() against Sage's hnf_square(), on square non-singular matrices of varying dimension and entry size. I took 3 files from commit 35dc83a09be087a8de58e4df6e99b4ef4430734f.

The same matrices were used as input. The results match --- no errors.

The script is here, and its results are here. Did I benchmark the wrong subroutine? Is there a faster HNF subroutine in https://github.com/alexjbest/flint2?

Fredrik Johansson

unread,
Aug 20, 2014, 9:52:10 AM8/20/14
to flint-devel
On Mon, Aug 18, 2014 at 6:34 PM, Alex Best <alex....@gmail.com> wrote:
> Hi everyone, I just wrote a blog post doing a bit of summing up of what I've
> worked on as part of GSoC, the text is below but it might be better read
> (the tables at least) on the blog itself
>
> First of all apologies for not keeping everyone up to date recently. I’ve
> just been cleaning up code and deciding what is in a good enough state to
> include in my final evaluation for GSoC so there wasn’t that much new of
> interest to report. This is all now done and my GitHub repository now
> contains only code that appears to be working well.

Hi Alex,

Thanks for all your hard work.

I'm looking at your current trunk, and some of the functions
(including rref_multi_mod, nullspace_dixon) appear to be missing
documentation. And most of the actual test code in t-rref_multi_mod.c
is currently commented out.

Where does the bound in rref_multi_mod come from? It seems that the
bound is just set to the magnitude of the largest entry times the number
of rows. This is surely not correct! You presumably need a
Hadamard-type bound that accounts for the largest possible bit size of
both numerators and denominators in the output.

In rref_multi_mod, it appears that you reconstruct the whole matrix.
If the rref is

1 0 0 x
0 1 0 y
0 0 1 z

for example, you only really need to reconstruct the submatrix [x, y,
z], as the rest can be inferred from the pivot data. I'm not sure if
this accounts for a significant part of the slowness we're seeing for
huge matrices, but perhaps it makes some difference.
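
Concretely, something like this is what I have in mind (a sketch only; the function name and the is_pivot/r bookkeeping are made up for illustration, R being the rref image modulo the CRT product M):

#include "fmpq.h"
#include "fmpz_mat.h"
#include "fmpq_mat.h"

/* Reconstruct only the non-pivot block of an rref from its image R
   modulo M. Rows 0..r-1 are the nonzero rows and is_pivot[j] marks
   pivot columns; the pivot block is the identity and needs no work. */
int
reconstruct_nonpivot(fmpq_mat_t X, const fmpz_mat_t R, const fmpz_t M,
                     const int * is_pivot, slong r)
{
    slong i, j, n = fmpz_mat_ncols(R);

    for (i = 0; i < r; i++)
        for (j = 0; j < n; j++)
            if (!is_pivot[j] &&
                !fmpq_reconstruct_fmpz(fmpq_mat_entry(X, i, j),
                                       fmpz_mat_entry(R, i, j), M))
                return 0;   /* M not yet large enough; add more primes */

    return 1;
}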

Fredrik

Bill Hart

unread,
Aug 20, 2014, 12:15:11 PM8/20/14
to flint-devel, Alex Best
Hi Alex,

question inline below.


On 18 August 2014 18:34, Alex Best <alex....@gmail.com> wrote:
Can you tell me over what ring the nullspace computation is done?

Do you need the entire nullspace or just a vector in the nullspace?

What are the sizes of the entries in the matrix for which you want the nullspace? (In the cases where things are slow.)

My guess is this is over Z, right? But you can make it over Z/pZ with the multimodular method. And then you get to choose the size of the primes?

In the p-adic lifting approach, I assume the bottleneck is not the initial nullspace computation over Z/pZ, but the lifting step? And you get to choose the prime there too, right?
 

Alex Best

unread,
Aug 23, 2014, 11:34:53 PM8/23/14
to flint...@googlegroups.com
Hi Fredrik,
Yes apologies, those functions were not meant to be in the "clean" branch of the code!
The bound in rref_multi_mod is still something I need to work out properly; there is a TODO there to guess the height of the entries in the result. I spent quite a while a couple of days ago trying to do this properly and had a lot of difficulty finding something that works. Even when I get the height right it still seems the bound I've been using is far too small, and I can't figure out what the right one should be.
That's a really nice idea about the reconstruction; I implemented it, and if I remember the old timings right it has improved things, thanks! I guess we could actually do something similar for the CRT step too.
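
For the CRT step that would mean only combining new residues into the non-pivot entries, something like the following (a sketch only, with made-up names; R holds the image modulo the product M so far, Anew the image modulo the new prime pz):

#include "fmpz.h"
#include "fmpz_mat.h"

/* Fold the image modulo a new prime pz into the running residues R
   (held modulo M), touching only the non-pivot entries; is_pivot
   marks pivot columns and rows 0..r-1 are the nonzero rows. */
void
crt_nonpivot(fmpz_mat_t R, fmpz_t M, fmpz_mat_t Anew, fmpz_t pz,
             const int * is_pivot, slong r)
{
    slong i, j, n = fmpz_mat_ncols(R);
    fmpz_t t;

    fmpz_init(t);
    for (i = 0; i < r; i++)
        for (j = 0; j < n; j++)
            if (!is_pivot[j])
            {
                fmpz_CRT(t, fmpz_mat_entry(R, i, j), M,
                         fmpz_mat_entry(Anew, i, j), pz, 0);
                fmpz_swap(t, fmpz_mat_entry(R, i, j));
            }
    fmpz_clear(t);

    fmpz_mul(M, M, pz);   /* residues are now correct modulo M * pz */
}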

Hi Bill,
The nullspace is taken over Z, and for Pernet-Stein you need the whole thing (unless something has gone horribly wrong it will only be 1-dimensional, but you need to know when that happens).
The sizes of the entries are the same as those of the HNF input, but the problems really occur as you get to large inputs.
Yes, you can choose the prime size; I've just been trying primes of size NMOD_MAT_OPTIMAL_MODULUS_BITS.

Now for something a little different that renders some of what I said above a little irrelevant!
I started looking at another method for rrefs entirely after realising that the multimodular one wasn't working for me, and I'm pleased to say the new one works very well!
The algorithm is also given in Stein's book on modular forms, but for some reason I overlooked it before. The idea is really quite simple: the row and column profile of the rref is found modulo a random prime, then the nonsingular submatrix of the original matrix cut out by that profile is inverted to compute the whole rref (or in reality the whole rref is solved for with Dixon lifting).
I will be able to do some proper timings next week but some initial testing suggests that using this rref brings the asymptotic runtime of the hnf_pernet_stein right down to the range I'd originally hoped for.
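
In outline the new rref looks something like the sketch below (not my actual code: the names are made up, the input is assumed to have full row rank r as in the kernel step, X should be initialised by the caller to r x (n - r), piv should have room for r entries, and the verification that makes the whole thing provably correct is left out):

#include "ulong_extras.h"
#include "nmod_mat.h"
#include "fmpz_mat.h"
#include "fmpq_mat.h"

int
rref_via_profile(fmpq_mat_t X, slong * piv, const fmpz_mat_t A,
                 flint_rand_t state)
{
    slong r = fmpz_mat_nrows(A), n = fmpz_mat_ncols(A);
    slong i, j, k, * nonpiv;
    nmod_mat_t Ap;
    fmpz_mat_t B, C;
    fmpq_mat_t Bq, Cq;
    int ok;

    /* 1: rref modulo a random prime to guess the column profile */
    nmod_mat_init(Ap, r, n,
                  n_randprime(state, NMOD_MAT_OPTIMAL_MODULUS_BITS, 0));
    fmpz_mat_get_nmod_mat(Ap, A);
    if (nmod_mat_rref(Ap) != r)
    {
        nmod_mat_clear(Ap);   /* unlucky prime; caller should retry */
        return 0;
    }
    for (i = 0; i < r; i++)   /* pivot of row i is its first nonzero */
        for (j = (i == 0) ? 0 : piv[i - 1] + 1; j < n; j++)
            if (nmod_mat_entry(Ap, i, j) != 0)
            {
                piv[i] = j;
                break;
            }
    nonpiv = flint_malloc((n - r) * sizeof(slong));
    for (j = 0, k = 0, i = 0; j < n; j++)
    {
        if (i < r && piv[i] == j)
            i++;
        else
            nonpiv[k++] = j;
    }

    /* 2: cut the nonsingular r x r submatrix B and the remaining
          columns C out of the original matrix */
    fmpz_mat_init(B, r, r);
    fmpz_mat_init(C, r, n - r);
    for (i = 0; i < r; i++)
    {
        for (k = 0; k < r; k++)
            fmpz_set(fmpz_mat_entry(B, i, k), fmpz_mat_entry(A, i, piv[k]));
        for (k = 0; k < n - r; k++)
            fmpz_set(fmpz_mat_entry(C, i, k), fmpz_mat_entry(A, i, nonpiv[k]));
    }

    /* 3: solve B * X = C over Q with Dixon lifting; X is then the
          non-pivot block of the rref, the pivot block being I_r */
    fmpq_mat_init(Bq, r, r);
    fmpq_mat_init(Cq, r, n - r);
    fmpq_mat_set_fmpz_mat(Bq, B);
    fmpq_mat_set_fmpz_mat(Cq, C);
    ok = fmpq_mat_solve_dixon(X, Bq, Cq);

    fmpq_mat_clear(Bq);
    fmpq_mat_clear(Cq);
    fmpz_mat_clear(B);
    fmpz_mat_clear(C);
    nmod_mat_clear(Ap);
    flint_free(nonpiv);
    return ok;
}

The pivot block of the rref is then just the r x r identity, and X together with piv gives everything else.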

Bill Hart

unread,
Aug 23, 2014, 11:58:08 PM8/23/14
to flint-devel
On 24 August 2014 05:34, Alex Best <alex....@gmail.com> wrote:
> [...]
> I will be able to do some proper timings next week but some initial
> testing suggests that using this rref brings the asymptotic runtime of
> the hnf_pernet_stein right down to the range I'd originally hoped for.

Woot!

Looking forward to the updated timings next week!

Bill.
 


Fredrik Johansson

unread,
Aug 24, 2014, 7:05:33 AM8/24/14
to flint-devel
On Sun, Aug 24, 2014 at 5:34 AM, Alex Best <alex....@gmail.com> wrote:
> Hi Fredrik,
> Yes apologies, those functions were not meant to be in the "clean" branch of
> the code!

No problem. Just let me know when you have a branch ready for reviewing/merging.

> The bound in rref_multi_mod is still something I need to work out properly,
> there is a TODO there to guess the height of the entries in the result. I've
> spent quite a while a couple of days ago trying to do this properly and had
> a lot of difficulty finding something that works, even when I get the height
> right it still seems the bound I've been using is far too small, but I can't
> figure out what the right one should be.
> That's a really nice idea about the reconstruction, I implemented it and if
> I remember the old timings right it has improved it, thanks! I guess we
> could actually do something similar for the CRT step too.

Yes, doing it for the CRT step too should help.

> Hi Bill,
> The nullspace is taken over Z, and for Pernet-Stein you need the whole thing
> (but unless something has gone horribly wrong it will only be 1 dimensional,
> nevertheless you need to know when this happens).
> The size of the entries are the same as that of the input for the HNF but
> the problems really occur as you get to large inputs.
> Yes, you can choose the prime size, I've just been trying with primes of
> size NMOD_MAT_OPTIMAL_MODULUS_BITS.
>
> Now for something a little different that renders some of what I said above
> a little irrelevant!
> I started looking at another method for rrefs entirely after realising that
> the multimodular one wasn't working for me, and I'm pleased to say the new
> one works very well!
> The algorithm is also given in Stein's book on modular forms but for some
> reason I overlooked it before. The idea is really quite simple, the row and
> column profile for the rref is found modulo a random prime, then the
> nonsingular submatrix of the original matrix cut out by that profile is
> inverted to compute the whole rref (or in reality the whole rref is solved
> for with dixon).

That sounds great. Is it provably correct, though?

Fredrik

Alex Best

unread,
Aug 25, 2014, 10:40:23 PM8/25/14
to flint...@googlegroups.com
> That sounds great. Is it provably correct, though?
Yes, if the profile found turns out to be wrong it is detected later on and a new prime is used.

Below are timings in seconds for the updated hnf_pernet_stein (using the new rref to find the nullspace) for dimensions from 30 to 250 (steps of 20) and entry size from 2 to 512 bits (doubling for each column):

 n\b        2          4          8          16         32         64         128        256        512
 30    0.002032   0.0021454  0.0030614  0.0041586  0.0068722  0.0163326  0.0400516  0.1145904  0.37742
 50    0.006511   0.008538   0.010858   0.0130762  0.0240382  0.051378   0.136154   0.387963   1.2802386
 70    0.0203652  0.0183178  0.0236848  0.030885   0.0529964  0.1221752  0.3146208  0.9285352  3.0924102
 90    0.0397604  0.045225   0.0464526  0.067833   0.1079738  0.2353842  0.61537    1.7704664  5.6521438
 110   0.0654016  0.0665304  0.072144   0.0957408  0.1581766  0.3812806  0.9435262  2.862539   9.5535622
 130   0.1002012  0.1082292  0.1127852  0.1888582  0.3101552  0.5915402  1.4550786  4.4907582  15.1206974
 150   0.109988   0.1819912  0.1747046  0.2528798  0.3727922  0.84999    2.1557882  6.697591   22.165655
 170   0.2268388  0.1861272  0.2166112  0.2988404  0.5177246  1.1613154  3.074675   9.3826824  31.4222814
 190   0.3241242  0.3548714  0.447514   0.4326614  0.8565198  1.6538525  4.0739845  13.4306535 41.8784135
 210   0.456016   0.4495685  0.4907565  0.6912685  0.9127715  2.165688   5.712902   17.343598  54.839177
 230   0.4459955  0.711927   0.377327   0.7891565  1.162798   2.929525   6.7865515  20.991417  71.217659
 250   0.8109445  0.7282285  0.980676   0.8665595  1.750517   3.5086055  8.773571   27.288292  88.756831

It's worth noting that both here and in the table below I switched from averaging 3 repetitions of 5 separate examples to 2 repetitions of 2 examples after dimension 190 at 64 bits, as it was taking a while, so the timings past that point should be taken with a slightly larger pinch of salt.
Below are the updated Flint to Sage ratios (from 2 to 512 bits again):

 n\b      2       4       8       16      32      64      128     256     512
 30    0.6834  0.61456 0.6601  0.7135  0.7201  0.6785  0.6077  0.6247  0.7476
 50    0.4684  0.5343  0.5328  0.4016  0.3073  0.2302  0.2070  0.2098  0.2480
 70    0.5548  0.3918  0.3588  0.2341  0.1503  0.1210  0.1076  0.1117  0.1388
 90    0.1379  0.1578  0.1613  0.2214  0.2954  0.4492  0.5713  0.5418  0.4356
 110   0.1596  0.1568  0.1762  0.2141  0.2979  0.4569  0.5511  0.5179  0.4471
 130   0.1747  0.1809  0.1932  0.2920  0.4041  0.5131  0.5367  0.5039  0.4196
 150   0.1384  0.2127  0.2092  0.2691  0.3202  0.4880  0.5438  0.5192  0.4914
 170   0.2267  0.1844  0.2016  0.2613  0.3657  0.5138  0.5839  0.5665  0.5127
 190   0.2571  0.2681  0.3382  0.2975  0.4802  0.5156  0.5938  0.6588  0.5198
 210   0.2654  0.2674  0.2351  0.3550  0.3833  0.5657  0.6014  0.6737  0.5325
 230   0.2298  0.3567  0.1902  0.3501  0.4248  0.6056  0.6249  0.6543  0.5789
 250   0.3325  0.3008  0.3782  0.3159  0.5064  0.6179  0.6544  0.6392  0.5813

Alex

Bill Hart

unread,
Aug 25, 2014, 11:07:40 PM8/25/14
to flint-devel
Alex, I am impressed. Well done again!

Bill.


Fredrik Johansson

unread,
Aug 26, 2014, 6:03:07 AM8/26/14
to flint-devel
On Tue, Aug 26, 2014 at 4:40 AM, Alex Best <alex....@gmail.com> wrote:
>> That sounds great. Is it provably correct, though?
> Yes, if the profile found turns out to be wrong it is detected later on and
> a new prime is used.

Nice.
Terrific! That is exactly what I hoped to see.

Fredrik