You received this message because you are subscribed to the Google Groups "flint-devel" group.
To unsubscribe from this group and stop receiving emails from it, send an email to flint-devel...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
This week I’ve been working more on some of the older approaches to computing the HNF. First I finished off the modulo-determinant method, which works well for square matrices of full rank. One issue with this method, however, is that the determinant needs to be calculated first and can be quite large, in which case there may be little to gain by working modulo it.
I’ve also implemented the algorithm described by Kannan and Bachem, which takes the principal minors to Hermite form one at a time, as opposed to the other methods I’ve implemented so far, which work column by column. This also provides a significant improvement on the classical method, and indeed is performing better than the modular approach in many cases.
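For contrast, here is a naive Python sketch of the classical column-by-column HNF via unimodular row operations (purely illustrative, not the FLINT code; the intermediate entries this produces can explode badly, which is exactly what the modular and minor-based variants are designed to avoid):

```python
def xgcd(a, b):
    """Extended gcd: returns (g, u, v) with u*a + v*b = g and g >= 0."""
    old_r, r, old_s, s, old_t, t = a, b, 1, 0, 0, 1
    while r:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return (old_r, old_s, old_t) if old_r >= 0 else (-old_r, -old_s, -old_t)

def hnf(A):
    """Row Hermite normal form of an integer matrix (list of lists),
    computed with naive unimodular row operations."""
    A = [row[:] for row in A]
    n, m = len(A), len(A[0])
    r = 0
    for j in range(m):
        # clear column j below row r using 2x2 unimodular transformations
        for i in range(r + 1, n):
            if A[i][j]:
                g, u, v = xgcd(A[r][j], A[i][j])
                p, q = A[r][j] // g, A[i][j] // g
                A[r], A[i] = ([u * x + v * y for x, y in zip(A[r], A[i])],
                              [-q * x + p * y for x, y in zip(A[r], A[i])])
        if A[r][j]:
            if A[r][j] < 0:           # pivots are kept positive
                A[r] = [-x for x in A[r]]
            for i in range(r):        # reduce the entries above each pivot
                c = A[i][j] // A[r][j]
                A[i] = [x - c * y for x, y in zip(A[i], A[r])]
            r += 1
            if r == n:
                break
    return A

assert hnf([[2, 4], [1, 3]]) == [[1, 1], [0, 2]]
```

Every transformation applied is invertible over the integers, so the output generates the same row lattice as the input; the problem is purely the size of the intermediate entries.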
For the next week I’ll be trying to optimise the algorithms implemented so far as much as possible, especially the modular approach of Domich, Kannan and Trotter, as this plays a key role in the algorithm of Pernet and Stein. One thing I would like to try is combining the two approaches I worked on this week, making a modulo-determinant version of the Kannan-Bachem algorithm. There is also the possibility of factoring the determinant into relatively prime parts and working modulo each of those, recombining at the end with the Chinese remainder theorem; of course this requires factoring the determinant, which could make the whole procedure less worthwhile. I also need to make the necessary modifications to ensure that all the algorithms work in the non-square case as efficiently as possible.
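The recombination step at the end is just the Chinese remainder theorem; a quick illustrative Python sketch (not the FLINT code) of recovering a value mod D from its residues modulo coprime factors of D:

```python
def crt(residues, moduli):
    """Combine x mod m_i, for pairwise coprime m_i, into x mod prod(m_i)."""
    x, m = residues[0] % moduli[0], moduli[0]
    for r, n in zip(residues[1:], moduli[1:]):
        # lift: find x' with x' ≡ x (mod m) and x' ≡ r (mod n)
        x += m * ((r - x) * pow(m, -1, n) % n)
        m *= n
    return x

# recover 1234 mod D, where D = 7 * 11 * 13, from its residues mod each factor
assert crt([1234 % 7, 1234 % 11, 1234 % 13], [7, 11, 13]) == 1234 % (7 * 11 * 13)
```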
Alex

Next week I’ll begin working on the algorithm given by Pernet and Stein, which will take some time to do right, so I expect to be working on it in some way or another for the whole week (at least!). Initially I’ll get a simplified version working (which will likely end up looking like the algorithm of Micciancio and Warinschi) and then I’ll start on the improvements suggested in Pernet and Stein’s paper. Hopefully when this is done it will be extremely fast compared to the algorithms I have worked on so far, though its run time will depend on what I can squeeze out of the modular approach I worked on a couple of weeks ago.
Nonetheless, the last time I benchmarked, FLINT computed determinants
over Z faster than Sage for matrices up to at least size 1000. If the
bottleneck in the Pernet-Stein algorithm is the double determinant
calculation, then we can expect similar performance for HNF.
Fredrik
Hello,
I've been working on Pernet and Stein's approach some more this week (blog post, text below).

I have one question that I'm hoping someone could shed some light on: I have a function that does some computation using a temporary fmpz and then sets the output fmpz from the temporary value, but when I try to remove the temporary variable from the function, working out my examples suddenly gets vastly slower! The relevant lines read:

    fmpz_mul_ui(tmp, m1, s);
    fmpz_add(out, tmp, r1normal);

which I replaced with the (to my mind pretty much equivalent) lines

    fmpz_mul_ui(out, m1, s);
    fmpz_add(out, out, r1normal);

My best guess is that this is some sort of cache effect (or I'm actually doing a different computation?), but I really don't know.
In case anyone is wondering, this is for a slightly modified fmpz_CRT_ui_precomp function which also allows half the product of the two moduli to be precomputed; I've found it speeds up functions like fmpz_mat_CRT_ui a fair amount.

This week I've been working on improving my implementation of Pernet-Stein as much as possible. After introducing as many of the ideas from the original paper as I could, I found the main bottleneck to actually be the computation of the nullspace of a certain submatrix of the input; this is needed in order to efficiently solve a linear system whose final row is (likely) nasty. If we know a basis for the nullspace of the first n-1 rows of the system, we can replace the final row with a random nice row (one with small entries) and then find a solution to the original system by adding a suitable multiple of the nullspace basis vector (the nullspace should be 1-dimensional for random input).
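A toy model of this last-row trick, over the rationals with a made-up 2x2 system (the entries and the naive Gaussian solver are purely illustrative; the real code solves the nice system by faster exact methods):

```python
from fractions import Fraction

def solve(A, b):
    """Solve the square system A x = b over Q by naive Gaussian elimination."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(c)] for row, c in zip(A, b)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c and M[r][c]:
                M[r] = [v - M[r][c] * w for v, w in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

# a system whose final row has nasty (large) entries; true solution is [1, 1]
nasty = [1000000007, 999999999]
A = [[1, 2], nasty]
b = [3, 2000000006]

k = [2, -1]                 # spans the nullspace of the first n-1 rows
nice = [[1, 2], [0, 1]]     # same first rows, small replacement last row
x0 = solve(nice, [3, 0])    # easy solve against the nice last row
# x0 + t*k satisfies the first n-1 equations for any t; pick t so the
# original nasty last equation holds too
t = (b[1] - dot(nasty, x0)) / dot(nasty, k)
x = [xi + t * ki for xi, ki in zip(x0, k)]
assert x == [1, 1]
```

The correction step needs dot(nasty, k) to be nonzero, which holds whenever the original system is nonsingular.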
Flint uses the reduced row echelon form of a matrix to compute nullspaces (the nullspace of a matrix in this form can easily be read off, and the transformation does not alter it), so a natural place to improve nullspace computation is the row echelon form computation itself. We can use a multimodular approach for this problem (described in Stein's book on computing with modular forms), and I've achieved some nice speed-ups with this method in the past couple of days; for example, the multimodular method is around 200x faster for 125x125 matrices with 32-bit entries. While this has made Hermite form computations a lot faster (nullspace computation is over 90% of the work for large matrices), I still want to see whether it can be improved further; after all, in this case we don't need the whole row echelon form, just a vector in the kernel and the knowledge that the nullspace is 1-dimensional.
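To illustrate why the rref gives the nullspace essentially for free, here is a small sketch over the rationals (illustrative only; the point is the read-off step at the end, not this naive single-modulus elimination):

```python
from fractions import Fraction

def rref(A):
    """Reduced row echelon form over Q; returns (matrix, pivot columns)."""
    M = [[Fraction(x) for x in row] for row in A]
    rows, cols = len(M), len(M[0])
    pivots, r = [], 0
    for c in range(cols):
        p = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        M[r] = [x / M[r][c] for x in M[r]]
        for i in range(rows):
            if i != r and M[i][c]:
                M[i] = [x - M[i][c] * y for x, y in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
    return M, pivots

def nullspace(A):
    """Basis of the right nullspace, read off from the rref: one basis
    vector per free column, with pivot entries filled from the rref."""
    M, pivots = rref(A)
    cols = len(A[0])
    basis = []
    for f in (c for c in range(cols) if c not in pivots):
        v = [Fraction(0)] * cols
        v[f] = Fraction(1)
        for r, p in enumerate(pivots):
            v[p] = -M[r][f]
        basis.append(v)
    return basis

assert nullspace([[1, 2, 3], [4, 5, 6]]) == [[1, -2, 1]]
```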
So I plan to work on this further in the next couple of days, and depending on how I feel about this approach I will either spend the rest of the week making improvements to Pernet-Stein or possibly work on implementing the algorithm of Pauderis and Storjohann.

Alex
On Wednesday, July 2, 2014 9:32:57 PM UTC+1, Alex Best wrote:

Hi everyone,
I've been working on the Pernet-Stein algorithm recently and wrote a blog post explaining a little about it. Text below as usual.
This week I’ve been working on implementing Pernet and Stein’s algorithm to compute the HNF. The basic idea of this method (which is in turn due to Micciancio and Warinschi) is to use the given matrix to construct an auxiliary matrix with small determinant (so the HNF of the new matrix can be computed without coefficient explosion) from which the original HNF can be reconstructed. The algorithm splits into a few separate stages: the computation of two determinants, the modular HNF of the smaller matrix, the addition of a column to the HNF, and then the addition of two rows. Each of these steps can be optimised quite a bit using some clever tricks, which I’m currently working on. The adding of rows and columns to the HNF now seems to work well, and exploiting the similarities of the submatrices whose determinants are needed, so as to compute both determinants faster, should be done tomorrow. At that point it should be interesting to take stock and see how the Pernet-Stein approach compares to the previous methods I’ve worked on. I’ll likely spend a couple more days after that improving this approach further.
This is definitely the most complex algorithm I’ve worked on so far, but when completed it should really blow the others away for large enough random matrices.
--
This week I’ve been working on improving the computation of the nullspace of an integer matrix by implementing a p-adic lifting based method. The algorithm is also described in Stein’s book on computing with modular forms and is closely related to Dixon’s p-adic method for linear system solving. This is pretty much finished now (modulo some weird memory leaks, and determining the best values for parameters such as the p-adic precision and the range of primes used) and does appear to be asymptotically faster than the multimodular algorithm I worked on before, though experiments suggest that the matrix currently needs to be fairly large before the p-adic algorithm wins.
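The core of Dixon-style lifting is easy to sketch (a toy model: it assumes the solution is integral and small enough to recover without rational reconstruction, which real implementations add, along with careful precision control):

```python
def mat_inv_mod(A, p):
    """Inverse of a square matrix over GF(p), by elimination on [A | I]."""
    n = len(A)
    M = [[A[i][j] % p for j in range(n)] + [int(i == j) for j in range(n)]
         for i in range(n)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c])
        M[c], M[piv] = M[piv], M[c]
        inv = pow(M[c][c], -1, p)
        M[c] = [v * inv % p for v in M[c]]
        for r in range(n):
            if r != c and M[r][c]:
                f = M[r][c]
                M[r] = [(v - f * w) % p for v, w in zip(M[r], M[c])]
    return [row[n:] for row in M]

def dixon_solve(A, b, p, steps):
    """p-adic lifting: returns x with A x = b, assuming the integer solution
    fits in the symmetric range mod p**steps and det(A) is a unit mod p."""
    n = len(A)
    C = mat_inv_mod(A, p)          # the one expensive inverse, done once mod p
    x, r, pk = [0] * n, list(b), 1
    for _ in range(steps):
        # next p-adic digit of the solution
        xi = [sum(C[i][j] * r[j] for j in range(n)) % p for i in range(n)]
        x = [x[i] + pk * xi[i] for i in range(n)]
        # residual is divisible by p by construction, so this division is exact
        r = [(r[i] - sum(A[i][j] * xi[j] for j in range(n))) // p
             for i in range(n)]
        pk *= p
    # map from [0, p^steps) to the symmetric range
    return [v - pk if v > pk // 2 else v for v in x]

assert dixon_solve([[2, 1], [1, 3]], [4, -3], 7, 4) == [3, -2]
```

The attraction is that all the heavy linear algebra happens modulo a single word-sized prime; only the accumulation of digits involves big integers.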
I was originally thinking of implementing Pauderis and Storjohann’s algorithm if I had time this week, but have spent much of the past week looking at efficient nullspace computation (and being plagued a little by unexpected power cuts!) so haven’t got much further than a skeleton implementation. Maybe I can flesh this out at some point but for the next week I’m going to move on to algorithms for the Smith normal form, another important normal form for matrices over the integers, where both column and row operations are allowed.
Hopefully, building on what I’ve done so far, I should be able to get some fast Smith normal form algorithms implemented shortly. Indeed, one simple approach is to repeatedly transpose and compute Hermite normal forms until the resulting matrix is diagonal; this won’t necessarily be the Smith form, but the Smith form can then be computed from it with relative ease. Something more sophisticated than this will undoubtedly be best, but many SNF algorithms involve computing an HNF at some point and working from there, so the current HNF algorithms provide a good basis for those computations.
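The transpose-and-HNF idea, plus the gcd/lcm repair of the divisibility chain afterwards, can be sketched like this (illustrative Python with a naive self-contained HNF helper; assumes a full-rank square input, and the alternation terminates for such inputs):

```python
from math import gcd

def xgcd(a, b):
    """Extended gcd: (g, u, v) with u*a + v*b = g, g >= 0."""
    old_r, r, old_s, s, old_t, t = a, b, 1, 0, 0, 1
    while r:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return (old_r, old_s, old_t) if old_r >= 0 else (-old_r, -old_s, -old_t)

def hnf(A):
    """Naive row Hermite normal form over Z (unimodular row operations)."""
    A = [row[:] for row in A]
    n, m = len(A), len(A[0])
    r = 0
    for j in range(m):
        for i in range(r + 1, n):
            if A[i][j]:
                g, u, v = xgcd(A[r][j], A[i][j])
                p, q = A[r][j] // g, A[i][j] // g
                A[r], A[i] = ([u * x + v * y for x, y in zip(A[r], A[i])],
                              [-q * x + p * y for x, y in zip(A[r], A[i])])
        if A[r][j]:
            if A[r][j] < 0:
                A[r] = [-x for x in A[r]]
            for i in range(r):
                c = A[i][j] // A[r][j]
                A[i] = [x - c * y for x, y in zip(A[i], A[r])]
            r += 1
            if r == n:
                break
    return A

def invariant_factors(A):
    """Diagonal of the Smith form of a full-rank square A: alternate HNF
    and transpose until diagonal, then repair the divisibility chain."""
    while any(A[i][j] for i in range(len(A))
              for j in range(len(A[0])) if i != j):
        A = [list(row) for row in zip(*hnf(A))]   # transpose of the HNF
    d = [A[i][i] for i in range(len(A))]
    for i in range(len(d)):       # gcd/lcm passes enforce d[i] | d[i+1]
        for j in range(i + 1, len(d)):
            g = gcd(d[i], d[j])
            if g:
                d[i], d[j] = g, d[i] * d[j] // g
    return d

assert invariant_factors([[1, 2], [3, 4]]) == [1, 2]
```

The gcd/lcm step works because replacing a diagonal pair (a, b) by (gcd(a, b), lcm(a, b)) is achievable by unimodular row and column operations and preserves the product, so the invariant factors come out right.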
Hello everyone,
I just noticed I forgot to post an update here last week; sorry about that. The link to last week's post is here.
As for this week the link is here, and the text is below:
This week I worked more on the Smith normal form algorithms I talked about last week. I implemented Iliopoulos’ algorithm for computing the Smith form modulo an arbitrary integer; this procedure is used in a couple of places as part of Saunders and Wan’s “engineered” algorithm. First we use a prime power modulus to find the Smith normal form locally for small primes (i.e. those less than 100); the modular approach is also used for the rough part (concerning all the other primes) when the largest non-zero invariant factor is small compared to the size of the matrix. This algorithm is now working correctly, though the question of how to test it properly, given its Monte Carlo nature, will require some thought. So far, whenever I have encountered a matrix for which the output of Saunders-Wan doesn’t match the deterministic algorithms’ outputs, it has turned out to be a code bug rather than a side effect of the algorithm’s randomness. I suppose allowing a reasonable number of failures beyond the expected number in the test code would be one approach, but there will still be occasions when the number of failures exceeds that, and allowing a large margin could let real bugs creep in.
--
>By the way, was there a Week 8 update?

Yes there was, it should be a few posts above this, but here is a link though.
I did a few things (see the week 8 post). Fredrik thinks (and I agree) that the p-adic nullspace can definitely be sped up somehow, but I haven't found out exactly how yet.
>Did you find any additional speedups to the nullspace computation or to Pernet-Stein in general, that you mentioned.
I haven't checked recently (I don't have an automatic comparison method, just a few examples saved to a file that I try out). The last time I did check, I think mine took twice as long as Sage for large random matrices, but that was a few weeks ago, so it should be a bit more competitive now.
>Do you have an idea how it compares now, say to the implementation in Sage? (I'm not expecting to beat that because our arithmetic is quite slow, but am just generally interested how far we are off in some cases now.)
Later this week I plan to do a lot of profiling etc. to write general hnf/snf functions that choose an implementation based on input size, so I'll compare against Sage (& LinBox, Pari?) then and report back.
>What is the expected failure rate of the algorithm by the way? Many of those sorts of algorithms are not expected to fail on uniformly random data in any universe in anyone's lifetime. E.g. a failure rate of 1 in 10^80 is not out of the question.

The failure rate is, unfortunately, not galactically small, but it is controllable; it just takes longer to be more sure of the answer. So it's probably a matter of letting the user decide, with some sensible default.
Hi everyone,
My latest blog post. Text below, but first a subtle bug I thought it might be worth quickly reporting here in case it turns out to be causing anyone else problems. I noticed that fmpz_xgcd(g, a, b, 0, -1) gives g = -1 as the result; a fix is now on my GitHub (someone may want to check that it is the correct fix!). I also reindented that file per the code conventions while I was at it, and the commit after corrects the number of tests, as I had left the wrong number in the file.
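For anyone curious, the convention at issue is that the gcd returned should always be nonnegative, even for negative inputs; a Python model of that convention (illustrating the sign normalisation, not the fmpz code itself):

```python
def xgcd(a, b):
    """Extended gcd returning (g, u, v) with u*a + v*b = g and g >= 0."""
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    if old_r < 0:   # normalise: the gcd itself should never be negative
        old_r, old_s, old_t = -old_r, -old_s, -old_t
    return old_r, old_s, old_t

# without the sign fix, gcd(0, -1) would come out as -1
assert xgcd(0, -1) == (1, 0, -1)
```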
This week I’ve worked on a variety of different things, unfortunately getting hung up on most of them!
First I found some bugs in my Saunders-Wan Smith normal form code; most of them ended up being fixable, but one still eludes me. The bug is fairly rare (occurring for roughly 1 in 10000 test cases) and is certainly related to the random nature of the algorithm, but my current thinking is that this behaviour should not occur even when we get unlucky with the randomness. I’ve left the issue alone for now after spending a couple of days making no progress on it.
I then moved on to writing some general HNF/SNF methods that pick between the available implementations based on matrix size, norm and anything else I could work out as being relevant. While doing this I found that the Kannan-Bachem Hermite form method worked better than the modular method some of the time, so I decided to try to combine them into a modular method that works by taking minors into HNF rather than rows. I had some problems making this work correctly when the input was not of full rank, and all the fixes I tried ended up having some corner case that made them fail. It occurred to me, however, that when the modular method is used as part of the Pernet-Stein algorithm the matrix given is always square, so this was perhaps not a completely wasted effort.
I then moved back to working on Pernet-Stein; specifically, I added support for matrices without full row rank, and fully general matrices should be done very soon.
This week I’m going to be doing some profiling and comparisons between my implementations and existing ones, and trying to work out why certain results are as they are and how the HNF/SNF methods I now have could be made faster in the future. It should be helpful to have a note of the barriers to faster HNF/SNF computation in Flint and how they might be overcome.
Of course I’ll also be tidying up and documenting several bits as I go along to fill in any gaps in functionality I have left along the way.
Alex
On Sunday, June 8, 2014 11:04:27 PM UTC+1, Alex Best wrote:

Hi all,

I've just published a blog post explaining a little about what I've been up to this week working on Hermite normal forms.
There is one particular issue I've been having that I was wondering if anyone had any advice on: when I run valgrind on the test program for one of the HNF implementations I've been working on (fmpz_mat_hnf_mod_D) I get a lot of errors, specifically 33 errors regarding lost blocks, starting with:
==30989== 88 (16 direct, 72 indirect) bytes in 1 blocks are definitely lost in loss record 4 of 46
==30989== at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==30989== by 0x4E76C48: flint_malloc (memory_manager.c:54)
==30989== by 0x4E8A299: _fmpz_new_mpz (fmpz.c:49)
==30989== by 0x4E8A484: _fmpz_promote (fmpz.c:95)
==30989== by 0x4E8FD59: fmpz_set (set.c:44)
==30989== by 0x4E99D0A: _fmpz_vec_set (set.c:36)
==30989== by 0x4EF311D: fmpz_mat_set (set.c:38)
==30989== by 0x4EE4907: fmpz_mat_det_bareiss (det_bareiss.c:54)
==30989== by 0x400B5C: main (t-hnf_mod_D.c:121)
I'm having a lot of trouble working out where this is coming from; line 121 simply computes the determinant of a matrix, which is then doubled, absolute-valued and used to find the Hermite normal form of the matrix (by working modulo that number). When I've tried to isolate a specific case where this happens (the test code does 1000 iterations) I get no errors (or I'm not isolating the right case), which leaves me very confused. Does anyone have any ideas how I can go about tracking this down? I should note that while the procedure is not yet fully finished, all the tests do pass, so it looks like no wrong answers are being produced by this problem.
Best wishes,
Alex
Hi everyone, I just wrote a blog post summing up what I've worked on as part of GSoC. The text is below, but it might be better read (the tables at least) on the blog itself.
First of all, apologies for not keeping everyone up to date recently. I’ve just been cleaning up code and deciding what is in a good enough state to include in my final evaluation for GSoC, so there wasn’t much new of interest to report. This is all now done, and my GitHub repository now contains only code that appears to be working well.
The main project goals of implementing fast methods to compute Hermite and Smith normal forms have been completed. Below is a table and graph comparing the timings with those obtained with Sage (I spent about half a week being very sad about my work until I realised I needed to clear the cache to get the right timings for Sage!).
For simplicity’s sake I simply compared fmpz_mat_hnf_pernet_stein to Sage’s hermite_form, so it’s worth noting that Flint is faster still for small matrices if another method is used (which the fmpz_mat_hnf function should choose for you in practice). We can see here that although Flint does well for many matrices in this range, it gets worse as the matrix size and bit size grow; indeed this trend continues, and my functions are far worse for very big matrices (Denis Kryzkov’s benchmark here gives a good indication of the scale of the problem).
The run time of the asymptotically most efficient HNF method I have implemented (Pernet-Stein) depends heavily on the computation of nullspaces and so this is definitely an area that can be improved.
Both approaches I’ve tried for speeding up nullspace computation (multimodular and p-adic lifting) have not worked out to be asymptotically better than the pre-existing code based on the row echelon form.
The remaining barrier here seems to be modular rref, which I’ve looked at improving over the past week. This is certainly possible and I plan to carry on working on it (at the moment I have a complete but buggy implementation of a method described in Storjohann’s thesis, and a working implementation of classical Gaussian elimination which is fast for small matrices).
Finishing this will, I hope, bring the timings for Pernet-Stein down to something more like Sage’s. Sage uses IML for these computations, but even without using BLAS I think we could get a fair amount closer to IML’s performance for large matrices.
As for Smith forms, the randomised algorithm I worked on remains problematic, so I have left it out for now and stuck with the two slightly older approaches: a general one due to Kannan and Bachem, and a modular one due to Iliopoulos for full-rank square matrices. Below are some timings, again with comparisons to Sage. The class of matrices used is the same as above, which means it is highly likely that Iliopoulos’ algorithm is appropriate (though it may not have been chosen by fmpz_mat_snf). Sage uses Pari for this, but it was easier for me to adapt my existing timing code than to set up a direct comparison with Pari.
Flint timings (s), rows = matrix dimension, columns = entry size in bits:

 n \ bits        4          8          16         32         64         128
   10      0.0001008  0.0001776  0.0002186  0.0005208  0.0016076  0.0042602
   20      0.001181   0.0017224  0.0025602  0.0050306  0.0131318  0.038855
   30      0.0039768  0.006887   0.013429   0.031941   0.0904368  0.2860226
   40      0.0138918  0.0201976  0.0408486  0.1258706  0.386331   1.1849852
   50      0.0312358  0.0544678  0.1120322  0.370241   1.155253   3.6941254
   60      0.061067   0.116257   0.2779696  0.8377208  2.5126946  7.9345656
Flint to Sage timing ratios (< 1 is best for us), same rows and columns:

 n \ bits      2        4        8        16       32       64       128
   10      0.01630  0.02870  0.04939  0.05714  0.10546  0.22552  0.36044
   20      0.05063  0.06777  0.08559  0.08868  0.10665  0.14573  0.19519
   30      0.08086  0.07276  0.09437  0.10418  0.13330  0.17716  0.23537
   40      0.08905  0.10716  0.09976  0.10617  0.16009  0.21715  0.27530
   50      0.10550  0.11037  0.11270  0.11762  0.18405  0.24098  0.30266
   60      0.12292  0.10537  0.11135  0.12858  0.17996  0.23113  0.29524
(I will update this table on my blog when more timings finish; I wanted to get this post out and they are taking a while.)
These timings look good for Flint but I’m not completely sure yet what the large scale behaviour is.
I still have a number of more experimental bits of code around that I will continue working on getting into a more stable and usable state, along with some other little bits that I never managed to get to during the official GSoC period but hope to get around to at some point.
Finally I want to say a huge thanks to everyone who commented on what I’ve been doing, and especially to Fredrik for his excellent advice over the course of the project. All the comments were very much appreciated.
On Friday, August 8, 2014 4:46:10 PM UTC+1, Bill Hart wrote:

Thanks for the benchmark, but try to remember this is code still in development.

On 8 August 2014 16:49, Денис Крыськов <crazz...@gmail.com> wrote:
Sorry for interfering; I tested/benchmarked fmpz_mat_hnf_pernet_stein() against Sage's hnf_square(), on square non-singular matrices of varying dimension and entry size. I took 3 files out of commit 35dc83a09be087a8de58e4df6e99b4ef4430734f. The same matrices were used as input, and the results match: no errors. Script here, its result here. Did I benchmark the wrong subroutine? Is there any faster HNF subroutine in https://github.com/alexjbest/flint2?
Hi Fredrik,
Yes apologies, those functions were not meant to be in the "clean" branch of the code!
The bound in rref_multi_mod is still something I need to work out properly; there is a TODO there to guess the height of the entries in the result. I spent quite a while a couple of days ago trying to do this properly and had a lot of difficulty finding something that works; even when I get the height right, the bound I've been using still seems far too small, and I can't figure out what the right one should be.
That's a really nice idea about the reconstruction; I implemented it, and if I remember the old timings right it has improved things, thanks! I guess we could actually do something similar for the CRT step too.
Hi Bill,
The nullspace is taken over Z, and for Pernet-Stein you need the whole thing (although unless something has gone horribly wrong it will only be 1-dimensional, you still need to know when that fails to happen).
The size of the entries is the same as that of the input for the HNF, but the problems really occur as you get to large inputs.
Yes, you can choose the prime size, I've just been trying with primes of size NMOD_MAT_OPTIMAL_MODULUS_BITS.
Now for something a little different that renders some of what I said above a little irrelevant!
I started looking at another method for rrefs entirely after realising that the multimodular one wasn't working for me, and I'm pleased to say the new one works very well!
The algorithm is also given in Stein's book on modular forms, but for some reason I overlooked it before. The idea is really quite simple: the row and column profile of the rref is found modulo a random prime, then the nonsingular submatrix of the original matrix cut out by that profile is inverted to compute the whole rref (or in practice the whole rref is solved for with Dixon's algorithm).
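A toy version of the profile trick in Python (illustrative only: it assumes the input has full row rank and that the prime is lucky, i.e. preserves the rank profile, which holds with high probability for a random prime; the exact solve here is naive elimination over Q rather than a Dixon solve):

```python
from fractions import Fraction

def profile_mod_p(A, p):
    """Pivot columns of the rref of A over GF(p)."""
    M = [[x % p for x in row] for row in A]
    rows, cols, pivots, r = len(M), len(M[0]), [], 0
    for c in range(cols):
        piv = next((i for i in range(r, rows) if M[i][c]), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        inv = pow(M[r][c], -1, p)
        M[r] = [v * inv % p for v in M[r]]
        for i in range(rows):
            if i != r and M[i][c]:
                f = M[i][c]
                M[i] = [(v - f * w) % p for v, w in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
    return pivots

def rref_via_profile(A, p):
    """rref of A over Q: find the column profile mod p, then eliminate on
    [B | A], where B is the pivot-column submatrix; when B becomes the
    identity, the right-hand block is B^(-1) A, which is the rref."""
    pivots = profile_mod_p(A, p)
    n = len(A)
    M = [[Fraction(A[i][j]) for j in pivots] +
         [Fraction(x) for x in A[i]] for i in range(n)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c and M[r][c]:
                M[r] = [v - M[r][c] * w for v, w in zip(M[r], M[c])]
    return [row[n:] for row in M]

assert rref_via_profile([[2, 4, 4], [1, 3, 5]], 5) == [[1, 0, -4], [0, 1, 3]]
```

The payoff is that the expensive exact arithmetic is confined to inverting (or solving against) one well-conditioned square submatrix, with everything else done cheaply modulo a word-sized prime.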
I will be able to do some proper timings next week but some initial testing suggests that using this rref brings the asymptotic runtime of the hnf_pernet_stein right down to the range I'd originally hoped for.
--