Thanks for sharing this, Victor. It looks like there are some more really good speedups there.
Flint has always had just reasonable matrix code, nothing extraordinary.
I put quite a bit of work into the minpoly and charpoly code I recently added to Flint (something NTL was already very good at). But I haven't compared it with anything except Sage and Magma yet.
I definitely want to spend some time going through your benchmarks and improving Flint, but as usual, so many other things need doing that it probably won't happen soon. Much of what we have done in Flint is just out of date for modern CPUs, especially those with AVX. For example, I doubt we should be using Strassen. Many of our tuning cutoffs are also way out of date. This is particularly bad as Flint has no tuning code.
Most of the recent work in Flint has been to add lots of small utility functions, to write better documentation and to make Flint more reliable (asserts, better test code, etc.).
Oh well, one day we will get around to speeding it up again.
Bill.