
Hi there!
I've got a really simple task of finding the rank of a matrix of integers. NumPy's matrix_rank() appeared to use floats internally (and gave a clearly wrong answers when the matrix contained both small and huge coefficients like 1 and 10**16), so I decided to try SymPy instead. Unfortunately, I found it pretty much impossible to find the rank of a dense matrix of size 30×30 (and above), since it took forever. I timed it and found out that runtime grows exponentially (which is really mysterious for me).Am I doing something wrong? Is there a better way?
Here's a trivial example of what I do.PS C:\> python -m timeit -s "import sympy" "sympy.randMatrix(10).rank()"
20 loops, best of 5: 13 msec per loop
PS C:\> python -m timeit -s "import sympy" "sympy.randMatrix(15).rank()"
5 loops, best of 5: 47.7 msec per loop
PS C:\> python -m timeit -s "import sympy" "sympy.randMatrix(20).rank()"
1 loop, best of 5: 2.26 sec per loop
In [1]: Matrix([[1, 2], [3, 4]])._rep
In [6]: M = Matrix([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
In [7]: M.rank()
Out[7]: 2
In [8]: M._rep
Out[8]: DomainMatrix({0: {0: 1, 1: 2, 2: 3}, 1: {0: 4, 1: 5, 2: 6}, 2: {0: 7, 1: 8, 2: 9}}, (3, 3), ZZ)
In [9]: rank = len(M._rep.to_field().rref()[-1])
In [10]: rank
Out[10]: 2
Here are some timings:
In [16]: for N in [1, 2, 5, 10, 20, 50, 100]:
...: M = randMatrix(N)
...: print('N =', N)
...: %time len(M._rep.to_field().rref()[-1])
...:
N = 1
CPU times: user 143 µs, sys: 1e+03 ns, total: 144 µs
Wall time: 151 µs
N = 2
CPU times: user 204 µs, sys: 39 µs, total: 243 µs
Wall time: 207 µs
N = 5
CPU times: user 547 µs, sys: 38 µs, total: 585 µs
Wall time: 618 µs
N = 10
CPU times: user 2.77 ms, sys: 68 µs, total: 2.83 ms
Wall time: 3.02 ms
N = 20
CPU times: user 21.8 ms, sys: 136 µs, total: 21.9 ms
Wall time: 28.7 ms
N = 50
CPU times: user 545 ms, sys: 2.48 ms, total: 547 ms
Wall time: 550 ms
N = 100
CPU times: user 7.37 s, sys: 58.9 ms, total: 7.43 s
Wall time: 7.51 s
That scales a lot better than the current Matrix rank method (I gave up at N=20 here):
In [15]: for N in [1, 2, 5, 10, 20, 50, 100]:
...: M = randMatrix(N)
...: print('N =', N)
...: %time M.rank()
...:
N = 1
CPU times: user 270 µs, sys: 0 ns, total: 270 µs
Wall time: 277 µs
N = 2
CPU times: user 1.12 ms, sys: 724 µs, total: 1.85 ms
Wall time: 1.18 ms
N = 5
CPU times: user 8.01 ms, sys: 96 µs, total: 8.1 ms
Wall time: 8.2 ms
N = 10
CPU times: user 23.3 ms, sys: 110 µs, total: 23.4 ms
Wall time: 23.5 ms
N = 20
CPU times: user 5.56 s, sys: 46.5 ms, total: 5.61 s
Wall time: 5.66 s
N = 50
^C---------------------------------------------------------------------------
KeyboardInterrupt
This can all be made more efficient and the plan is to do so in future SymPy versions. For now you can use DomainMatrix directly if these examples are representative of what you want to do. It's also faster to use flint through the python_flint library. I plan to make it possible to use flint internally to speed SymPy up but that hasn't happened yet:
In [1]: import flint
In [2]: M = randMatrix(100)
In [3]: fM = flint.fmpz_mat([[int(i) for i in row] for row in M.tolist()])
In [4]: %time fM.rank()
CPU times: user 3.96 ms, sys: 3.48 ms, total: 7.44 ms
Wall time: 18 ms
Out[4]: 100
In [5]: M = randMatrix(1000)
In [6]: fM = flint.fmpz_mat([[int(i) for i in row] for row in M.tolist()])
In [7]: %time fM.rank()
CPU times: user 632 ms, sys: 19.9 ms, total: 652 ms
Wall time: 655 ms
Out[7]: 1000
--
Oscar