Gsoc Idea Discussion

316 views
Skip to first unread message

Kunal Singh

unread,
Feb 13, 2021, 8:27:04 AM2/13/21
to sy...@googlegroups.com
Hello Developers ,
I have read the ideas page and found the idea for probability and statistics in the Mathematics Project Section and would like to work on it if it is being considered for Gsoc this year , A brief description and a Starting point would really be helpful .

--


Image result for iit kgp logo

Kunal Singh

Second Year Undergraduate Student

Computer Science and Engineering

Indian Institute of Technology,

Kharagpur 

Github



Kunal Singh

unread,
Feb 15, 2021, 7:37:11 AM2/15/21
to sy...@googlegroups.com
Hi developers,
Can I get a follow up on this please

Gagandeep Singh (B17CS021)

unread,
Feb 15, 2021, 8:05:43 AM2/15/21
to sy...@googlegroups.com

--
You received this message because you are subscribed to the Google Groups "sympy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sympy+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/sympy/CAKq4Gz_5TYCUt8JiZnCEf2qmbuUKX6%2BLUGu%2BSaL32uEJ7pQ2Jg%40mail.gmail.com.


--
With regards,
Gagandeep Singh

Kartik Sethi

unread,
Feb 20, 2021, 2:40:40 AM2/20/21
to sympy
Hey, my name is Kartik Sethi. I am interested in taking part in Gsoc this year. I have been contributing to sympy for the past couple of months.

I have come up with a few ideas for Gsoc. I am focusing primarily on the sympy matrices module.

  1. Implementing complete Singular Value decomposition. Sympy currently has a function for condensed SVD ( I am a contributor on this #20761). 
  2. Using the complete SVD, I can implement Polar decomposition
  3. Hermite Normal Form. There is an old un-merged PR (#18534), I could work on this and complete it.
  4. Sparse-Fraction Free Algorithm another old unfinished PR (#9133)
  5. Jordan Canonical Form
  6. Improve time complexity for QR decomposition of upper Hessenberg. There is an algorithm using Householder reflectors that can cut down time-complexity to O(n^2) instead of current O(n^3)
I am open to adding to this list with the help of possible mentors. I have taken a course in Matrix Computations in my previous semester, so I think I can really help improve and add to the matrices module.

Thank You

Oscar Benjamin

unread,
Feb 20, 2021, 6:34:13 AM2/20/21
to sympy
On Sat, 20 Feb 2021 at 07:40, Kartik Sethi <kartik...@gmail.com> wrote:
>
> Hey, my name is Kartik Sethi. I am interested in taking part in Gsoc this year. I have been contributing to sympy for the past couple of months.

Hi Kartik,

> I have come up with a few ideas for Gsoc. I am focusing primarily on the sympy matrices module.
>
> Implementing complete Singular Value decomposition. Sympy currently has a function for condensed SVD ( I am a contributor on this #20761).
> Using the complete SVD, I can implement Polar decomposition
> Hermite Normal Form. There is an old un-merged PR (#18534), I could work on this and complete it.
> Sparse-Fraction Free Algorithm another old unfinished PR (#9133)

The Matrix class does not necessarily get much benefit from
fraction-free algorithms because of the internal representation.
DomainMatrix does though (see below).

> Jordan Canonical Form

This is already implemented as the Matrix.jordan_form method so I'm
not sure what you are proposing. The main way it could be improved is
by having the method use DomainMatrix internally just like eigenvects
does (see below).

> Improve time complexity for QR decomposition of upper Hessenberg. There is an algorithm using Householder reflectors that can cut down time-complexity to O(n^2) instead of current O(n^3)

When I've timed calculations with Matrix they almost never match up to
the big-O behaviour that is typically expected for any given
algorithm. I think there are too many other symbolic computations
going on that it slows down more than it should as n grows. Operations
with DomainMatrix though do tend to have the big-O that you would
expect if you take into account bit complexity.

This is not listed anywhere on the ideas page or really documented
anywhere yet but there is a new mostly internal implementation of
matrices in the sympy.polys.matrices module. This new implementation
is called DomainMatrix and is based on the polys domains. The domains
themselves are explained in these recently added docs:
https://docs.sympy.org/dev/modules/polys/domainsintro.html
There are aren't any docs for DomainMatrix yet and it is not that easy
to use but you can find more information in this most recent PR which
links to other PRs:
https://github.com/sympy/sympy/pull/20780

The easiest way to test out DomainMatrix is by converting from a
normal Matrix like this:

In [75]: from sympy.polys.matrices import DomainMatrix

In [76]: M = Matrix(range(25)).reshape(5, 5)

In [77]: M
Out[77]:
⎡0 1 2 3 4 ⎤
⎢ ⎥
⎢5 6 7 8 9 ⎥
⎢ ⎥
⎢10 11 12 13 14⎥
⎢ ⎥
⎢15 16 17 18 19⎥
⎢ ⎥
⎣20 21 22 23 24⎦

In [78]: dM = DomainMatrix.from_Matrix(M)

In [80]: dM
Out[80]: DomainMatrix([[0, 1, 2, 3, 4], [5, 6, 7, 8, 9], [10, 11, 12,
13, 14], [15, 16, 17, 18, 19], [20, 21, 22, 23, 24]], (5, 5), ZZ)

DomainMatrix is much faster than Matrix for many calculations. The
difference is observable for a simple matrix of integers like this:

In [81]: %time M.det()
CPU times: user 2.4 ms, sys: 281 µs, total: 2.68 ms
Wall time: 2.73 ms
Out[81]: 0

In [82]: %time dM.det()
CPU times: user 143 µs, sys: 161 µs, total: 304 µs
Wall time: 310 µs
Out[82]: mpz(0)

The difference becomes really noticeable if the matrix has algebraic
elements like I or sqrt(2) or has symbols like x or y in it:

In [88]: M = randMatrix(5, 5) + randMatrix(5, 5) * I

In [89]: M
Out[89]:
⎡26 + 27⋅ⅈ 41 + 76⋅ⅈ 98 + 76⋅ⅈ 64 + 52⋅ⅈ 48 + 2⋅ⅈ ⎤
⎢ ⎥
⎢65 + 81⋅ⅈ 49 + 25⋅ⅈ 39 + 82⋅ⅈ 74⋅ⅈ 77 + 3⋅ⅈ ⎥
⎢ ⎥
⎢11 + 59⋅ⅈ 49 + 67⋅ⅈ 50 + 25⋅ⅈ 69 + 24⋅ⅈ 79 + 31⋅ⅈ⎥
⎢ ⎥
⎢42 + 29⋅ⅈ 33 + 72⋅ⅈ 37 + 59⋅ⅈ 10 + 12⋅ⅈ 27 + 62⋅ⅈ⎥
⎢ ⎥
⎣97 + 7⋅ⅈ 4 + 26⋅ⅈ 8 + 30⋅ⅈ 47 + 84⋅ⅈ 81 + 6⋅ⅈ ⎦

In [90]: dM = DomainMatrix.from_Matrix(M)

In [91]: %time ok = M.det()
CPU times: user 486 ms, sys: 4.15 ms, total: 490 ms
Wall time: 492 ms

In [92]: %time ok = dM.det()
CPU times: user 12 ms, sys: 422 µs, total: 12.4 ms
Wall time: 12.7 ms

In [93]: %time ok = M.charpoly()
CPU times: user 18.9 s, sys: 186 ms, total: 19.1 s
Wall time: 19.2 s

In [94]: %time ok = dM.charpoly()
CPU times: user 10 ms, sys: 170 µs, total: 10.2 ms
Wall time: 10.5 ms

That's a 50x speed difference for computing the determinant and a
2000x difference for the characteristic polynomial. This is just for a
5x5 matrix: the speed difference will be more significant for larger
matrices. Further optimisations are still possible for both det and
charpoly: sparse fraction free elimination could be used for det for
example.

It is not currently clear how to make use of DomainMatrix for users.
Currently it is only used as part of linsolve and internally by the
Matrix.eigenvects routine. Right now the thing that would make the
biggest difference to matrices in sympy would be making more use of
DomainMatrix to speed up calculations.

For example this PR added the use of DomainMatrix for computing eigenvectors:
https://github.com/sympy/sympy/pull/20614

That makes eigenvector calculations extremely fast on master compared
to sympy 1.7.1. Given this matrix:

B = Matrix([
[ 5, -5, -3, 2],
[-2, -5, 0, 2],
[-2, -7, -5, -2],
[ 7, 10, 3, 9]])

On master the eigenvectors are computed using DomainMatrix and it
takes 200 milliseconds:

In [2]: %time v = B.eigenvects()
CPU times: user 206 ms, sys: 14 ms, total: 220 ms
Wall time: 219 ms

With sympy 1.7.1 DomainMatrix is not used and the calculation is
extremely slow. I waited 10 minutes and then gave up:

In [2]: %time v = B.eigenvects()
^C---------------------------------------------------------------------------
KeyboardInterrupt

Methods like fraction-free algorithms don't necessarily make much
sense with the current Matrix class as it uses Expr for the matrix
elements which does not necessarily get much speed up from avoiding
division. With DomainMatrix it can make a huge difference because it
means that we can perform calculations over a polynomial ring domain
rather than a rational function field which eliminates gcd
calculations.

I think that right now this is the direction to go to improve matrix
calculations. Some examples of things that need doing:

1. Add docs explaining DomainMatrix
2. Use DomainMatrix for jordan_form as well as for eigenvects and in
particular for the matrix exponential.
3. Use DomainMatrix for e.g. charpoly and many other methods.
4. Add basic matrix convenience methods like slicing to DomainMatrix.
5. Add fraction free and other algorithms to DomainMatrix
6. Figure out how and when to use both the dense and sparse
implementations of DomainMatrix.

The big thing that needs doing is providing users with a more direct
way to use DomainMatrix: a public matrix class that uses DomainMatrix
as its internal representation so that it does not need to convert
between representations each time it wants to compute something like
eigenvects. This needs a bigger discussion though about what that
public interface would be. For now I think it would be better to keep
building up DomainMatrix with additional docs, algorithms and methods
while making use of it internally to speed up particularly slow
calculations.


Oscar

Kartik Sethi

unread,
Feb 20, 2021, 9:28:35 AM2/20/21
to sympy
Thanks Oscar Benjamin for the wonderful suggestions. I will start looking into algorithms and methods that have not been implemented for the domain Matrix class yet.
I would be happy to help in building Domain Matrix class.

Do you think improving and implementing algorithms in Domain Matrix class would be a suitable Gsoc Proposal?

Oscar Benjamin

unread,
Feb 20, 2021, 9:41:49 AM2/20/21
to sympy
On Sat, 20 Feb 2021 at 14:28, Kartik Sethi <kartik...@gmail.com> wrote:
>
> Thanks Oscar Benjamin for the wonderful suggestions. I will start looking into algorithms and methods that have not been implemented for the domain Matrix class yet.
> I would be happy to help in building Domain Matrix class.
>
> Do you think improving and implementing algorithms in Domain Matrix class would be a suitable Gsoc Proposal?

Yes, absolutely. If anyone is interested in improving sympy's matrix
calculations I think that this is the most important thing to do right
now. Many algorithms are already implemented for Matrix but are
basically unusable because they are too slow except in the simplest
cases. If we can make core matrix operations faster then that will
benefit all of them.

In the cases where DomainMatrix is slow the main thing that is slow is
gcd computations. That can be improved by making more use of fraction
free algorithms as in this PR that anyone is welcome to finish:
https://github.com/sympy/sympy/pull/20483

The other side of it is improving the gcd algorithms themselves but I
think that should be a whole separate project.


--
Oscar

Oscar Benjamin

unread,
Feb 20, 2021, 5:30:01 PM2/20/21
to sympy
On Sat, 20 Feb 2021 at 11:33, Oscar Benjamin <oscar.j....@gmail.com> wrote:
> This is not listed anywhere on the ideas page or really documented
> anywhere yet but there is a new mostly internal implementation of
> matrices in the sympy.polys.matrices module. This new implementation
> is called DomainMatrix and is based on the polys domains. The domains
> themselves are explained in these recently added docs:
> https://docs.sympy.org/dev/modules/polys/domainsintro.html
> There are aren't any docs for DomainMatrix yet and it is not that easy
> to use but you can find more information in this most recent PR which
> links to other PRs:
> https://github.com/sympy/sympy/pull/20780

I've opened an issue to list things that need doing with DomainMatrix:
https://github.com/sympy/sympy/issues/20987
Message has been deleted

Kartik Sethi

unread,
Feb 21, 2021, 2:21:46 AM2/21/21
to sympy
Thanks for listing out the issues. I'll start looking into them

Kartik Sethi

unread,
Feb 28, 2021, 5:38:31 AM2/28/21
to sympy
Oscar Benjamin, I was wondering if completing the tasks listed on https://github.com/sympy/sympy/issues/20987 could be a Gsoc proposal.
There's a lot to do in that list and I can even add some other functions like QR decomposition, Hessenberg Decomposition, SVD, Polar Decomposition etc. to that list.
Since DomainMatrix class seems a lot faster, users would actually be able to use these functions for matrices.
What do you think?

Oscar Benjamin

unread,
Feb 28, 2021, 9:26:43 AM2/28/21
to sympy
On Sun, 28 Feb 2021 at 10:38, Kartik Sethi <kartik...@gmail.com> wrote:
>
> I was wondering if completing the tasks listed on https://github.com/sympy/sympy/issues/20987 could be a Gsoc proposal.

Some of the things in that list are fairly trivial like "make Matrix
use DomainMatrix for charpoly". The DomainMatrix charpoly method is
there already and is efficient so it's just a case of changing the
charpoly method of matrix to call that.

Other things on the list are a bit more complex like "implement the
Bareiss algorithm": that's not a complicated algorithm but
implementing it efficiently with pivoting for sparse matrices takes a
bit of work.

Then other things are much more difficult like "Make risch use
DomainMatrix" and are clearly beyond the scope of a GSOC proposal
right now (unless you already have significant knowledge of the
algorithm and its implementation in sympy).

So a GSOC proposal could focus on only some of the things from that
list. The list itself is very incomplete so there are plenty of other
things that would make both DomainMatrix faster and other ways to make
Matrix faster by using DomainMatrix. If you can think of anything then
please suggest on the issue so that we can add it to the list.

The most useful thing that a GSOC project could do is really just to
make the whole DomainMatrix class a bit more complete, usable and
documented so that it's easier for users and future contributors to
make use of it and also make improvements to it. There are always
contributors wanting to add basic matrix algorithms like SVD etc
regardless of GSOC. I would personally rank a GSOC project more highly
if it looks like a more involved piece of work that needs someone to
spend a bit of time getting to know a particular part of the codebase.

> There's a lot to do in that list and I can even add some other functions like QR decomposition, Hessenberg Decomposition, SVD, Polar Decomposition etc. to that list.

Since DomainMatrix works over a domain or a field some algorithms make
less sense for it. For example an algorithm that involves computing
lots of square roots means taking repeated field extensions which
would probably not be efficient with DomainMatrix.

I'm not sure how easy it is to implement any of those with
DomainMatrix but certainly if the algorithm boils down to operations
like charpoly, and nullspace (like computing the eigenvectors or
Jordan form does) then we can just make those core operations more
efficient using DomainMatrix.

I would begin with the trivial things like for example charpoly takes
11 seconds to compute the characteristic polynomial of this small 5x5
matrix of complex numbers:

In [1]: M = randMatrix(5) + randMatrix(5)*I

In [2]: %time p1 = M.charpoly()
CPU times: use 11.4 s, sys: 67.4 ms, total: 11.5 s
Wall time: 11.6 s

It only takes a few milliseconds to do the same with DomainMatrix:

In [4]: from sympy.polys.matrices import DomainMatrix

In [5]: %time dM = DomainMatrix.from_Matrix(M)
CPU times: user 552 µs, sys: 1e+03 ns, total: 553 µs
Wall time: 558 µs

In [6]: %time p2 = dM.charpoly()
CPU times: user 2.77 ms, sys: 26 µs, total: 2.79 ms
Wall time: 2.83 ms

In [7]: Poly(p2, p1.gens, domain=dM.domain) == p1
Out[7]: True

Making Matrix.charpoly faster then is a pretty trivial pull request at
this stage. That would then speed up all the other operations that use
charpoly (eigenvals, Jordan form etc).

If you time various operations like this with Matrix then you can see
what is slow and whether or not DomainMatrix can be used to speed
things up. Ideally though we just use it for a small number of
operations that are the core to other algorithms. The thing to pay
attention to is what kinds of entries you have in the matrix and what
domain the resulting DomainMatrix uses. DomainMatrix is faster for a
matrix of integers but where it really makes a difference is something
like the above which is a matrix of Gaussian integers or matrices that
involve algebraic quantities like sqrt(2) or matrices with symbols in
so that the entries are e.g. polynomials in x and y.

--
Oscar

Oscar Benjamin

unread,
Feb 28, 2021, 9:38:15 AM2/28/21
to sympy
On Sun, 28 Feb 2021 at 14:26, Oscar Benjamin <oscar.j....@gmail.com> wrote:
>
> The most useful thing that a GSOC project could do is really just to
> make the whole DomainMatrix class a bit more complete, usable and
> documented so that it's easier for users and future contributors to
> make use of it and also make improvements to it. There are always
> contributors wanting to add basic matrix algorithms like SVD etc
> regardless of GSOC. I would personally rank a GSOC project more highly
> if it looks like a more involved piece of work that needs someone to
> spend a bit of time getting to know a particular part of the codebase.

I should add here that GSOC projects often do not end up doing
precisely what is specified in the proposal. Think of the proposal as
more like demonstrating that there are things that can be done and you
understand how to do them and have a rough idea of what you can
achieve in the time. The actual list of "I will do this, then that,
..." is less important than just demonstrating that you know what
should be done and how to do it.

--
Oscar

Kartik Sethi

unread,
Mar 1, 2021, 1:46:35 AM3/1/21
to sympy
Thanks for the suggestion. I'll start looking into methods that can be made faster and also look into making DomainMatrix more complete.

Kartik Sethi

unread,
Mar 8, 2021, 11:00:05 AM3/8/21
to sympy
Oscar Benjamin, what if someone else also puts up this same proposal to improve the DomainMatrix module for Gsoc. How does sympy decide which person will work on it.

Oscar Benjamin

unread,
Mar 8, 2021, 8:19:29 PM3/8/21
to sympy
On Mon, 8 Mar 2021 at 16:00, Kartik Sethi <kartik...@gmail.com> wrote:
>
> Oscar Benjamin, what if someone else also puts up this same proposal to improve the DomainMatrix module for Gsoc. How does sympy decide which person will work on it.

There is plenty more work to do on DomainMatrix than one person could
do in a single GSOC project so there could be two people doing related
projects on this although we do have limits on how many projects a
single mentor could supervise. I am not the only person who could
mentor a project on DomainMatrix though and it's also possible that I
would mentor a project on something else instead such as ODEs.

We will look at all applications and decide which to accept based on a
number of factors. The two major factors are the priority of the
project and the quality of the application. I think that DomainMatrix
is high priority but if there are stronger proposals for other
projects then it is possible that no proposal for DomainMatrix would
be accepted.

I suggest not to overthink this. The main thing is just to have a good
application that is well motivated and shows clear understanding of
the work to be done.

--
Oscar

Kartik Sethi

unread,
Mar 12, 2021, 1:14:49 AM3/12/21
to sympy
Oscar Benjamin, could you review this first draft of my proposal. Please let me know if you feel that I should focus 
on some other topics/ algorithms in DomainMatrix instead of the ones chosen by me. 

Kartik Sethi

unread,
Mar 14, 2021, 9:16:15 AM3/14/21
to sympy
Oscar Benjamin, Have you had a chance to review my GSoC application draft? 

S.Y. Lee

unread,
Mar 20, 2021, 8:23:46 AM3/20/21
to sympy
Mind me jumping in the discussion for this, but I have some comments

The important aspect of DomainMatrix is that the computations should be 'rational' (which can be generalized up to 'multivariate rational function field')
So I don't think that you can implement cholesky or hessenberg or householder easily.
I have also though about adjoining the new square roots as algebraic extensions,
But I'm afraid that you can't anticipate the speed just like numeric cases even if this approach is taken.

Kartik Sethi

unread,
Mar 20, 2021, 8:56:07 AM3/20/21
to sympy
>  Mind me jumping in the discussion for this, but I have some comments
Not at all.
S.Y. Lee what do you suggest I work on instead of these decompositions. I have added some fraction-free algorithms which were listed on #20987. I could not find any paper/website for 
Paterson-Stockmayer inverse so I went with decompositions instead. 

I am open to any suggestions 
Message has been deleted

Oscar Benjamin

unread,
Mar 21, 2021, 5:53:48 AM3/21/21
to sympy
On Sun, 21 Mar 2021 at 07:48, Kartik Sethi <kartik...@gmail.com> wrote:
>
> S.Y. Lee, in #20987 It is outlined that there is a need to implement a DomainScalar class, which would help speed up eigenvector computation.
> I think that would be a more fruitful endeavour instead of working on these decompositions.

The eigenvector calculation is already speeded up on master although
it's possible that further improvements can be made. What DomainScalar
would potentially do is make it easier to write the code for something
like that which would be useful for extending the same approach
elsewhere e.g. to use DomainMatrix for the Jordan normal form which is
a very similar calculation to the eigenvector calculation.


OScar

Kartik Sethi

unread,
Mar 21, 2021, 6:51:04 AM3/21/21
to sympy
There is already a PR #21120 which is trying to implement DomainScalar class 
which is why I deleted that message. Should I add Jordan Normal Form to my 
Proposal instead assuming that DomainScalar would be implemented by then or should 
I add improvements to DomainScalar as one of the objectives in my proposal.

I am a little confused about this.

Kartik 

Oscar Benjamin

unread,
Mar 21, 2021, 7:15:10 AM3/21/21
to sympy
On Sun, 21 Mar 2021 at 10:51, Kartik Sethi <kartik...@gmail.com> wrote:
>
> There is already a PR #21120 which is trying to implement DomainScalar class
> which is why I deleted that message.

Personally I read these messages as an email mailing list so deleting
a message is unnoticeable to me (you can't delete an email from my
inbox).

> Should I add Jordan Normal Form to my
> Proposal instead assuming that DomainScalar would be implemented by then or should
> I add improvements to DomainScalar as one of the objectives in my proposal.

I think that improving Jordan form and matrix exponential is
definitely something to do since currently the main usage of
DomainMatrix outside of linsolve is eigenvects and it clearly shows a
big improvement there. The Jordan form calculation is very similar and
is actually used more widely e.g. for computing the matrix exponential
and for solving systems of ODEs so applying the same improvement there
should be a high priority for DomainMatrix.

> I am a little confused about this.

The most important thing for a GSOC proposal is to demonstrate that:

1) You understand the parts of the codebase you are proposing to work
on and have ideas to improve things.
2) There are substantial improvements to be made that are of value to sympy.
3) You individually are capable of implementing *some* of those
substantial improvements.

The exact details of what you would do if your proposal was accepted
are less important than showing that you can make substantial
improvements in general. It's very likely that if any proposal based
on DomainMatrix is accepted then the details of what would get
implemented would end up different from any proposal made now because
part of the work is in identifying what should be done.

Oscar

Kartik Sethi

unread,
Mar 22, 2021, 5:30:02 AM3/22/21
to sympy
> The exact details of what you would do if your proposal was accepted
> are less important than showing that you can make substantial
> improvements in general. It's very likely that if any proposal based
> on DomainMatrix is accepted then the details of what would get
> implemented would end up different from any proposal made now because
> part of the work is in identifying what should be done.

Thanks for clarifying this. I am reasonably familiar with Jordan Canonical Form and 
Matrix exponential and I think I can implement them. I'll add these to my proposal instead
of QR and Cholesky Decompositions.

Thanks

Kartik
Reply all
Reply to author
Forward
0 new messages