SymPy CodeGen and BLAS/LAPACK


Øyvind Jensen

Oct 25, 2012, 3:53:55 PM
to sy...@googlegroups.com
Matthew wrote to me about his project to generate code that can call BLAS/LAPACK routines.
We decided to CC the mailing list:

---------- Forwarded message ----------


Hi Matthew,
That sounds like an exciting project! See my comments below.

On Wed, Oct 24, 2012 at 12:29 AM, Matthew Rocklin <@gmail.com> wrote:
> Hi Øyvind,
>
> I'm working on SymPy's matrix expressions module. I'd like to translate
> sympy matrix expressions like this
>
> X = MatrixSymbol('X', n, m)
> Y = MatrixSymbol('Y', n, m)
> expr = (X*X.T).I*Y
>
> with predicates likes this
>
> context = Q.invertible(X) & Q.positive_definite(Y)
>
> Into Fortran code that calls the relevant BLAS/LAPACK calls. This is a
> project of many parts (matrix expressions, logic programming, optimal
> selection of BLAS routines) and one of those parts is code generation.
>
> I see that you are the main contributor to the codegen module (see
> contribution counts below) and was wondering if I could ask you for your
> advice while I work on this.

Yes, you can :-)

>
> {"Vladimir Peric":3,"Ronan Lamy":1,"Joan Creus":1,"Siddhanathan
> Shanmugam":1,"Julio Idichekop Filho":1,"Aaron Meurer":1,"Oyvind
> Jensen":2,"Chris Smith":7,"Toon Verstraelen":6,"Øyvind Jensen":70}
>
> I've read through the source and docs once without going too deeply.
> I'm now wondering what I should read more deeply.

I would fire up a debugger and step through the code generation for a
simple example. By reading and understanding all the code that is
executed for, say, a matrix-vector product, you should get a good grasp
of it.
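The matrix-vector walk-through Øyvind suggests can be set up with Indexed against SymPy's public codegen API roughly like this (a sketch; the argument order of the generated routine may differ):

```python
from sympy import Eq, Idx, IndexedBase, symbols
from sympy.utilities.codegen import codegen

n, m = symbols('n m', integer=True)
A, x, y = IndexedBase('A'), IndexedBase('x'), IndexedBase('y')
i = Idx('i', m)
j = Idx('j', n)

# y(i) = sum over j of A(i, j)*x(j); the contraction over j is implied
# because j appears only on the right-hand side, so codegen emits the loops.
(fname, fcode), *rest = codegen(('matvec', Eq(y[i], A[i, j]*x[j])), 'F95', 'matvec')
print(fcode)
```

Stepping through this call in a debugger exercises Routine construction, argument inference (m and n become integer arguments), and the Fortran printer in one pass.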

>
> I suspect I'll want to create Routines for each BLAS/LAPACK function, is
> this correct?
>
> Should I translate MatrixExprs to Indexed for better compatibility with
> codegen? How would I make codegen work natively with MatrixExprs?

I introduced Indexed exactly because I needed a SymPy object that
would map nicely onto low-level arrays with symbolic indexing. As far
as I know the codegen framework is not aware of MatrixExprs, but I am
not up to date on SymPy at the moment. It may be a good idea to see if
it is possible to merge Indexed with the other Matrix stuff.

Since you plan to call BLAS/LAPACK, you probably won't need to
generate loops(?). In that case, you don't actually need to use
Indexed.

>
> In one of your blogposts I noticed that you found Theano and made a
> conversion tool. The link on your post was broken. Does this still exist? I
> actively use both projects. Frederic (a core dev in Theano) and I actually
> made such a translation tool at the last SciPy conference. It appears we
> were just copying your work from years ago.

What I made was a simple routine that could map a Sympy tree into a
Theano-*like* tree, but I didn't actually populate the Theano-like
tree with Theano objects. So, your tool is probably way ahead of mine.
I'll see if I can fix that link, though...

>
> I get the feeling I'm now thinking about the same problems that you thought
> about three years ago. If you are free to chat about this topic I think you
> could probably easily come up with solutions to many of my problems.
>
> Best,
> -Matthew

It would be cool if this can be implemented in such a way that a user
can provide custom rules for using external libraries. For instance,
if the codegen framework would accept user provided logic to determine
if a subexpression should be calculated with a given library call.
This kind of framework could also be used to enable codegen to
optimize common subexpressions.
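A minimal sketch of what such user-provided logic could look like — purely hypothetical, this is not an existing codegen API: a table of (predicate, library call) pairs consulted for each subexpression, falling back to generated loops when nothing matches.

```python
from sympy import MatMul, MatrixSymbol, symbols

# Hypothetical rule table (not an existing SymPy API): each entry pairs a
# predicate on a subexpression with the external routine that computes it.
rules = [
    (lambda e: isinstance(e, MatMul), 'dgemm'),
]

def select_call(expr, rules):
    """Return the library call named by the first matching rule,
    or None to fall back on ordinary generated loops."""
    for predicate, call in rules:
        if predicate(expr):
            return call
    return None

n = symbols('n', integer=True)
A = MatrixSymbol('A', n, n)
B = MatrixSymbol('B', n, n)
print(select_call(A*B, rules))  # -> dgemm
```

Common-subexpression optimization would slot into the same shape: a rule whose predicate fires on repeated subtrees and whose action introduces a temporary.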

If you don't mind, maybe we should CC the mailing list?

Cheers,
Øyvind

Øyvind Jensen

Oct 25, 2012, 3:58:01 PM
to sy...@googlegroups.com

>> If you don't mind, maybe we should CC the mailing list?
>
> Go for it. I've been flooding the list recently which is why I didn't do
> this earlier.
>
> I've recently been working on a Theano-like graph for SymPy (a Computation
> object). I may be reinventing something. I'm working off of this branch
> https://github.com/mrocklin/sympy/tree/blas . What is there now will soon
> change, however. I'm still trying to figure out the best way to do this.
>
> Question about codegen. I have to compose routines that have multiple
> out-params and inplace operations. Is SymPy codegen equipped to handle this?
> Does it have tools to reason about these sorts of problems? Maybe this will
> become clearer to me when I do the mat-vec multiply example.

Multiple outputs have at least some support: in test_codegen.py you'll find test_multiple_results_f(). What kind of in-place operations are you referring to?
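For reference, the multiple-output path can be exercised by naming each output with Eq, which turns it into an intent(out) argument of a single subroutine (a sketch against the public codegen API; the generated argument order may vary):

```python
from sympy import Eq, symbols
from sympy.utilities.codegen import codegen

x, y, s, p = symbols('x y s p')

# Naming the outputs via Eq makes them intent(out) arguments, so one
# subroutine returns both the sum and the product.
(fname, fcode), *rest = codegen(('sum_prod', [Eq(s, x + y), Eq(p, x*y)]),
                                'F95', 'sum_prod')
print(fcode)
```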

I took a quick glance at your blas branch. I think your Computation class is somewhat comparable to codegen.Routine, except that you are doing it properly. The Routine class aims to describe how a given mathematical expression can be programmed as a function or subroutine, but with the Computation class you aim to also describe the relationships between several computations, isn't that right? It might be a good idea to remove, or at least shrink, the Routine class in favor of your stuff.

If you want codegen to support MatrixExpr objects, you would need to implement them in the code printers, and I guess you will also need to hack the constructor of Routine.
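As a concrete sketch of the printer half of that suggestion: SymPy code printers dispatch on class name, so a subclass with a _print_MatMul method is enough to intercept MatrixExpr nodes. The dgemm_wrapper name below is made up for illustration; a real implementation would emit a proper BLAS call site.

```python
from sympy import MatrixSymbol, symbols
from sympy.printing.fcode import FCodePrinter

class MatMulPrinter(FCodePrinter):
    # Printers dispatch via _print_<ClassName>, so this method is called
    # for every MatMul node in the expression tree.
    def _print_MatMul(self, expr):
        args = ', '.join(str(a) for a in expr.args)
        # Emit a placeholder call; a real implementation would choose the
        # BLAS routine and argument layout here.
        return 'call dgemm_wrapper(%s)' % args

n = symbols('n', integer=True)
A = MatrixSymbol('A', n, n)
B = MatrixSymbol('B', n, n)
print(MatMulPrinter().doprint(A*B))
```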

Cheers,
Øyvind

Matthew Rocklin

Oct 25, 2012, 4:09:33 PM
to sy...@googlegroups.com
For those interested in following along, I'm working off of my blas branch. Here is a link to the comparison master->blas


In particular I recommend looking at matrices/expressions/blas.py. This branch also includes my work on matrix assumptions (separate PR).

The goal is to create Fortran code from MatrixExprs. The work left to do decomposes into the following tasks:

1. Write a good Computation class (the current structure isn't yet ideal)
2. Write down enough of BLAS to get going (close!)
3. Write down rules that translate MatrixExprs into BLAS computations (this is easier if we have good pattern matching)
4. Write down strategies to handle non-confluent rules (there are lots of possible solutions)
5. Use f2py to reconnect the Fortran code to the NumPy interface
6. Compare SymPy MatExpr times to NumPy times. I expect we'll be able to generate code that is several times faster than NumPy while still maintaining a nice interface.
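Steps 5 and 6 can be previewed without any code generation at all, since SciPy already ships f2py-built wrappers around BLAS (scipy.linalg.blas is assumed to be available here). A quick check of the GEMM step:

```python
import numpy as np
from scipy.linalg.blas import dgemm  # f2py-generated wrapper around BLAS

n = 4
rng = np.random.RandomState(0)
A = np.asfortranarray(rng.rand(n, n))
B = np.asfortranarray(rng.rand(n, n))
C = np.asfortranarray(rng.rand(n, n))
alpha, beta = 2.0, 3.0

# dgemm computes alpha*A*B + beta*C in a single call, the operation a
# GEMM node in the expression graph represents.
out = dgemm(alpha, A, B, beta=beta, c=C)
assert np.allclose(out, alpha * A.dot(B) + beta * C)
```

The timing comparison in step 6 is then the same expression written with plain NumPy operations versus a generated routine that fuses the work into BLAS calls.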





Matthew Rocklin

Oct 28, 2012, 11:14:56 AM
to sy...@googlegroups.com
Update:

I have basic code generation working. I found that the Routine class didn't meet my needs, so I built something new. There is some duplication. An example:

    # Mathematical Problem
    n, alpha, beta = symbols('n alpha beta')
    A = MatrixSymbol('A', n, n)
    B = MatrixSymbol('B', n, n)
    C = MatrixSymbol('C', n, n)
    x = MatrixSymbol('x', n, 1)
    context = (Q.lower_triangular(A) &
               Q.lower_triangular(B) &
               Q.lower_triangular(C))
    expr = (alpha*A*B + beta*C).I*x

    # Build a computation object from this expression
    # This should be replaced by a clever pattern matching/rules system
    gemm = GEMM(alpha, A, B, beta, C) # produces alpha*A*B + beta*C and stores result in C
    trsv = TRSV(alpha*A*B + beta*C, x) # computes (alpha*A*B+beta*C).I*x and stores result in x
    comp = MatrixRoutine((gemm, trsv), (alpha, A, B, beta, C, x), (expr,)) # a computation to do both gemm and trsv

    # build callable object
    f = comp.build(str, context) # Write Fortran code (below) and compile with f2py

    # Use f
    import numpy as np
    nalpha, nbeta = 2.0, 3.0
    nA, nB, nC = [np.asarray([[1, 0], [3, 4]], order='F', dtype='float64')
                  for i in range(3)]
    nx = np.asarray((2.3, 4.5), order='F', dtype='float64')

    f(nalpha, nA, nB, nbeta, nC, nx)

Matthew Rocklin

Oct 28, 2012, 11:25:26 AM
to sy...@googlegroups.com
I pressed 'send' too early. 

This is the simplest non-trivial problem with which to use BLAS computations. The system is able to deal with more complex computations and predicates. Ideally we will be able to use rules/pattern matching to select specialized routines that match the predicates. I apologize for the short function names 'gemm', 'trsv'; these come from BLAS.

The goal of the project is to allow users to skip the middle step of building the computation. The first and last steps (math and use) are comprehensible to mathematical users; the second (gemm, trsv, etc.) is not. I think that we should be able to automate the second step.

The generated Fortran code is below. It mostly just calls BLAS. Note that this code is about as optimized as you will find; BLAS should easily beat NumPy and most hand-written C (unless that C is very heavily optimized). I'll post timings in a bit.

subroutine f(alpha, A, B, beta, C, x, n)

real*8, intent(in) :: A(n, n)
real*8, intent(in) :: B(n, n)
real*8, intent(inout) :: C(n, n)
real*8, intent(in) :: alpha
real*8, intent(in) :: beta
integer, intent(in) :: n
real*8, intent(inout) :: x(n, 1)

call dgemm('N', 'N', n, n, n, alpha, A, n, B, n, beta, C, n)
call dtrsv('L', 'N', 'N', n, C, n, x, 1)

RETURN
END
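A NumPy reference for what this subroutine computes may help in reading it: dgemm overwrites C with alpha*A*B + beta*C, and dtrsv('L', 'N', 'N', ...) then solves the lower-triangular system against x in place. This sketch returns new arrays instead of mutating its arguments the way the Fortran does:

```python
import numpy as np

def f_reference(alpha, A, B, beta, C, x):
    # dgemm: C <- alpha*A*B + beta*C
    C = alpha * A.dot(B) + beta * C
    # dtrsv('L', 'N', 'N'): use only the lower triangle of C and solve
    # tril(C) * y = x, which the Fortran stores back into x
    y = np.linalg.solve(np.tril(C), x)
    return C, y
```

With the sample arrays from the earlier usage example, the pair returned here should match what the compiled routine leaves in nC and nx.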

We now have three intermediate representations

1. MatrixExprs (MatrixSymbol, MatMul, Inverse, ...) 
2. MatrixComputations (gemm, trsv, symm, cholesky)
3. Fortran

We lack a compiler from (1) to (2). I hope that rules, pattern matching, and assumptions can perform this transformation. Once we have this (and once bugs in MatrixComputations are cleaned up) I think that we'll have a pretty awesome dense linear algebra library in SymPy.
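As a taste of what a (1) -> (2) compiler has to do, here is a purely structural matcher for the GEMM pattern alpha*A*B + beta*C. This is a hypothetical sketch, far weaker than the rule/pattern-matching system envisioned above — in particular it ignores the predicates entirely:

```python
from sympy import MatAdd, MatMul, MatrixExpr, MatrixSymbol, symbols

def split_scaled(term):
    """Split a term into (scalar coefficient, list of matrix factors)."""
    if not isinstance(term, MatMul):
        return 1, [term]
    coeff, mats = 1, []
    for a in term.args:
        if isinstance(a, MatrixExpr):
            mats.append(a)
        else:
            coeff = coeff * a
    return coeff, mats

def match_gemm(expr):
    """Return (alpha, A, B, beta, C) if expr has the shape
    alpha*A*B + beta*C, else None.  Both argument orders are tried
    since MatAdd does not guarantee one."""
    if not (isinstance(expr, MatAdd) and len(expr.args) == 2):
        return None
    for first, second in (expr.args, expr.args[::-1]):
        c1, mats1 = split_scaled(first)
        c2, mats2 = split_scaled(second)
        if len(mats1) == 2 and len(mats2) == 1:
            return c1, mats1[0], mats1[1], c2, mats2[0]
    return None

n = symbols('n', integer=True)
alpha, beta = symbols('alpha beta')
A, B, C = (MatrixSymbol(s, n, n) for s in 'ABC')
print(match_gemm(alpha*A*B + beta*C))  # -> (alpha, A, B, beta, C)
```

A real compiler would combine many such patterns with the assumptions (triangular, symmetric, positive definite) to pick specialized routines, which is where the rules system comes in.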

Matthew Rocklin

Oct 29, 2012, 12:53:32 PM
to sy...@googlegroups.com
I wrote this up more cleanly here

In it I also describe a bit more background, such as why BLAS is important and how NumPy can be improved.

Aaron Meurer

Oct 29, 2012, 2:59:18 PM
to sy...@googlegroups.com
Feel free to submit this as a pull request, even if it isn't ready to be merged. It will make it much easier to comment on the code that way. 

Aaron Meurer

Ondřej Čertík

Oct 29, 2012, 6:51:26 PM
to sy...@googlegroups.com
On Mon, Oct 29, 2012 at 11:59 AM, Aaron Meurer <asme...@gmail.com> wrote:
> Feel free to submit this as a pull request, even if it isn't ready to be
> merged. It will make it much easier to comment on the code that way.

I was just about to suggest that.

Looking forward,
Ondrej

Matthew Rocklin

Oct 29, 2012, 7:57:53 PM
to sy...@googlegroups.com
Unfortunately I don't currently have the time to respond to the detailed comments of a pull request. I'm in crunch mode right now. I'm happy to issue a PR if you like but I'll end up unsubscribing from the e-mail feed. I think it's better to wait until my priorities shift.

Large scale design questions and comments are really appreciated though. I think that this advice is better conveyed through blog-comments (I really welcome these) and e-mails though.

Best,
-Matt


Matthew Rocklin

Oct 31, 2012, 4:19:07 PM
to sy...@googlegroups.com
Things are better now. I've issued a PR with a WIP label.
