Matrix Code Generation Opinion Thread


James Crist

Aug 4, 2014, 6:27:38 PM
to sy...@googlegroups.com
I'm working on adding support for code generation with Matrix objects. Currently an `Indexed` type is supported that results in low-level contiguous arrays. These are always converted into loops, though (and I don't really understand what they're for). In contrast, the intent here is to provide support for generating matrices that are composed of sympy expressions (as seen frequently in `mechanics`). For example, the following should *work* after this update:

>>> mat = Matrix([sin(a), cos(b), tan(c)])
>>> func = autowrap(mat)
>>> func(1, 2, 3)
array([ 0.84147098, -0.41614684, -0.14254654])

I have some of this working already, but have some questions before I progress further.

1. Sympy Matrices are always 2-dimensional; should this be true of the generated code as well?

In numpy, the default is the minimum number of dimensions required. For example, `array([1, 2, 3])` has only one dimension. In contrast, sympy's `Matrix([1, 2, 3])` always has 2 dimensions (there's no way around it). For expressions that are inherently column/row vectors, should a single-dimension array be created?

- Pros: Less nested data; many scipy routines require single-dimension arrays only (e.g. odeint)
- Cons: Multiplication and indexing will be different. For this reason I'm somewhat against switching, but I could be swayed. For example:

>>> np_a = array([1, 2, 3])
>>> sym_a = Matrix([1, 2, 3])
>>> np_a.dot(np_a.T)                # Perform Multiplication as done in numpy
14
>>> sym_a * sym_a.T               # Perform Multiplication as done in sympy
Matrix([
[1, 2, 3],
[2, 4, 6],
[3, 6, 9]])
>>> np_a_res = np_a.reshape((3,1))    # Reshape the array to be 2 dimensional
>>> np_a_res.dot(np_a_res.T)            # Multiply the 2 dimensional numpy arrays. This results in the same as sympy
array([[1, 2, 3],
       [2, 4, 6],
       [3, 6, 9]])
>>> np_a[1, 0]                        # Indexing a 1-d array with two indices fails
IndexError: too many indices
>>> np_a_res[1,0]                            # Indexing works the same in 2d as it would in sympy normally
2



2. Should the default be numpy arrays?

I think yes. They are used pretty much everywhere in scientific python, and should be supported. numpy.matrix is on its way out, and in my opinion should not be used. Cython offers some support for generality, so that anything that implements the buffer protocol can be used. f2py may do the same, but I have little experience with it. Currently autowrap supports lists as well. I think this is silly and results in some inefficiencies: since they're converted to numpy.array internally in the call, the user must have numpy installed anyway, and could do the conversion themselves.
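To illustrate why list support is redundant: numpy's own conversion function allocates a new array for a list, but is free for an existing array (a small sketch, nothing sympy-specific):

```python
import numpy as np

lst = [1.0, 2.0, 3.0]
arr = np.asarray(lst)      # list input: a new array is allocated
arr2 = np.asarray(arr)     # array input: no copy, the same object comes back
print(arr2 is arr)         # True
```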

3. For inputting matrices to functions, `MatrixSymbol`, or `DeferredVector`?

Sometimes you may have a lot of input variables and want those variables expressed as a vector of *known* length. Sympy offers two options for this: `MatrixSymbol` and `DeferredVector`. They both seem to serve the same intent, although `MatrixSymbol` seems to be used *way* more and to have more functionality. Currently I'm only supporting `MatrixSymbol`. The idea is that this should be possible:

>>> x = MatrixSymbol('x', 3, 1)  # Acts as a vector
>>> expr = x[0, 0] + sin(x[1, 0]) + cos(x[2, 0])
>>> func = autowrap(expr)
>>> inp = np.array([1, 2, 3])
>>> func(inp)
0.9193049302252362
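As a sanity check on the value above, this is just the scalar arithmetic the wrapped function would perform, assuming the elements of x map onto inp in order:

```python
import numpy as np

inp = np.array([1, 2, 3])
result = inp[0] + np.sin(inp[1]) + np.cos(inp[2])
print(result)  # ~0.9193049302252362
```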

Note that this is the inverse of the issue described in #1. This time it's the function input that will have a dimension mismatch between what is being passed in (a single-dimension numpy vector) and the 2-dimensional sympy MatrixSymbol/Matrix. Again, the numpy array can be reshaped either externally or internally with the magic of cython (this won't result in a copy operation either, just a new memoryview). Still, I'd like an opinion on this.
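The no-copy claim is easy to check on the numpy side, where a reshape between (3,) and (3, 1) is only a new view of the same buffer:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
col = a.reshape((3, 1))           # new view, not a copy
print(np.shares_memory(a, col))   # True
col[1, 0] = 99.0
print(a[1])                       # 99.0 -- same underlying memory
```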

4. What is an Indexed type for?

Currently I'm ignoring these, leaving them as they are while writing functionality around them for Matrix types. Still, the code supporting them is messy (maybe it has to be?), and I can't refactor it into something cleaner until I understand what they do. I've read the docs on this functionality and see that you can make a matrix multiplication function. Great. But I plan on supporting that later on as well, with MatrixExpr types and calls to blas routines. There has to be a purpose for these, I just don't understand it yet.

5. Would a ctypes CodeWrapper be wanted? (Not relevant immediately, but I'm curious)

Currently only Cython and f2py are supported for wrapper objects. These make very robust solutions, but neither is in the standard library. `ctypes` is a standard-library module that provides some functionality for wrapping shared libraries (.so or .dll). It's not nearly as robust, and has no built-in type conversion. Pros: everyone has it. Cons: it's not as good. I think I'll add this in later.
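A rough sketch of the kind of manual declaration such a ctypes wrapper would have to emit, using the system math library as a stand-in for a generated shared library (the library lookup is platform-dependent):

```python
import ctypes
import ctypes.util

# Stand-in for loading a generated .so/.dll; with no built-in type
# conversion, argument and return types must be declared by hand.
libm = ctypes.CDLL(ctypes.util.find_library("m") or "libm.so.6")
libm.cos.restype = ctypes.c_double
libm.cos.argtypes = [ctypes.c_double]
print(libm.cos(0.0))  # 1.0
```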

If you have thoughts on any of these, please let me know! I want to make this useful for everyone, not just myself :)

Tim Lahey

Aug 4, 2014, 6:44:40 PM
to sy...@googlegroups.com
I've answered your questions below.

On 4 Aug 2014, at 18:27, James Crist wrote:

>
> *1. Sympy Matrices are always 2 dimensional, should this be true of
> the
> generated code as well?*
>

I think the generated code should reflect the original object. So, if
it's a SymPy 2D Matrix, the generated code should be a 2D Matrix as
well. Changing the dimensionality can always be done on the generated
matrix as needed, but should only be done with the knowledge of the
user.

>
>
> *2. Should the default be numpy arrays?*
>

I can't see any reason why not. It's the most general choice.

>
> *3. For inputting matrices to functions, `MatrixSymbol`, or
> `DeferredVector`?*
>
>

I think going with MatrixSymbol is fine for now. If we need to add
support for DeferredVector, that can be added.

>
> *4. What is an Indexed type for?*

They're for representing tensors. Of course, they can be used for a
number of things, including calculating finite difference formulas. In
my case, tensors are useful for stress analysis, especially in changing
reference frames. Other uses I know about relate to various physics
topics.

I think there's some current work on the Indexed code, so refactoring it
at the moment isn't a good idea.

> *5. Would a ctypes CodeWrapper be wanted?* (Not relevant immediately,
> but
> I'm curious)

I think it's a good idea in general, but not really a high priority. As
something in the future, sure.

Cheers,

Tim.

Jason Moore

Aug 4, 2014, 7:13:42 PM
to sy...@googlegroups.com
SymPy only has 2D arrays (matrices), and column vectors are still represented as 2D arrays (like Matlab). So I think we should stick to that paradigm. You could have a flag that says "make vectors 1D arrays" if you want to support that. But the C code in the backend, for example, is all based on 1D arrays; the indexing of nD arrays in C is just syntactic sugar. I suspect that is the same in Fortran too.
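The "syntactic sugar" point is easy to see from the row-major layout numpy shares with C, where `a[i][j]` on a contiguous array is just the flat offset `i*ncols + j`:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # C-ordered 2 x 3 array
flat = a.ravel()                 # view of the underlying 1-d buffer
i, j = 1, 2
print(a[i, j] == flat[i * a.shape[1] + j])  # True
```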

Many numpy functions will broadcast so it may be the case that 2D column vectors will work in most cases, but otherwise the user may need to use .squeeze() to remove the unneeded dimension.
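The `.squeeze()` escape hatch mentioned above is one line on the user's side:

```python
import numpy as np

col = np.ones((3, 1))        # 2-d column, as the sympy Matrix shape implies
vec = col.squeeze()          # size-1 axis removed
print(col.shape, vec.shape)  # (3, 1) (3,)
```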

It makes sense to me to retain the same indexing in the numpy array as the sympy matrix.
 

2. Should the default be numpy arrays?

I think yes. They are pretty much used everywhere in scientific python, and should be supported. numpy.matrix is on its way out, and should not be used in my opinion. Cython offers some support for generality, so that anything that offers the buffer protocol can be used. f2py may do the same, but I have little experience with it. Currently autowrap supports lists as well. I think this is silly, and results in some inefficiencies. As they're transformed into numpy.array internally in the call, the user must have numpy installed, and should be able to do this themselves.

Yes, we should default to outputting numpy arrays and not support the numpy matrix type. The numpy matrix type is going to eventually be deprecated in numpy. No reason to use it in anything new.


3. For inputting matrices to functions, `MatrixSymbol`, or `DeferredVector`?

Sometimes you may have a lot of input variables, have those variables expressed as a vector of *known* length. Sympy offers two options for this: `MatrixSymbol` and `DeferredVector`. They both seem to offer the same intention, although `MatrixSymbol` seems to be used *way* more/have more functionality. Currently I'm only supporting `MatrixSymbol`. The idea is that this should be possible:

>>> x = MatrixSymbol('x', 3, 1)  # Acts as a vector
>>> expr = x[0, 0] + sin(x[1, 0]) + cos(x[2, 0])
>>> func = autowrap(expr)
>>> inp = np.array([1, 2, 3])
>>> func(inp)
0.9193049302252362

Note that this is the inverse of the issue described in #1. This time it's the function input that will have a dimension mismatch between what is being passed in (a single-dimension numpy vector) and the 2-dimensional sympy MatrixSymbol/Matrix. Again, the numpy array can be reshaped either externally or internally with the magic of cython (this won't result in a copy operation either, just a new memoryview). Still, I'd like an opinion on this.

I don't think this is really a mismatch. This operation seems more like you are treating the things in inp as a list of values. The output should be defined by expr, which in this case is a scalar. This looks fine.
 

4. What is an Indexed type for?

Currently I'm ignoring these, leaving them as they are while writing functionality around them for Matrix types. Still, the code supporting them is messy (maybe it has to be?), and I can't refactor it into something cleaner until I understand what they do. I've read the docs on this functionality and see that you can make a matrix multiplication function. Great. But I plan on supporting that later on as well, with MatrixExpr types and calls to blas routines. There has to be a purpose for these, I just don't understand it yet.

I think Björn Dahlgren has been working on this stuff. Maybe speaking with him could help clarify this. I don't really get it either.
 

5. Would a ctypes CodeWrapper be wanted? (Not relevant immediately, but I'm curious)

Currently only Cython and f2py are supported for wrapper objects. These make very robust solutions, but neither is in the standard library. `ctypes` is a standard-library module that provides some functionality for wrapping shared libraries (.so or .dll). It's not nearly as robust, and has no built-in type conversion. Pros: everyone has it. Cons: it's not as good. I think I'll add this in later.

This could certainly be useful, especially if you are using SymPy without the optional dependencies. I think the autowrap module's idea is to have a flexible framework that can use a variety of wrapper and language combinations. For CPython there are several popular options: cython, swig, ctypes, boost, etc. But the main goal would be to have it working for the two currently supported combinations: C/Cython and Fortran/f2py.
 

If you have thoughts on any of these, please let me know! I want to make this useful for everyone, not just myself :)

--
You received this message because you are subscribed to the Google Groups "sympy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sympy+un...@googlegroups.com.
To post to this group, send email to sy...@googlegroups.com.
Visit this group at http://groups.google.com/group/sympy.
To view this discussion on the web visit https://groups.google.com/d/msgid/sympy/b362375c-2caa-41ee-a176-841d14147f12%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Matthew Rocklin

Aug 4, 2014, 7:24:11 PM
to sy...@googlegroups.com
Thank you for working on this.  I think that this is important work.  

I would express my views, but they happen to be completely in line with Tim's response, which mostly says that your intuition on this problem seems sensible.

Thanks again,
-Matt


James Crist

Aug 5, 2014, 12:14:52 AM
to sy...@googlegroups.com
@Tim:

They're for representing tensors. Of course, they can be used for a number of things, including calculating finite difference formulas. In my case, tensors are useful for stress analysis, especially in changing reference frames. Other uses I know about relate to various physics topics.

Oh neat! I took a class on finite elements, but we never referred to them as tensors. I'll have to look into that.

Have you used code generation with them? If you have and have some example code with them you could send my way that'd be great. It'd be really useful for me to understand exactly what the scope of support for them is, as I work around the implementation trying to get other things to work.

For example, would you ever create an Indexed that did an operation with a Matrix? Or a Matrix with Indexed objects as elements? What I'm trying to see is whether there is any reason for them ever to coexist, or whether the implementations can assume that they're never used together. The tensor code makes everything really complicated, as the loops introduce a scope of sorts (the symbol y[i] isn't the same value at each point in a loop).

@Jason:


But the C code in the backend, for example, is all based on 1D arrays. The indexing in C to nD arrays is just syntactic sugar. I suspect that is the same in fortran too.

Good point. gcc will error (I think; haven't tested) if we're not consistent with indexing (arr[2][1] vs arr[2]), but the underlying memory layout is the same, so python shouldn't care. If these are being used in an external C library it might matter, but then the user can cast the types or do something else to get around that.
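The "same memory layout" observation can be checked directly: a (3,) vector and a (3, 1) column with the same elements have byte-identical C-contiguous buffers, and only the shape metadata differs:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = a.reshape((3, 1))
print(a.tobytes() == b.tobytes())                        # True
print(a.flags['C_CONTIGUOUS'], b.flags['C_CONTIGUOUS'])  # True True
```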


I don't think this is really a mismatch. This operation seems more like you are treating the things in inp as a list of values. The output should be defined by expr, which in this case is a scalar. This looks fine.

Related to above realization. Memory layout is the same, python shouldn't care.

@Matthew:

Thanks, I think a lot of people could really use this. My main goal right now is to get the groundwork down for *evaluating* single matrices with expressions as elements. After that I plan on working through correlating MatrixExpr with blas/lapack functionality, so that these operations can be low-leveled as well. I watched your SciPy talk on that; it was really interesting! A little more than I'm interested in (or capable of implementing), but it's a great idea.



Tim Lahey

Aug 5, 2014, 12:34:41 AM
to sy...@googlegroups.com


On 5 Aug 2014, at 0:14, James Crist wrote:

> @Tim:
>
> They're for representing tensors. Of course, they can be used for a
> number
>> of things, including calculating finite difference formulas. In my
>> case,
>> tensors are useful for stress analysis, especially in changing
>> reference
>> frames. Other uses I know about relate to various physics topics.
>>
>
> Oh neat! I took a class on finite elements, but we never referred to
> them
> as tensors. I'll have to look into that.

Not really; finite differences are a different approach to solving PDEs.
With finite differences you have nodes at i-1, i, and i+1 (and possibly
in time or in multiple dimensions). You approximate the derivative with
differences in the values at those points. You can use Indexed to
represent these. As the mesh gets finer, it's a closer representation of
the derivative, as the formulas come from Taylor series. I guess you
could think of them as tensors, but it's not a common thing. It's just
that Indexed provides a useful way to do the representation.

The tensors are from continuum mechanics. The more sophisticated
analysis of material deformation all comes down to this, and tensors are
the usual way to represent things.

>
> Have you used code generation with them? If you have and have some
> example
> code with them you could send my way that'd be great. It'd be really
> useful
> for me to understand exactly what the scope of support for them is, as
> I
> work around the implementation trying to get other things to work.
>

No, I haven't. Sorry.

> For example, would you ever create an Indexed that did an operation
> with a
> Matrix? Or a Matrix with Indexed's as elements? What I'm trying to
> see is
> if there is any reason for them ever to coexist, or can the
> implementations
> assume that they're never used together? The tensor code makes
> everything
> really complicated as the loops introduce a scope of sorts (symbol
> y[i]
> isn't the same value at each point in a loop).

It's highly unlikely you'd create an Indexed that did an operation with
a Matrix, but a Matrix with Indexed elements, yes as this is what would
happen in a finite difference model.

Cheers,

Tim.

James Crist

Aug 5, 2014, 12:54:22 AM
to sy...@googlegroups.com

It's highly unlikely you'd create an Indexed that did an operation with a Matrix, but a Matrix with Indexed elements, yes as this is what would happen in a finite difference model.

Would you write it as Matrix([indexed_thing, indexed_thing_2, ...]), or just use the indexed object support for the outer Matrix as is already shown in the docs?



Tim Lahey

Aug 5, 2014, 1:10:44 AM
to sy...@googlegroups.com


On 5 Aug 2014, at 0:54, James Crist wrote:

>> It's highly unlikely you'd create an Indexed that did an operation
>> with a
>> Matrix, but a Matrix with Indexed elements, yes as this is what would
>> happen in a finite difference model.
>>
>
> Would you write it as Matrix([indexed_thing, indexed_thing_2, ...]),
> or
> just use the indexed object support for the outer Matrix as is already
> shown in the docs?
>

It would be something like Matrix([A[i+1], -2*A[i], A[i-1], ...]). At
least for the linear case. It would be more complex for the non-linear
case. This is just a simple case, but you can get differences in 4
dimensions (3 space and 1 time), that's why it's not just matrix
entries. But there is a mapping between them.

My course text for finite differences was Numerical Solution of Partial
Differential Equations by G.D. Smith. It's probably in the university
library. It doesn't get into more than two dimensions (two space, or one
space and one time), but it covers the basics pretty well, including
their use for the solution of parabolic, hyperbolic, and elliptic PDEs.

Finite differences aren't used as much anymore since finite volume and
finite element methods have been developed, but they're easy to
implement and work for pretty much any type of problem. Maple's
numerical PDE solver uses finite differences internally last I checked.

Cheers,

Tim.

Matthew Rocklin

Aug 5, 2014, 10:48:09 AM
to sy...@googlegroups.com
@Matthew:

Thanks, I think a lot of people could really use this. My main goal right now is to get the groundwork down for *evaluating* single matrices with expressions as elements. After that I plan on working through correlating MatrixExpr with blas/lapack functionality, so that these operations can be low-leveled as well. I watched your SciPy talk on that; it was really interesting! A little more than I'm interested in (or capable of implementing), but it's a great idea.

A more practical version of what was laid out in that talk would be useful.  Currently my practical solution to that is to use Theano. The SymPy-Theano code generators translate sympy matrix expressions to theano array expressions nicely.  Of course, this requires that you have Theano installed. 

Jason Moore

Aug 5, 2014, 12:24:29 PM
to sy...@googlegroups.com
Yes, the Theano bridge solves this problem in some sense, except that Theano is weak in speeding up the long scalar expression computations.


Aaron Meurer

Aug 5, 2014, 12:46:42 PM
to sy...@googlegroups.com


On Mon, Aug 4, 2014 at 5:44 PM, Tim Lahey <tim....@gmail.com> wrote:
> I've answered your questions below.
>
> On 4 Aug 2014, at 18:27, James Crist wrote:
>
>>
>> *1. Sympy Matrices are always 2 dimensional, should this be true of the
>> generated code as well?*
>>
>
> I think the generated code should reflect the original object. So, if it's a
> SymPy 2D Matrix, the generated code should be a 2D Matrix as well. Changing
> the dimensionality can always be done on the generated matrix as needed, but
> should only be done with the knowledge of the user.
>
>>
>>
>> *2. Should the default be numpy arrays?*
>>
>
> I can't see any reason why not. It's the most general choice.
>
>>
>> *3. For inputting matrices to functions, `MatrixSymbol`, or
>> `DeferredVector`?*
>>
>>
>
> I think going with MatrixSymbol is fine for now. If we need to add support
> for DeferredVector, that can be added.

I would stick with MatrixSymbol. It's the most supported. Although note that you may run into some issues because of https://github.com/sympy/sympy/issues/6249.


>>> x = MatrixSymbol('x', 3, 1)  # Acts as a vector
>>> expr = x[0, 0] + sin(x[1, 0]) + cos(x[2, 0])
>>> func = autowrap(expr)
>>> inp = np.array([1, 2, 3])
>>> func(inp)
0.9193049302252362

Do you mean it should be func(*inp)?
 
>
>>
>> *4. What is an Indexed type for?*
>
>
> They're for representing tensors. Of course, they can be used for a number of
> things, including calculating finite difference formulas. In my case,
> tensors are useful for stress analysis, especially in changing reference
> frames. Other uses I know about relate to various physics topics.

I'm not sure if I agree with that. The module is called tensor, but that's very misleading. For example, there are no upper or lower indices.  The new tensor module being worked on by FB and Mario is more what I would think of as tensors.

To me, Indexed is just an object used to represent an indexed array, so that it is possible to write in SymPy something that gets translated directly into array code in C or Fortran.  With that being said, it has also been used (with differing success) for other purposes. For example, sometimes people will want to write a summation with something like a_i, where i is the summation index, and they will use Indexed.
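The a_i summation pattern Aaron mentions looks like this in sympy (a sketch over a concrete range; the names are illustrative):

```python
from sympy import IndexedBase, Sum, symbols

a = IndexedBase('a')
i = symbols('i', integer=True)

# symbolic sum over an indexed array, then expanded over concrete bounds
s = Sum(a[i], (i, 0, 2))
expanded = s.doit()   # a[0] + a[1] + a[2]
```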


>
> I think there's some current work on the Indexed code, so refactoring it at
> the moment isn't a good idea.
>
>> *5. Would a ctypes CodeWrapper be wanted?* (Not relevant immediately, but
>> I'm curious)

That would be pretty neat, especially if it is relatively performant.

Aaron Meurer


>
>
> I think it's a good idea in general, but not really a high priority. As
> something in the future, sure.
>
> Cheers,
>
> Tim.
>
>

James Crist

Aug 5, 2014, 4:14:12 PM
to sy...@googlegroups.com


>>> x = MatrixSymbol('x', 3, 1)  # Acts as a vector
>>> expr = x[0, 0] + sin(x[1, 0]) + cos(x[2, 0])
>>> func = autowrap(expr)
>>> inp = np.array([1, 2, 3])
>>> func(inp)
0.9193049302252362

Do you mean it should be func(*inp)?

No, I mean func(inp), where inp is a numpy/contiguous array. The function only has one parameter, which is a vector holding some number of values. This is a common way of expressing problems in dynamics simulation, where the vector holds your system state, and the function represents the derivative. Then it's in the exact form needed to pass to odeint or any other integration routine.
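The calling convention being described can be sketched without any generated code: a derivative function that takes the whole state as one vector, matching the `func(y, t)` signature odeint expects (the pendulum right-hand side here is a hypothetical stand-in for an autowrapped function, and the fixed-step Euler loop stands in for the integrator):

```python
import numpy as np

# hypothetical stand-in for an autowrapped derivative function:
# the full system state comes in as a single vector
def func(y, t):
    return np.array([y[1], -np.sin(y[0])])   # e.g. a pendulum

# a few crude fixed-step Euler steps, standing in for odeint(func, y0, t)
y = np.array([0.1, 0.0])
for _ in range(10):
    y = y + 0.01 * func(y, 0.0)
```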

