[SciPy-user] Using savemat with (nested) NumPy record arrays?

Christopher A Mejia

Apr 22, 2009, 12:50:17 PM
to scipy...@scipy.org

Hi,

I'm trying to write a NumPy record array using the savemat function, using the format='5' default, but I am not having much success.  Here's an example using a NumPy record array defined in the NumPy User Guide:

-----------------------------------------

>>> import numpy as np
>>> x = np.zeros(3, dtype=[(’x’,’f4’),(’y’,np.float32),(’value’,’f4’,(2,2))])
SyntaxError: invalid syntax
>>> x = np.zeros(3, dtype=[('x','f4'),('y',np.float32),('value','f4',(2,2))])
>>> x
array([(0.0, 0.0, [[0.0, 0.0], [0.0, 0.0]]),
       (0.0, 0.0, [[0.0, 0.0], [0.0, 0.0]]),
       (0.0, 0.0, [[0.0, 0.0], [0.0, 0.0]])],
      dtype=[('x', '<f4'), ('y', '<f4'), ('value', '<f4', (2, 2))])
>>> from scipy.io.matlab.mio import savemat
>>> savemat('record_array_test.mat', {'x': x})

Traceback (most recent call last):
  File "<pyshell#6>", line 1, in <module>
    savemat('record_array_test.mat', {'x': x})
  File "C:\Python25\lib\site-packages\scipy\io\matlab\mio.py", line 159, in savemat
    MW.put_variables(mdict)
  File "C:\Python25\lib\site-packages\scipy\io\matlab\mio5.py", line 974, in put_variables
    mat_writer.write()
  File "C:\Python25\lib\site-packages\scipy\io\matlab\mio5.py", line 736, in write
    self.arr = self.arr.astype('f8')
ValueError: setting an array element with a sequence.
>>>

-----------------------------------------

Actually, what I'd like to do is to be able to handle an arbitrarily nested record array, as in:

-----------------------------------------

>>> spam = np.zeros(2, dtype=[('a','f4'), ('b', [('x', 'f4'), ('y', 'f4', (2,2))])])
>>> spam
array([(0.0, (0.0, [[0.0, 0.0], [0.0, 0.0]])),
       (0.0, (0.0, [[0.0, 0.0], [0.0, 0.0]]))],
      dtype=[('a', '<f4'), ('b', [('x', '<f4'), ('y', '<f4', (2, 2))])])
>>> savemat('record_array_test2.mat', {'spam': spam})

Traceback (most recent call last):
  File "<pyshell#9>", line 1, in <module>
    savemat('record_array_test2.mat', {'spam': spam})
  File "C:\Python25\lib\site-packages\scipy\io\matlab\mio.py", line 159, in savemat
    MW.put_variables(mdict)
  File "C:\Python25\lib\site-packages\scipy\io\matlab\mio5.py", line 974, in put_variables
    mat_writer.write()
  File "C:\Python25\lib\site-packages\scipy\io\matlab\mio5.py", line 736, in write
    self.arr = self.arr.astype('f8')
ValueError: setting an array element with a sequence.

-----------------------------------------

As you can see, I get the same error for the nested case.  I know what I am trying to do is possible, because I can generate my desired nested structure array in MATLAB, then do a "round-trip" with loadmat(,struct_as_record=True) and savemat() to get back the same thing in MATLAB.  However, I cannot seem to reverse engineer what loadmat(,struct_as_record=True) does to create the NumPy record array.  Two differences appear to be that the dtype created by loadmat(,struct_as_record=True) does not print as nested, it just shows a '|O4' type (the one set by the keyword "object"); also, scalars and one-dimensional vectors appear to be upconverted to 2-d matrices.  Perhaps someone has a routine that I can use to pre-process my nested record array so it works with savemat?
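
One way to inspect what loadmat(struct_as_record=True) builds is to round-trip a tiny struct in memory and print the result. This is only a minimal sketch: it assumes a current SciPy in which scipy.io.savemat and scipy.io.loadmat accept file-like objects and a plain dict is written as a MATLAB struct. The variable name "s" and the field names are made up for illustration.

```python
import io

import numpy as np
from scipy.io import loadmat, savemat

# Write a one-element struct (given as a plain dict) to an in-memory MAT-file.
buf = io.BytesIO()
savemat(buf, {"s": {"a": 1.0, "b": np.array([1.0, 2.0, 3.0])}})
buf.seek(0)

# Read it back as a NumPy structured array.
loaded = loadmat(buf, struct_as_record=True)
s = loaded["s"]

print(s.shape)              # shape of the struct array itself
print(s.dtype)              # each field is reported with object dtype
print(s["a"][0, 0].shape)   # shape of what was saved as a scalar
print(s["b"][0, 0].shape)   # shape of what was saved as a 1-D vector
```

The printed dtype lists every field as an object field rather than as a nested dtype, and both the scalar and the vector come back as 2-D arrays, which matches the two differences described above.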

FYI, I'm using Python 2.5.4, NumPy 1.2.1 and SciPy 0.7.0.

Thanks in advance for any help,

--Chris

( P.S.  I apologize in advance if this post shows up twice...my first attempt seems to have gotten lost.)

Christopher A Mejia

Apr 22, 2009, 11:34:53 PM
to SciPy Users List

Hi,

Well, it turns out I found a solution myself, so I'll share that, but I still need some further help...  What I found was:

1.  By declaring the dtype as "object" (no quotes) instead of a nested dtype, I got past the part where the Python call to savemat was breaking.
2.  However, I didn't get any data showing up in MATLAB after loading my file, unless I made sure that all of the arrays had a shape with at least one dimension.

Here is an example of code that works:

--------------------------------
>>> import numpy as np
>>> from scipy.io.matlab.mio import savemat
>>> x = np.zeros((1,), dtype=[('a', object)])
>>> x[0]['a'] = np.zeros((1,))
>>> savemat('record_array_test3.mat', {'x': x})
--------------------------------
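
The same idea can be extended to a whole array of structures. The sketch below uses a hypothetical helper, records_to_struct_array (not part of SciPy or NumPy), which assumes every record is a dict with the same keys. It declares every field as object and wraps each value with np.atleast_1d so that no field holds a zero-dimensional array.

```python
import numpy as np


def records_to_struct_array(records):
    """Pack a list of dicts (all with the same keys) into a 1-D array
    whose fields all have object dtype.

    Hypothetical helper for illustration only. Each value is wrapped with
    np.atleast_1d so that every stored array has at least one dimension.
    """
    field_names = list(records[0].keys())
    out = np.empty((len(records),), dtype=[(name, object) for name in field_names])
    for i, rec in enumerate(records):
        for name in field_names:
            # out[name] is an object array; store the wrapped value in slot i.
            out[name][i] = np.atleast_1d(np.asarray(rec[name]))
    return out


recs = [{"a": 1.0, "b": [1.0, 2.0]}, {"a": 2.0, "b": [3.0, 4.0]}]
arr = records_to_struct_array(recs)
print(arr.dtype.names)   # field names preserved from the dict keys
print(arr["a"][0].shape) # the scalar 1.0 is stored as a 1-element array
```

The resulting array could then be passed to savemat in the same way as x above, for example savemat('records.mat', {'recs': arr}), where the file and variable names are placeholders.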

The problem I'm running into now is that the savemat function is too slow.  The top level of data I'm trying to save is an array of structures.  It seems that the time for savemat increases exponentially as the number of records in this structure array increases.  Is there a better way to organize the storage of data into savemat, or is there a simple way to modify savemat to speed it up?  Other approaches?  I'm trying to keep all of the "metadata" (i.e. field names) in Python available to MATLAB.  I got the field names into Python using SWIG and C++ code.  I've looked into PyTables but didn't like the way the tables loaded into MATLAB.

Thanks in advance,
--Chris



_______________________________________________
SciPy-user mailing list
SciPy...@scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-user



Matthew Brett

Apr 23, 2009, 12:09:22 AM
to SciPy Users List
Hi,

> 2.  However, I didn't get any data showing up in MATLAB after loading my
> file, unless I made sure that all of the arrays had a shape with at least
> one dimension.

There was a bug in the scipy 0.7 release - do you have the latest SVN?
Does that work?

> The problem I'm running into now is that the savemat function is too slow.
>  The top level of data I'm trying to save is an array of structures.  It
> seems that the time for savemat increases exponentially as the number of
> records in this structure array increases.  Is there a better way to
> organize the storage of data into savemat, or is there a simple way to
> modify savemat to speed it up?

Yes, that's a known problem as well. The hope is that the current SVN
implementation is a lot faster. I'd be very glad of your feedback
about that...

Thanks,

Matthew

Christopher A Mejia

Apr 24, 2009, 9:46:56 AM
to SciPy Users List

Matthew,

Yes and yes!  I downloaded the latest SVN, and savemat of NumPy scalars (shape==()) worked correctly, and savemat ran much faster.  A data set which did not finish writing overnight (~16 hours) successfully wrote in 4.5 minutes with the update.  Thank you!

I did notice that the files created by savemat are rather large; if I read one into MATLAB and write it back out, it is much smaller.  Maybe MATLAB is intelligent about deciding when to compress the data?  Anyway, this is not a big deal and savemat certainly provides the functionality I need.
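
On the file size: current SciPy releases expose a do_compression keyword on scipy.io.savemat, which is off by default, and that would be consistent with the difference seen here. Whether the keyword was available in the SVN version used in this thread is not confirmed. A minimal sketch comparing the two settings in memory, with a made-up variable name "x":

```python
import io

import numpy as np
from scipy.io import savemat

# Highly compressible test data: a 200x200 array of zeros.
data = {"x": np.zeros((200, 200))}

plain = io.BytesIO()
savemat(plain, data)                         # default: variables written uncompressed

packed = io.BytesIO()
savemat(packed, data, do_compression=True)   # compress variables on write

print(len(plain.getvalue()), len(packed.getvalue()))
```

How much is saved depends entirely on the data; an array of zeros is close to the best case.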

Another "nice to have" would be to upgrade savemat to work with nested dtypes, in addition to dtype=object.  I don't know enough about Python to know whether this is simple or complicated, and as long as one knows to set dtype=object instead of defining a nested dtype, all of the functionality is already there.
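
Until then, a nested record array can be pre-processed into the dtype=object layout. The following is only a sketch of one possible approach: to_object_fields is a hypothetical helper, not something savemat or NumPy provides, and it has not been checked against every savemat version. It walks the dtype recursively, turns every field into an object field, and stores leaf values as arrays with at least one dimension.

```python
import numpy as np


def to_object_fields(arr):
    """Return a copy of a structured array in which every field has object
    dtype; nested structured fields are converted recursively.

    Hypothetical helper for illustration only. Leaf values (plain scalars
    and sub-arrays) are stored as arrays with at least one dimension.
    """
    arr = np.asarray(arr)
    names = arr.dtype.names
    if names is None:
        # Leaf value: make sure it is at least 1-D.
        return np.atleast_1d(arr)
    flat = np.atleast_1d(arr)
    out = np.empty(flat.shape, dtype=[(name, object) for name in names])
    for idx in np.ndindex(flat.shape):
        for name in names:
            # Convert each field of each record, recursing into nested records.
            out[name][idx] = to_object_fields(flat[idx][name])
    return out


# The nested example from the first message in this thread.
spam = np.zeros(2, dtype=[('a', 'f4'), ('b', [('x', 'f4'), ('y', 'f4', (2, 2))])])
conv = to_object_fields(spam)
print(conv.dtype)                  # both top-level fields are now object fields
print(conv['b'][0].dtype.names)    # the nested record keeps its field names
```

The converted array, conv, is what would be handed to savemat in place of spam.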

Thanks again,
--Chris


