Advice on data structure for time series in 2D space


Fernando Paolo

Feb 13, 2012, 10:33:59 PM
to pystatsmodels
Hello,

I am working with a very large set of time series, where I have a 2D
grid representing space (x and y) and one time series per grid cell.
At the moment I store the data in a big 3D array (a cube) with the
vertical dimension being time. This is very convenient for applying
functions of the form f(t,x,y) to the whole data set. However, I
started using pandas containers to conveniently perform some time
series analysis. So I wonder if there is a convenient (better) way to
represent/store this data set using pandas data structures instead of
a plain numpy 3D array. Note that the spatial relationship between the
time series has to be preserved (i.e., the position in the original
grid), as well as the performance when applying functions to the whole
set.
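
A minimal sketch of the cube layout described above, assuming axis 0 is time and the other two axes are y and x. The sizes, dates, and variable names are made up for illustration. It shows a whole-grid operation on the NumPy array and one possible way to expose the same values to pandas as a wide DataFrame with one column per grid cell, so the (y, x) position stays in the column labels.

```python
import numpy as np
import pandas as pd

# Hypothetical sizes and dates; axis order (time, y, x) is an assumption.
nt, ny, nx = 6, 3, 4
rng = np.random.default_rng(0)
cube = rng.random((nt, ny, nx))
dates = pd.date_range("2012-01-01", periods=nt, freq="D")

# Whole-grid operation on the array: subtract each cell's temporal mean.
anomalies = cube - cube.mean(axis=0, keepdims=True)

# Wide layout: rows are time steps, columns are (y, x) grid positions.
columns = pd.MultiIndex.from_product([range(ny), range(nx)], names=["y", "x"])
frame = pd.DataFrame(cube.reshape(nt, ny * nx), index=dates, columns=columns)

# The time series of the cell at y=1, x=2, as a pandas Series.
cell = frame[(1, 2)]
```

The reshape keeps C order, so column (y, x) of the frame holds the same values as cube[:, y, x].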

Thanks a lot for any comment.

-fernando

Chris Rodgers

Feb 13, 2012, 11:16:40 PM
to pystat...@googlegroups.com
One option is to use a Panel, with a separate DataFrame for each time point.

I like to stick with DataFrame whenever possible to keep things
simple, so you could also create separate columns X, Y, and T for
every datapoint. You can then use set_index(['X', 'Y', 'T']) or
set_index(['T', 'X', 'Y']) depending on whether you want to process
over timepoints or grid cells. I believe this will automatically sort
by the new indices too, to retain your meaningful data order.

To recover the 3d array representation, you would set the index in the
order you want, and then:
df.values.reshape((Ndim1, Ndim2, Ndim3)).
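
A minimal sketch of this long-format approach, assuming a small made-up cube with axes (T, Y, X). The column name "value" and the sizes are illustrative, not from the thread. Note that set_index does not reorder rows by itself, so the sketch calls sort_index explicitly before reshaping.

```python
import numpy as np
import pandas as pd

# Hypothetical cube: axis 0 is time, axes 1 and 2 are the y and x positions.
nt, ny, nx = 4, 3, 2
cube = np.arange(nt * ny * nx, dtype=float).reshape(nt, ny, nx)

# One row per (T, Y, X) combination, in the same C order as cube.ravel().
t_idx, y_idx, x_idx = np.indices(cube.shape)
df = pd.DataFrame({
    "T": t_idx.ravel(),
    "Y": y_idx.ravel(),
    "X": x_idx.ravel(),
    "value": cube.ravel(),
})

# Index by grid cell first to pull out one cell's time series.
by_cell = df.set_index(["Y", "X", "T"]).sort_index()
series_at_cell = by_cell.loc[(1, 0), "value"]  # series at y=1, x=0, indexed by T

# Index by time first, sort, and reshape back to the original cube.
by_time = df.set_index(["T", "Y", "X"]).sort_index()
recovered = by_time["value"].to_numpy().reshape(nt, ny, nx)
```

The reshape only recovers the cube if every (T, Y, X) combination is present exactly once and the index is sorted in the same order as the target axes.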

One caveat is that if your X, Y, and T values are floats, you should
be careful when you set them as indices. Certainly you can't count on
testing for equality with another floating point value (e.g.,
df[df.X == 0.1]) due to rounding error. More importantly, I think that
set_index should work correctly (i.e., group .09 repeating with .1),
but it's something to check.
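
A small sketch of the float caveat, using made-up coordinates built by repeated addition so that they carry rounding error. Index labels are matched by exact value, so nearly equal floats remain distinct labels. The sketch therefore compares with a tolerance and, as one possible workaround, indexes by an integer grid position (the column name "ix" and the 0.1 spacing are assumptions).

```python
import numpy as np
import pandas as pd

# Hypothetical coordinates: nominally 0.1, 0.2, 0.3, 0.4, 0.5.
x = np.cumsum(np.full(5, 0.1))
df = pd.DataFrame({"X": x, "value": np.arange(5)})

# Exact comparison finds nothing: the third value is 0.1 + 0.1 + 0.1,
# which is not the same float as 0.3.
exact = df[df.X == 0.3]

# Tolerance-based comparison finds the intended row.
close = df[np.isclose(df.X, 0.3)]

# Workaround: index by integer grid position and keep X as ordinary data.
df["ix"] = np.rint(df.X / 0.1).astype(int)
by_pos = df.set_index("ix")
```

With integer positions as the index, lookups such as by_pos.loc[3] no longer depend on how the float coordinates were computed.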

Chris

--
Graduate Student
Helen Wills Neuroscience Institute
University of California - Berkeley

Wes McKinney

Feb 23, 2012, 4:56:28 PM
to pystat...@googlegroups.com

I would suggest using the Panel object, which is really just a 3-D
labeled array that is capable of having heterogeneous "slices" along
the first axis. I would be happy to get feedback about it if you find
that it does not suit your needs.

thanks,
Wes

Fernando Paolo

Mar 7, 2012, 4:11:49 PM
to pystatsmodels
Thanks Chris and Wes for your suggestions! So I implemented my 3D data
structure using a Panel. The problem arises when I use `MultiIndex` for
the `items` axis of the Panel structure. Most of the (multi)indexing
capabilities shown in the documentation* don't work when using a Panel
structure.

* http://pandas.sourceforge.net/indexing.html

Example:

import numpy as np
import pandas as pn

ind = pn.MultiIndex.from_tuples(
    [('a', 1), ('a', 2), ('b', 1), ('b', 2)], names=['first', 'second'])
wp = pn.Panel(np.random.random((4, 5, 5)), items=ind,
              major_axis=np.arange(5), minor_axis=np.arange(5))

In [10]: wp['a']
...
KeyError: 'no item named a'

In [11]: wp.ix['a']
...
KeyError: 'no item named a'

Why doesn't the Panel structure follow the same behavior as DataFrame
and Series for MultiIndex? Or perhaps I'm missing something?
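
For comparison, a minimal sketch of the DataFrame and Series behavior referred to here: selecting by the first level of a MultiIndex returns everything under that level. The variable names are illustrative. It uses only DataFrame and Series, so it also runs on pandas releases that no longer include Panel.

```python
import numpy as np
import pandas as pd

ind = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("b", 1), ("b", 2)], names=["first", "second"]
)

# DataFrame with hierarchical columns: df["a"] keeps the 'second' level.
df = pd.DataFrame(np.random.random((5, 4)), columns=ind)
sub_frame = df["a"]

# Series with a hierarchical index: s["a"] behaves the same way.
s = pd.Series(np.arange(4), index=ind)
sub_series = s["a"]
```

In both cases the selected level is dropped and the result is labeled by the remaining level (1 and 2).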

Thanks so much,

-fernando







Adam Klein

Mar 8, 2012, 11:57:07 AM
to pystat...@googlegroups.com

You're probably not missing anything - the Panel structure is far less battle-tested than Series & DataFrame. Opening an issue:

Fernando Paolo

Mar 11, 2012, 7:43:57 PM
to pystatsmodels
Hi Adam,

So I see you've closed the issue. Then if I'm not able to use any
indexing capability with the Panel structure, I obviously cannot use
pandas to implement what I need in my work.

Thanks - fernando



Wes McKinney

Mar 14, 2012, 8:47:36 PM
to pystat...@googlegroups.com

hi Fernando,

I believe you've misunderstood. Panel has been patched in git master

https://github.com/pydata/pandas/commit/f5e5b1427744724ab2e54faed2b4f973a22abf62
https://github.com/pydata/pandas/commit/764ce5e44f83ec2f9fa30895c6061b009e664429

so the example you gave works now (this will be released in pandas
0.7.2, upcoming):

In [2]: import pandas as pn

In [3]: paste


ind = pn.MultiIndex.from_tuples(
    [('a', 1), ('a', 2), ('b', 1), ('b', 2)], names=['first', 'second'])
wp = pn.Panel(np.random.random((4, 5, 5)), items=ind,
              major_axis=np.arange(5), minor_axis=np.arange(5))

## -- End pasted text --

In [4]: wp['a']
Out[4]:
<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 5 (major) x 5 (minor)
Items: 1 to 2
Major axis: 0 to 4
Minor axis: 0 to 4

However, the use of hierarchical indexing in Panel generally needs
more users and more bug reports: parts of it work and parts of it do
not. There are too many granular tasks involved for this to be a
single issue, which is why Adam closed the issue.

- Wes

Adam Klein

Mar 14, 2012, 9:03:04 PM
to pystat...@googlegroups.com
Yes, I'm sorry for the confusion and for not responding sooner. I tried to address the problem you were having, and I closed the issue because I believed I had fixed it. I certainly hope you'll be able to use pandas for your work. If not, it'd be nice to know the problems you cannot solve.

Best,
Adam