Slicing a panel on a shared index of the items

Iain

Mar 9, 2012, 4:47:38 PM
to pystatsmodels
All

I have a Panel with x items. Each of the items is a DataFrame with a
common index (the major axis of the Panel), which is a series of
dates. I would like to slice/reindex the Panel along the major axis by
a date range, and return a smaller Panel with the same number of items
and the same number of columns, but fewer rows.

In other words, I want to get from this: Dimensions: 5 (items) x 100
(major) x 10 (minor)
To this: Dimensions: 5 (items) x 20 (major) x 10 (minor)

However, I just can't figure out how to do it. Panel.major_xs()
accepts a single element, not a range. Panel.ix[] operates on the item
axis, not the major (or minor) axis. There is no Panel.major_ix[]
method.

I'd be grateful if someone could lead me to the light.

Thanks
Iain

Wes McKinney

Mar 10, 2012, 12:55:35 PM
to pystat...@googlegroups.com

Try doing:

panel.ix[:, date1:date2, :]

Note that this will include both date1 and date2 if they are in the major axis.

- Wes
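
For readers trying this on a current pandas release, where Panel and .ix are no longer available, the same inclusive label slice can be sketched with a plain dict of DataFrames standing in for the Panel. This is only an illustration under that assumption, not the Panel API itself; the names panel_like, date1, and date2 and the random data are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Panel: 5 items, each a 100 x 10 DataFrame
# sharing one daily DatetimeIndex (the "major axis").
dates = pd.date_range("2012-01-01", periods=100, freq="D")
columns = ["c%d" % i for i in range(10)]
rng = np.random.default_rng(0)
panel_like = {
    name: pd.DataFrame(rng.standard_normal((100, 10)), index=dates, columns=columns)
    for name in "ABCDE"
}

date1 = pd.Timestamp("2012-02-01")
date2 = pd.Timestamp("2012-02-20")

# Label-based slicing keeps both endpoints when they are present in the index.
sliced = {name: df.loc[date1:date2, :] for name, df in panel_like.items()}

first = sliced["A"]
print(len(sliced), first.shape)  # 5 items, each 20 rows x 10 columns
```

Both date1 and date2 appear as the first and last rows of every slice, which is the same endpoint behavior described for panel.ix[:, date1:date2, :].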

Iain

Mar 10, 2012, 6:58:50 PM
to pystatsmodels
Wes

That's exactly what I needed, and very elegant. Perfect!

Thanks
Iain

Iain

Mar 21, 2012, 12:12:53 PM
to pystatsmodels
Wes

I noticed that slicing ranges from a Panel along arbitrary axes using
Panel.ix[] is quite slow for large panels. For example, I have a Panel
called data:
<class 'pandas.core.panel.Panel'>
Dimensions: 10 (items) x 768432 (major) x 9 (minor)
Items: A to J
Major axis: 2009-12-27 22:59:00 to 2012-03-19 21:13:00
Minor axis: FOO to BAR

I slice by items (an array of 2 item names), start and end datetimes
(that define a range of time on the major axis), and columns (an array
of 2 column names from the minor axis) like so:
data.ix[items, start:end, columns]

This takes ~0.9 seconds according to cProfile.run(), with the majority of
the time spent in pandas._engines.DictIndexEngine.get_loc() and
numpy.ndarray.copy():

477 function calls (475 primitive calls) in 0.878 seconds

Ordered by: internal time

ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
     2    0.517    0.258    0.517    0.258  {method 'get_loc' of 'pandas._engines.DictIndexEngine' objects}
     1    0.337    0.337    0.337    0.337  {method 'copy' of 'numpy.ndarray' objects}
     1    0.022    0.022    0.022    0.022  {pandas._tseries.infer_dtype}
    23    0.000    0.000    0.000    0.000  {numpy.core.multiarray.array}
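
A profile ordered by internal time like the one above can be produced with the standard-library cProfile and pstats modules. The sketch below is an assumption about the workflow, not the exact commands that were run; work() is a hypothetical stand-in for the expensive slicing call.

```python
import cProfile
import io
import pstats


def work():
    # Hypothetical stand-in for the expensive call being profiled,
    # e.g. data.ix[items, start:end, columns] in the post above.
    return sorted(range(200000), key=lambda v: -v)


profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

# Sort by "tottime" (time spent inside each function itself, excluding
# callees) and show only the five most expensive entries.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("tottime").print_stats(5)
report = buffer.getvalue()
print(report)
```

Sorting by tottime rather than cumtime puts the functions that do the actual work at the top, which is how get_loc and copy stand out in the listing above.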

Is this simply an unavoidable result of the size of my data Panel,
specifically the large number of rows (768,432)? Another thought:
would it be faster if I were using numpy.datetime64 instead of
datetime.datetime objects in the index?

Thanks
Iain

Iain

Mar 21, 2012, 12:59:07 PM
to pystatsmodels
To follow up:

I used timeit.Timer() to time this two different ways:
a) return a slice of a Panel using panel.ix[foo, bar, qux] (where foo,
   bar, and qux are arrays)
b) i)   iterate through the items of the panel that are in foo
   ii)  do a slice on each DataFrame with data.ix[bar, qux]
   iii) save the slice
   iv)  reassemble the slices into a new, smaller Panel
   v)   return the new Panel
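
Approach (b) can be sketched roughly as follows. Because Panel and .ix are not available in current pandas releases, this sketch assumes a dict of DataFrames in place of the Panel, uses .loc for the per-item slice, and reassembles the pieces with pd.concat into one DataFrame keyed by item. The helper slice_per_item and the small random data set are hypothetical, and the timing it prints will differ from the figures quoted in this thread.

```python
import timeit

import numpy as np
import pandas as pd


def slice_per_item(frames, items, start, end, columns):
    """Slice each selected DataFrame separately, then reassemble the pieces."""
    pieces = {name: frames[name].loc[start:end, columns] for name in items}
    # Passing a dict to pd.concat stacks the pieces under an outer index
    # level holding the item names.
    return pd.concat(pieces)


# Hypothetical data: 10 items, each 1000 rows (one per minute) x 9 columns.
index = pd.date_range("2012-01-01", periods=1000, freq="min")
minor = list("abcdefghi")
rng = np.random.default_rng(0)
frames = {
    name: pd.DataFrame(rng.standard_normal((1000, 9)), index=index, columns=minor)
    for name in "ABCDEFGHIJ"
}

# Two items, an inclusive range of 100 timestamps, two columns.
result = slice_per_item(frames, ["A", "C"], index[100], index[199], ["a", "b"])
print(result.shape)

# Time it the same way as in the post: best of 3 repeats of 10 calls each.
best = min(
    timeit.repeat(
        lambda: slice_per_item(frames, ["A", "C"], index[100], index[199], ["a", "b"]),
        repeat=3,
        number=10,
    )
)
print(best)
```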

For the data I was using (10 items x 768432 major x 9 minor), the
results I got were:
t = timeit.Timer(a, setup=setup)
print min(t.repeat(repeat=3, number=10))
2.52914

t = timeit.Timer(b, setup=setup)
print min(t.repeat(repeat=3, number=10))
0.54757

This seems odd to me. Approach b is not only faster than a, but almost
5-fold faster. It appears that Panel.ix[foo, bar, qux] is doing
something even slower than the dumb approach embodied in b. Either
that or I'm doing something horribly wrong! :-)

Iain

Wes McKinney

Mar 28, 2012, 11:21:17 PM
to pystat...@googlegroups.com

Created a github issue here:

https://github.com/pydata/pandas/issues/979

Wes McKinney

Jul 12, 2012, 10:15:09 PM
to pystat...@googlegroups.com
Hi Iain,

see


I was able to accelerate multi-axis selection from a panel by almost 5x, matching the speedup you described in your prior e-mail.

Best,
Wes