Wes,
I noticed that slicing ranges from a Panel along arbitrary axes using
Panel.ix[] is quite slow for large panels. For example, I have a Panel
called data:
<class 'pandas.core.panel.Panel'>
Dimensions: 10 (items) x 768432 (major) x 9 (minor)
Items: A to J
Major axis: 2009-12-27 22:59:00 to 2012-03-19 21:13:00
Minor axis: FOO to BAR
I slice by items (an array of two item names), a start and an end datetime
(defining a time range on the major axis), and columns (an array of two
column names from the minor axis), like so:
data.ix[items, start:end, columns]
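For reference, here is a minimal sketch of what that call selects, written against a plain 3-D NumPy array instead of a Panel. The helper slice_panel, the axis names BAZ/QUX and the small synthetic data are all hypothetical, and this is not how pandas implements .ix. Item and column labels are mapped to positions through dicts, and the sorted major axis is searched with bisect. Because label slices include both endpoints, the end bound uses bisect_right.

```python
import bisect
from datetime import datetime, timedelta

import numpy as np


def slice_panel(values, items_axis, major_axis, minor_axis, items, start, end, columns):
    """Positional sketch of data.ix[items, start:end, columns] on a 3-D array.

    values has shape (len(items_axis), len(major_axis), len(minor_axis));
    major_axis must be sorted ascending.
    """
    item_pos = {name: i for i, name in enumerate(items_axis)}
    minor_pos = {name: i for i, name in enumerate(minor_axis)}
    ii = [item_pos[name] for name in items]
    cc = [minor_pos[name] for name in columns]
    # Label slices include both endpoints, hence bisect_left / bisect_right.
    lo = bisect.bisect_left(major_axis, start)
    hi = bisect.bisect_right(major_axis, end)
    # np.ix_ builds an open mesh; fancy indexing like this returns a copy.
    return values[np.ix_(ii, np.arange(lo, hi), cc)]


# Hypothetical small stand-in for the Panel: 10 items x 1000 minutes x 4 columns.
items_axis = list("ABCDEFGHIJ")
minor_axis = ["FOO", "BAZ", "QUX", "BAR"]
t0 = datetime(2009, 12, 27, 22, 59)
major_axis = [t0 + timedelta(minutes=i) for i in range(1000)]
values = np.arange(10 * 1000 * 4, dtype=float).reshape(10, 1000, 4)

out = slice_panel(values, items_axis, major_axis, minor_axis,
                  ["A", "J"], t0 + timedelta(minutes=10), t0 + timedelta(minutes=19),
                  ["FOO", "BAR"])
print(out.shape)  # (2, 10, 2)
```

The result is a new array rather than a view, since NumPy fancy indexing always copies.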
That .ix call takes ~0.9 seconds according to cProfile.run(), with most of
the time spent in pandas._engines.DictIndexEngine.get_loc() and
numpy.ndarray.copy():
   477 function calls (475 primitive calls) in 0.878 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        2    0.517    0.258    0.517    0.258 {method 'get_loc' of 'pandas._engines.DictIndexEngine' objects}
        1    0.337    0.337    0.337    0.337 {method 'copy' of 'numpy.ndarray' objects}
        1    0.022    0.022    0.022    0.022 {pandas._tseries.infer_dtype}
       23    0.000    0.000    0.000    0.000 {numpy.core.multiarray.array}
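A profile sorted the same way ("Ordered by: internal time") can be produced with the standard cProfile and pstats modules. This is a minimal sketch: the helper name profile_call is hypothetical, and sorted() stands in for the slicing call. In the setup above, one would pass lambda: data.ix[items, start:end, columns] instead.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=5):
    """Run func(*args) under cProfile; return (result, report sorted by internal time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    # "tottime" is the sort key whose report header reads "Ordered by: internal time".
    pstats.Stats(profiler, stream=stream).sort_stats("tottime").print_stats(top)
    return result, stream.getvalue()


# Stand-in workload; replace with the call you actually want to profile.
result, report = profile_call(sorted, list(range(100000, 0, -1)))
print(report)
```

Using Profile.runcall instead of cProfile.run() avoids passing the statement as a string that is evaluated in the __main__ namespace.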
Is this simply an unavoidable result of the size of my data Panel,
specifically the large number of rows (768,432)? Another thought: would it
be faster if I used numpy.datetime64 instead of datetime.datetime objects
in the index?
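On the datetime64 question, here is a minimal sketch, assuming NumPy, of how a start/end range can be located on a sorted datetime64[ns] array with np.searchsorted. That function does a binary search, so it needs no per-label dictionary. The helper time_range_bounds and the one-minute synthetic axis are hypothetical. This only illustrates the lookup itself and does not show whether a given pandas version's Panel.ix takes this path for a datetime64 index.

```python
from datetime import datetime, timedelta

import numpy as np


def time_range_bounds(index64, start, end):
    """Positions [lo, hi) covering start..end inclusive on a sorted datetime64[ns] array."""
    lo = np.searchsorted(index64, np.datetime64(start, "ns"), side="left")
    hi = np.searchsorted(index64, np.datetime64(end, "ns"), side="right")
    return int(lo), int(hi)


# Hypothetical one-minute major axis, converted from datetime.datetime to datetime64[ns].
t0 = datetime(2009, 12, 27, 22, 59)
as_datetime = [t0 + timedelta(minutes=i) for i in range(100000)]
index64 = np.array(as_datetime, dtype="datetime64[ns]")

lo, hi = time_range_bounds(index64, t0 + timedelta(minutes=500), t0 + timedelta(minutes=599))
print(lo, hi)  # 500 600
```

Slicing index64[lo:hi] (or a data array along the same axis) with these bounds then gives a view rather than a copy.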
Thanks,
Iain