DataFrame.dropna also dropping frequency info

Peter Aberline

Aug 6, 2013, 8:02:07 AM
to pyd...@googlegroups.com
Hi

I'm new to Pandas.

I'm seeing some behaviour I don't understand: calling DataFrame.dropna() causes the frequency information to be lost from the DataFrame's index.

Is this the expected behaviour? If so, could someone help me understand why it works like this, or suggest ways to put the frequency back on the index?

I've attached some example code and data which demonstrates.

Reproduced with pandas 0.12 and NumPy 1.7.1.

Thanks
Peter
df_a.p
df_b.p
merge_problem.py
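The attached files are not preserved here. As a stand-in, a minimal reproduction of the behaviour Peter describes, assuming a hypothetical daily-indexed DataFrame (the column names and values are invented for illustration) and a recent pandas version rather than 0.12:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the attached data: ten rows on a daily DatetimeIndex
idx = pd.date_range("2013-01-01", periods=10, freq="D")
df = pd.DataFrame({"a": np.arange(10.0), "b": np.arange(10.0)}, index=idx)

# Introduce missing values in the middle of the range (rows 5 and 6 of column "a")
df.iloc[5:7, 0] = np.nan

print(df.index.freqstr)  # D
cleaned = df.dropna()
print(cleaned.index.freqstr)  # None: the remaining timestamps are no longer evenly spaced
```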

Jeff

Aug 6, 2013, 9:41:26 AM
to pyd...@googlegroups.com
This is by definition. If you create an irregular index, its freq is None; only a fully populated (regular) index has a freq that is not None.
You can certainly reindex it to be regular, but that's essentially the inverse of dropna().

In [11]: s = Series(randn(10),index=date_range('20130101',periods=10))

In [12]: s.iloc[5:7] = np.nan

In [13]: s
Out[13]: 
2013-01-01    0.916120
2013-01-02    2.394347
2013-01-03   -0.970764
2013-01-04   -0.792843
2013-01-05   -0.369776
2013-01-06         NaN
2013-01-07         NaN
2013-01-08    0.203474
2013-01-09   -0.697791
2013-01-10   -0.972242
Freq: D, dtype: float64

In [14]: s.dropna()
Out[14]: 
2013-01-01    0.916120
2013-01-02    2.394347
2013-01-03   -0.970764
2013-01-04   -0.792843
2013-01-05   -0.369776
2013-01-08    0.203474
2013-01-09   -0.697791
2013-01-10   -0.972242
dtype: float64
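Jeff's suggestion of reindexing back to a regular index can be sketched as follows, assuming a recent pandas version. `reindex` onto a full `pd.date_range` is the approach he describes; `asfreq` is a shorthand not mentioned in the thread that builds the same regular range from the first to the last timestamp. Either way the frequency returns only because the dropped rows come back as NaN.

```python
import numpy as np
import pandas as pd

# Same setup as the session above: ten daily values with a two-day gap
idx = pd.date_range("2013-01-01", periods=10, freq="D")
s = pd.Series(np.random.default_rng(0).standard_normal(10), index=idx)
s.iloc[5:7] = np.nan

dropped = s.dropna()
print(dropped.index.freq)  # None: 2013-01-06 and 2013-01-07 are missing

# Reindex onto a full daily range; the missing days come back as NaN
full_range = pd.date_range(dropped.index[0], dropped.index[-1], freq="D")
reindexed = dropped.reindex(full_range)

# asfreq builds the same regular daily index and reindexes in one step
restored = dropped.asfreq("D")
print(restored.index.freqstr)  # D
```

In other words, a non-None freq and "no gaps in the index" go together for a DatetimeIndex, which is why restoring one means undoing the other.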

Peter Aberline

Aug 6, 2013, 12:13:05 PM
to pyd...@googlegroups.com
Thanks Jeff. I hadn't understood that the frequency specifier only applies to regular indices, not irregular ones. Your example was very clear.

Thanks
Peter


Goyo

Aug 8, 2013, 5:18:11 AM
to pyd...@googlegroups.com
On Tuesday, August 6, 2013 18:13:05 UTC+2, Peter Aberline wrote:
Thanks Jeff. I hadn't understood that the frequency specifier only applies to regular indices, not irregular ones. Your example was very clear.


Just in case you're not aware, that's not the case if you use a PeriodIndex:

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: s = pd.Series(np.random.randn(10),index=pd.period_range('20130101',periods=10))

In [4]: s.iloc[5:7] = np.nan

In [5]: s.dropna()
Out[5]:
2013-01-01    0.873368
2013-01-02    0.286402
2013-01-03    0.022691
2013-01-04    0.501398
2013-01-05    1.250046
2013-01-08   -0.375172
2013-01-09    1.633512
2013-01-10    1.635336
Freq: D, dtype: float64

But the available frequencies for PeriodIndex are limited.
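Building on Goyo's tip, a series that already has a DatetimeIndex can be converted with `Series.to_period` before calling `dropna()`, and back with `Series.to_timestamp` afterwards. The conversion steps are an addition, not something shown in the thread; this is a minimal sketch assuming a recent pandas version:

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2013-01-01", periods=10, freq="D")
s = pd.Series(np.arange(10.0), index=idx)
s.iloc[5:7] = np.nan

# DatetimeIndex: dropping the gap leaves an irregular index with no freq
print(s.dropna().index.freq)  # None

# PeriodIndex: the frequency belongs to the index dtype, so it survives dropna
p = s.to_period("D").dropna()
print(p.index.freqstr)  # D

# Convert back to a DatetimeIndex (period start times) if timestamps are needed
back = p.to_timestamp()
```

The trade-off is the one Goyo mentions: this only helps when the data's frequency is one that PeriodIndex supports.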

Regards

Goyo

Peter Aberline

Aug 8, 2013, 9:18:21 AM
to pyd...@googlegroups.com


On Thursday, August 8, 2013 10:18:11 AM UTC+1, Goyo wrote:
Just in case you're not aware, that's not the case if you use a PeriodIndex:


Hi Goyo,
Thanks for the useful tip; I wasn't aware of that feature.

Best
Peter