DataFrame.dropna also dropping frequency info

Peter Aberline

Aug 6, 2013, 8:02:07 AM

to pyd...@googlegroups.com

Hi

I'm new to Pandas.

I'm seeing some behaviour I don't understand whereby calling DataFrame.dropna() is causing the frequency information to be lost from the DataFrame index.

Is this the expected behaviour? If so, could someone help me understand why it works like this? Or ways to put the frequency back on the index?

I've attached some example code and data which demonstrates.

Reproduced with pandas 0.12 and NumPy 1.7.1.

Thanks
Peter

df_a.p

df_b.p

merge_problem.py

Jeff

Aug 6, 2013, 9:41:26 AM

to pyd...@googlegroups.com

This is by definition. If you create an irregular index, freq is None; freq is set only when the index is fully populated (regular).

You can certainly reindex it to be regular, but that's essentially the inverse of dropna().

In [11]: s = Series(randn(10),index=date_range('20130101',periods=10))

In [12]: s.iloc[5:7] = np.nan

In [13]: s
Out[13]:
2013-01-01    0.916120
2013-01-02    2.394347
2013-01-03   -0.970764
2013-01-04   -0.792843
2013-01-05   -0.369776
2013-01-06         NaN
2013-01-07         NaN
2013-01-08    0.203474
2013-01-09   -0.697791
2013-01-10   -0.972242
Freq: D, dtype: float64

In [14]: s.dropna()
Out[14]:
2013-01-01    0.916120
2013-01-02    2.394347
2013-01-03   -0.970764
2013-01-04   -0.792843
2013-01-05   -0.369776
2013-01-08    0.203474
2013-01-09   -0.697791
2013-01-10   -0.972242
dtype: float64
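To make the "reindex it to be regular" step concrete, here is a minimal sketch of putting a daily frequency back on the index after dropna(). It assumes a recent pandas with the usual `pd`/`np` aliases; the variable names (`dropped`, `full_index`, `restored`) are only illustrative, and the dropped dates necessarily come back as NaN rows.

```python
import numpy as np
import pandas as pd

# Daily series with two missing values in the middle
s = pd.Series(np.arange(10, dtype=float),
              index=pd.date_range('20130101', periods=10))
s.iloc[5:7] = np.nan

# Dropping the NaN rows leaves gaps, so the index has no single frequency
dropped = s.dropna()
print(dropped.index.freq)

# Option 1: reindex against an explicit regular daily range
full_index = pd.date_range(dropped.index[0], dropped.index[-1], freq='D')
restored = dropped.reindex(full_index)
print(restored.index.freq)

# Option 2: asfreq builds the same regular range for you
restored_asfreq = dropped.asfreq('D')
print(restored_asfreq.index.freq)
```

Either way the result is the inverse of dropna(): the index is regular again, and the two removed dates reappear holding NaN.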

Peter Aberline

Aug 6, 2013, 12:13:05 PM

to pyd...@googlegroups.com

Thanks Jeff. I hadn't understood that the frequency specifier only applies on regular, and not irregular indices. Your example was very clear.

Thanks
Peter

Goyo

Aug 8, 2013, 5:18:11 AM

to pyd...@googlegroups.com

On Tuesday, August 6, 2013 18:13:05 UTC+2, Peter Aberline wrote:

Thanks Jeff. I hadn't understood that the frequency specifier only applies on regular, and not irregular indices. Your example was very clear.

Just in case you're not aware, that's not the case if you use a PeriodIndex:

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: s = pd.Series(np.random.randn(10),index=pd.period_range('20130101',periods=10))

In [4]: s.iloc[5:7] = np.nan

In [5]: s.dropna()
Out[5]:
2013-01-01    0.873368
2013-01-02    0.286402
2013-01-03    0.022691
2013-01-04    0.501398
2013-01-05    1.250046
2013-01-08   -0.375172
2013-01-09    1.633512
2013-01-10    1.635336
Freq: D, dtype: float64

But the available frequencies for PeriodIndex are limited.
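A side-by-side sketch of the difference, assuming a recent pandas: with a PeriodIndex the frequency belongs to each period (it is part of the index dtype), so it survives dropna(), whereas a DatetimeIndex only carries freq while it is regular. The variable names here are illustrative, and the last step shows one possible way to move an already-dropped DatetimeIndex series onto periods by naming the frequency explicitly.

```python
import numpy as np
import pandas as pd

values = np.arange(10, dtype=float)
values[5:7] = np.nan

# Same data on a DatetimeIndex and on a PeriodIndex, both daily
s_dt = pd.Series(values, index=pd.date_range('20130101', periods=10))
s_per = pd.Series(values, index=pd.period_range('20130101', periods=10, freq='D'))

dt_dropped = s_dt.dropna()
per_dropped = s_per.dropna()

# DatetimeIndex: the gap makes the index irregular, so freq is lost
print(dt_dropped.index.freq)

# PeriodIndex: each element is a daily period, so freq is kept
print(per_dropped.index.freqstr)

# Converting the irregular DatetimeIndex to periods needs an explicit freq
converted = dt_dropped.to_period('D')
print(converted.index.freqstr)
```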

Regards

Goyo

Peter Aberline

Aug 8, 2013, 9:18:21 AM

to pyd...@googlegroups.com

On Thursday, August 8, 2013 10:18:11 AM UTC+1, Goyo wrote:

Just in case you're not aware, that's not the case if you use a PeriodIndex:

Hi Goyo,
Thanks for the useful tip, I wasn't aware of that feature.

Best
Peter