error slicing multi-Index with dates

Clem Bur

Dec 8, 2015, 1:40:44 PM

to PyData

I moved from pandas 0.13.1 to 0.17 and now get unexpected errors when slicing.

Basically, if you slice a MultiIndex containing dates, the slice boundaries need to be contained in the index, which is weird behavior.

Example:

>>> df
         date  int  data
0  2014-01-01    0     0
1  2014-01-02    1    -1
2  2014-01-03    2    -2
3  2014-01-04    3    -3
4  2014-01-05    4    -4
5  2014-01-06    5    -5
>>> df.set_index("date").ix[datetime.date(2013,12,30):datetime.date(2014,1,3)]
            int  data
date                 
2014-01-01    0     0
2014-01-02    1    -1
2014-01-03    2    -2
>>> df.set_index(["date","int"]).ix[datetime.date(2013,12,30):datetime.date(2014,1,3)]
Traceback (most recent call last):
...
TypeError: Level type mismatch: 2013-12-30

Thanks!

Denis Akhiyarov

Dec 9, 2015, 12:07:41 PM

to PyData

This works:

.loc[(slice(datetime.date(2013,12,30),datetime.date(2014,1,3)),slice(None)),:]

I found it in this SO question:

http://stackoverflow.com/questions/24152509/slicing-a-pandas-multiindex-using-datetime-datatype
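
For reference, a minimal sketch of the same per-level slice written with `pd.IndexSlice`. It assumes the `date` level holds datetime64 values and that the bounds are passed as `pd.Timestamp` objects; both are assumptions, and the frame is rebuilt from the example in the first post. It uses `.loc` only.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame; the 'date' column is datetime64 here (assumption).
df = pd.DataFrame({
    "date": pd.date_range("2014-01-01", periods=6),
    "int": range(6),
    "data": -np.arange(6),
})
df2 = df.set_index(["date", "int"]).sort_index()

lo = pd.Timestamp("2013-12-30")  # lower bound, not present in the index
hi = pd.Timestamp("2014-01-03")  # upper bound, present in the index

# One slice per index level, then ':' for all columns.
idx = pd.IndexSlice
result = df2.loc[idx[lo:hi, :], :]

# The same selection spelled out with explicit slice objects.
same = df2.loc[(slice(lo, hi), slice(None)), :]
print(result)
```

`pd.IndexSlice` only builds the tuple of slice objects, so both spellings select the same rows.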

Denis Akhiyarov

Dec 9, 2015, 12:13:03 PM

to PyData

Actually, there is an open issue on GitHub for exactly this behavior and error:

https://github.com/pydata/pandas/issues/3843

Clem Bur

Dec 9, 2015, 11:12:33 PM

to PyData

Thanks Denis. The funny thing is that it works on 0.13 and breaks on later versions (at least 0.17). The GitHub issue seems to be for 0.12.

Joris Van den Bossche

Dec 10, 2015, 7:28:50 AM

to PyData

Are you sure this worked in 0.13.1?

I get exactly the same error for that version:

In [16]: pd.__version__
Out[16]: '0.13.1'

In [17]: df = pd.DataFrame({'date':pd.date_range('2014-01-01', periods=5), 'int':range(5), 'data':np.arange(0,1,0.2)})

In [18]: df2 = df.set_index(['date', 'int'])

In [19]: df2
Out[19]:
                data
date       int
2014-01-01 0     0.0
2014-01-02 1     0.2
2014-01-03 2     0.4
2014-01-04 3     0.6
2014-01-05 4     0.8

[5 rows x 1 columns]

In [20]: df2.ix[datetime.date(2013,12,30):datetime.date(2014,1,3)]

...
TypeError: Level type mismatch: 2013-12-30

But, it works if you use `datetime` instead of `date`:

In [23]: df2.ix[datetime.datetime(2013,12,30):datetime.datetime(2014,1,3)]
Out[23]:
                data
date       int
2014-01-01 0     0.0
2014-01-02 1     0.2
2014-01-03 2     0.4

[3 rows x 1 columns]

This behaviour is the same in 0.13.1 and 0.17.1.

It is the 'parsing' of non-datetime objects (strings, dates) when slicing MultiIndexes that is not implemented (the issue Denis linked to).
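
A minimal sketch of one way to sidestep the missing parsing: convert the bounds to `pd.Timestamp` before slicing, so they have the same type as the values in the level. The names `start`, `stop` and `subset` are hypothetical, and `.loc` is used in place of `.ix`.

```python
import datetime

import numpy as np
import pandas as pd

# Same frame as in the session above: a datetime64 'date' level plus an int level.
df = pd.DataFrame({
    "date": pd.date_range("2014-01-01", periods=5),
    "int": range(5),
    "data": np.arange(0, 1, 0.2),
})
df2 = df.set_index(["date", "int"])

# Bounds that arrive as datetime.date objects (hypothetical names).
start = datetime.date(2013, 12, 30)
stop = datetime.date(2014, 1, 3)

# pd.Timestamp accepts a datetime.date and returns midnight of that day,
# which matches the type stored in the 'date' level.
subset = df2.loc[pd.Timestamp(start):pd.Timestamp(stop)]
print(subset)
```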

Regards,

Joris

Clem Bur

Jan 29, 2016, 3:58:39 AM

to PyData

Hi Joris,

Sorry for the super late reply, I missed your answer until now.

Actually, your example uses Timestamp instead of datetime.date:

>>> df = pd.DataFrame({'date':pd.date_range('2014-01-01', periods=5), 'int':range(5), 'data':np.arange(0,1,0.2)})

>>> df.date.ix[0]
Timestamp('2014-01-01 00:00:00', tz=None)

If you use:

>>> date_list = [datetime.date(2014,1,1) + datetime.timedelta(days=x) for x in range(0, 5)]
>>> df = pd.DataFrame({'date':date_list, 'int':range(5), 'data':np.arange(0,1,0.2)})
>>> df.date.ix[0]
datetime.date(2014, 1, 1)

then with date objects I get:

>>> df2 = df.set_index(['date', 'int'])

>>> df2.ix[datetime.date(2013,12,30):datetime.date(2014,1,3)]

                data
date       int
2014-01-01 0     0.0
2014-01-02 1     0.2
2014-01-03 2     0.4

[3 rows x 1 columns]
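
A small sketch for checking which of the two cases a given frame is in, by looking at the first index level. The variable names are hypothetical, and the check is only an illustration of the Timestamp versus `datetime.date` difference described above.

```python
import datetime

import numpy as np
import pandas as pd

# Level built from pd.date_range: values are stored as datetime64 (Timestamp).
ts_frame = pd.DataFrame({
    "date": pd.date_range("2014-01-01", periods=5),
    "int": range(5),
    "data": np.arange(0, 1, 0.2),
}).set_index(["date", "int"])

# Level built from datetime.date objects: values stay plain Python objects.
date_list = [datetime.date(2014, 1, 1) + datetime.timedelta(days=x) for x in range(5)]
date_frame = pd.DataFrame({
    "date": date_list,
    "int": range(5),
    "data": np.arange(0, 1, 0.2),
}).set_index(["date", "int"])

ts_level = ts_frame.index.levels[0]
date_level = date_frame.index.levels[0]

print(ts_level.dtype)        # a datetime64 dtype
print(date_level.dtype)      # object
print(type(date_level[0]))   # datetime.date
```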

cheers

Clement

Clem Bur

Feb 1, 2016, 5:16:00 AM

to PyData

So actually I came across another issue, maybe related: indexing with date objects does not work properly on pandas 0.17.1 either:

>>> date_list = [datetime.date(2014,1,1) + datetime.timedelta(days=x) for x in range(0, 5)]
>>> df = pd.DataFrame({'date':date_list, 'int':range(5), 'data':np.arange(0,1,0.2)})

>>> df.set_index(["date","int"]).loc[datetime.date(2014,1,1),0]

...

KeyError: 'the label [0] is not in the [columns]'

But:

>>> df.set_index(["date","int"]).loc[datetime.date(2014,1,1)].loc[0]

data 0
Name: 0, dtype: float64

Replacing datetime.date with datetime.datetime or a string works normally.
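
A minimal sketch of the datetime-based route, assuming a recent pandas in which `pd.to_datetime` converts a column of `datetime.date` objects to datetime64. Passing the row key as a single tuple and naming the column separately is a choice made here so the key is not read as a row/column pair; it is not taken from the session above.

```python
import datetime

import numpy as np
import pandas as pd

date_list = [datetime.date(2014, 1, 1) + datetime.timedelta(days=x) for x in range(5)]
df = pd.DataFrame({"date": date_list, "int": range(5), "data": np.arange(0, 1, 0.2)})

# Convert the object column of datetime.date values to datetime64.
df["date"] = pd.to_datetime(df["date"])
df2 = df.set_index(["date", "int"])

# Full row key as one tuple, column label as the second argument.
value = df2.loc[(pd.Timestamp("2014-01-01"), 0), "data"]
print(value)
```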

Cheers

Clement
