error slicing multi-Index with dates

Clem Bur

Dec 8, 2015, 1:40:44 PM
to PyData
I moved from pandas 0.13.1 to 0.17 and now get unexpected errors when slicing.
Basically, if you slice a MultiIndex containing dates, the slice boundaries must be contained in the index, which is weird behavior.

Example:
>>> df
         date  int  data
0  2014-01-01    0     0
1  2014-01-02    1    -1
2  2014-01-03    2    -2
3  2014-01-04    3    -3
4  2014-01-05    4    -4
5  2014-01-06    5    -5
>>> df.set_index("date").ix[datetime.date(2013,12,30):datetime.date(2014,1,3)]
            int  data
date                 
2014-01-01    0     0
2014-01-02    1    -1
2014-01-03    2    -2
>>> df.set_index(["date","int"]).ix[datetime.date(2013,12,30):datetime.date(2014,1,3)]
Traceback (most recent call last):
...
TypeError: Level type mismatch: 2013-12-30

Thanks!

Denis Akhiyarov

Dec 9, 2015, 12:07:41 PM
to PyData
this works:

.loc[(slice(datetime.date(2013,12,30),datetime.date(2014,1,3)),slice(None)),:]
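
For reference, here is a minimal self-contained sketch of the workaround above. The frame is illustrative data modelled on the thread, and the bounds are written as `pd.Timestamp` (an assumption, chosen because the date level here is a DatetimeIndex). `pd.IndexSlice` is just a shorthand that builds the same tuple of `slice` objects.

```python
import pandas as pd

# Illustrative data mirroring the thread: a sorted (date, int) MultiIndex.
df = pd.DataFrame({
    "date": pd.date_range("2014-01-01", periods=6),
    "int": range(6),
    "data": [0, -1, -2, -3, -4, -5],
})
df2 = df.set_index(["date", "int"])

# Slice the first level by a date range and keep everything on the second level.
# idx[start:stop, :] is equivalent to (slice(start, stop), slice(None)).
idx = pd.IndexSlice
start, stop = pd.Timestamp("2013-12-30"), pd.Timestamp("2014-01-03")
subset = df2.loc[idx[start:stop, :], :]
print(subset)
```

The lower bound (2013-12-30) is deliberately outside the index, as in the original question. This kind of slicing needs the MultiIndex to be sorted.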


I looked up in this SO question:

Denis Akhiyarov

Dec 9, 2015, 12:13:03 PM
to PyData
Actually, there is an open issue on GitHub for exactly this behavior and error:

Clem Bur

Dec 9, 2015, 11:12:33 PM
to PyData
Thanks Denis. The funny thing is that it works on 0.13 and breaks on later versions (at least 0.17). It seems the GitHub issue is for 0.12.

Joris Van den Bossche

Dec 10, 2015, 7:28:50 AM
to PyData
Are you sure this worked in 0.13.1?
I get exactly the same error for that version:

In [16]: pd.__version__
Out[16]: '0.13.1'

In [17]: df = pd.DataFrame({'date':pd.date_range('2014-01-01', periods=5), 'int':range(5), 'data':np.arange(0,1,0.2)})

In [18]: df2 = df.set_index(['date', 'int'])

In [19]: df2
Out[19]:
                data
date       int
2014-01-01 0     0.0
2014-01-02 1     0.2
2014-01-03 2     0.4
2014-01-04 3     0.6
2014-01-05 4     0.8

[5 rows x 1 columns]

In [20]: df2.ix[datetime.date(2013,12,30):datetime.date(2014,1,3)]

...
TypeError: Level type mismatch: 2013-12-30


But, it works if you use `datetime` instead of `date`:

In [23]: df2.ix[datetime.datetime(2013,12,30):datetime.datetime(2014,1,3)]
Out[23]:
                data
date       int
2014-01-01 0     0.0
2014-01-02 1     0.2
2014-01-03 2     0.4

[3 rows x 1 columns]


But this behaviour is the same for 0.13.1 and for 0.17.1.
It is the 'parsing' of non-datetime objects (strings, dates) when slicing MultiIndexes that is not implemented (the issue Denis linked to).
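
One way to sidestep the missing parsing is to coerce the bounds yourself and filter on the level values instead of slicing by label. This is only a sketch: `as_timestamp` is a hypothetical helper, not a pandas function, and the frame is the same illustrative data as above.

```python
import datetime
import pandas as pd

df = pd.DataFrame({
    "date": pd.date_range("2014-01-01", periods=5),
    "int": range(5),
    "data": [0.0, 0.2, 0.4, 0.6, 0.8],
})
df2 = df.set_index(["date", "int"])

def as_timestamp(value):
    """Hypothetical helper: coerce a date, datetime, or string to pd.Timestamp."""
    return pd.Timestamp(value)

lower = as_timestamp(datetime.date(2013, 12, 30))  # a plain date
upper = as_timestamp("2014-01-03")                 # a string

# Compare against the values of the 'date' level and filter with a boolean mask.
dates = df2.index.get_level_values("date")
subset = df2[(dates >= lower) & (dates <= upper)]
print(subset)
```

A mask like this does not depend on the index being sorted, at the cost of comparing every row.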

Regards,
Joris



Clem Bur

Jan 29, 2016, 3:58:39 AM
to PyData
Hi Joris,

Sorry for the very late reply; I missed your answer until now.
Actually, your example uses Timestamp instead of datetime.date:
>>> df = pd.DataFrame({'date':pd.date_range('2014-01-01', periods=5), 'int':range(5), 'data':np.arange(0,1,0.2)})
>>> df.date.ix[0]
Timestamp('2014-01-01 00:00:00', tz=None)
If you use:
>>> date_list = [datetime.date(2014,1,1) + datetime.timedelta(days=x) for x in range(0, 5)]
>>> df = pd.DataFrame({'date':date_list, 'int':range(5), 'data':np.arange(0,1,0.2)})
>>> df.date.ix[0]
datetime.date(2014, 1, 1)

and with date I get:

>>> df2 = df.set_index(['date', 'int'])
>>> df2.ix[datetime.date(2013,12,30):datetime.date(2014,1,3)]

                data
date       int     
2014-01-01 0     0.0
2014-01-02 1     0.2
2014-01-03 2     0.4
[3 rows x 1 columns]
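
The difference between the two frames is visible from the column dtype. A small sketch to check which kind of values a column holds (variable names are illustrative):

```python
import datetime
import pandas as pd

# Column built with pd.date_range: a datetime64 column whose elements are Timestamps.
ts_col = pd.Series(pd.date_range("2014-01-01", periods=5))

# Column built from datetime.date objects: stored as generic Python objects.
date_col = pd.Series(
    [datetime.date(2014, 1, 1) + datetime.timedelta(days=x) for x in range(5)]
)

ts_is_datetime = pd.api.types.is_datetime64_any_dtype(ts_col)
date_is_datetime = pd.api.types.is_datetime64_any_dtype(date_col)
first_ts = ts_col.iloc[0]
first_date = date_col.iloc[0]

print(ts_col.dtype, type(first_ts).__name__, ts_is_datetime)
print(date_col.dtype, type(first_date).__name__, date_is_datetime)
```

So an index level built from the first column is a DatetimeIndex, while one built from the second holds plain `datetime.date` objects, and the two are handled by different indexing code.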

cheers
Clement

Clem Bur

Feb 1, 2016, 5:16:00 AM
to PyData
So actually I came across another issue, maybe related: indexing with date does not work properly on pandas 0.17.1 either:

>>> date_list = [datetime.date(2014,1,1) + datetime.timedelta(days=x) for x in range(0, 5)]
>>> df = pd.DataFrame({'date':date_list, 'int':range(5), 'data':np.arange(0,1,0.2)})
>>> df.set_index(["date","int"]).loc[datetime.date(2014,1,1),0]
...
KeyError: 'the label [0] is not in the [columns]'

but:
>>> df.set_index(["date","int"]).loc[datetime.date(2014,1,1)].loc[0]
data    0
Name: 0, dtype: float64

Replacing datetime.date with datetime.datetime or a string works normally.
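
A minimal sketch of that replacement, assuming it is acceptable to convert the whole column: turn the `datetime.date` values into datetime64 with `pd.to_datetime` before building the index, then look up with a full (date, int) tuple. The data is illustrative.

```python
import datetime
import pandas as pd

date_list = [datetime.date(2014, 1, 1) + datetime.timedelta(days=x) for x in range(5)]
df = pd.DataFrame({
    "date": date_list,
    "int": range(5),
    "data": [0.0, 0.2, 0.4, 0.6, 0.8],
})

# Convert the object column of dates to datetime64 before indexing on it.
df["date"] = pd.to_datetime(df["date"])
df2 = df.set_index(["date", "int"])

# Passing the key as one tuple makes it unambiguous that both parts are index levels.
row = df2.loc[(pd.Timestamp("2014-01-01"), 0)]
value = df2.loc[(pd.Timestamp("2014-01-03"), 2), "data"]
print(row)
print(value)
```

Writing the key as an explicit tuple also avoids the ambiguity in `.loc[a, b]`, where the second element can be read as a column label.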

Cheers
Clement
