dataframe.set_index(inplace=True) ate my data :(


gbadge

Nov 16, 2012, 8:32:39 PM
to pyd...@googlegroups.com
I think I found a bug, which I stumbled upon while running up against the bug fixed in #1777 (I hadn't realized I was working in an old 0.8.2 virtualenv).

Environment: Mac OS X, Python 2.7, pandas 0.8.2.

I create a data frame from a list of tuples of the form (datetime, variable_value):

In [245]: out[1]
Out[245]: 
(datetime.datetime(2012, 5, 31, 23, 43, 56, tzinfo=<bson.tz_util.FixedOffset object at 0x105f72a10>),
 u'state')

I turn it into a data frame and rename the columns:
In [246]: mdf = pd.DataFrame(out)
In [247]: mdf
Out[247]: 
<class 'pandas.core.frame.DataFrame'>
Int64Index: 27884 entries, 0 to 27883
Data columns:
0    27884  non-null values
1    27884  non-null values
dtypes: object(2)
In [249]: mdf = mdf.rename(columns={0:'date',1:'state'})
In [250]: mdf
Out[250]: 
<class 'pandas.core.frame.DataFrame'>
Int64Index: 27884 entries, 0 to 27883
Data columns:
date     27884  non-null values
state    27884  non-null values
dtypes: object(2)



Here comes the bug: 
In [251]: mdf.set_index('date',inplace=True)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-251-b6fac76e080b> in <module>()
----> 1 mdf.set_index('date',inplace=True)
/Users/grayson/.virtualenvs/pandas-0.8.2-dev/lib/python2.7/site-packages/pandas/core/frame.pyc in set_index(self, keys, drop, append, inplace, verify_integrity)
   2324             arrays.append(level)
   2325 
-> 2326         index = MultiIndex.from_arrays(arrays, names=names)
   2327 
   2328         if verify_integrity and not index.is_unique:
/Users/grayson/.virtualenvs/pandas-0.8.2-dev/lib/python2.7/site-packages/pandas/core/index.pyc in from_arrays(cls, arrays, sortorder, names)
   1528         if len(arrays) == 1:
   1529             name = None if names is None else names[0]
-> 1530             return Index(arrays[0], name=name)
   1531 
   1532         cats = [Categorical.from_array(arr) for arr in arrays]
/Users/grayson/.virtualenvs/pandas-0.8.2-dev/lib/python2.7/site-packages/pandas/core/index.pyc in __new__(cls, data, dtype, copy, name)
    109             if _shouldbe_timestamp(subarr):
    110                 from pandas.tseries.index import DatetimeIndex
--> 111                 return DatetimeIndex(subarr, copy=copy, name=name)
    112 
    113             if lib.is_period_array(subarr):
/Users/grayson/.virtualenvs/pandas-0.8.2-dev/lib/python2.7/site-packages/pandas/tseries/index.pyc in __new__(cls, data, freq, start, end, periods, copy, name, tz, verify_integrity, normalize, **kwds)
    214                 subarr = data.view(_NS_DTYPE)
    215         else:
--> 216             subarr = tools.to_datetime(data)
    217             if not np.issubdtype(subarr.dtype, np.datetime64):
    218                 raise TypeError('Unable to convert %s to datetime dtype'
/Users/grayson/.virtualenvs/pandas-0.8.2-dev/lib/python2.7/site-packages/pandas/tseries/tools.pyc in to_datetime(arg, errors, dayfirst, utc, box)
     97                                        raise_=errors == 'raise',
     98                                        utc=utc,
---> 99                                        dayfirst=dayfirst)
    100         if com.is_datetime64_dtype(result) and box:
    101             result = DatetimeIndex(result, tz='utc' if utc else None)
/Users/grayson/.virtualenvs/pandas-0.8.2-dev/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.array_to_datetime (pandas/src/tseries.c:33946)()
ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True

Great. This is the issue addressed in #1777, and that's fine: when I switch over to my 0.9 branch, everything works. Again, I am not concerned with the error message itself. What concerns me is the state of the frame after the failed call:

In [253]: mdf
Out[253]: 
<class 'pandas.core.frame.DataFrame'>
Int64Index: 27884 entries, 0 to 27883
Data columns:
state    27884  non-null values
dtypes: object(1)

Wah! My date information has disappeared. I'm not sure whether anyone has brought this up before. I guess one solution would be "don't use inplace=True until you know it will work", but it feels a little funky that my data gets eaten even though the call raised an error.
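
A minimal sketch of that "don't use inplace=True" workaround, assuming Python 3 and a recent pandas (not the 0.8.2 setup above). The sample rows and the helper name `try_set_index` are made up for illustration. The idea is to call `set_index` without `inplace`, so that a failure cannot touch the original frame:

```python
from datetime import datetime, timedelta, timezone

import pandas as pd

# Made-up sample rows standing in for the (datetime, state) tuples above.
tz = timezone(timedelta(hours=-4))
rows = [
    (datetime(2012, 5, 31, 23, 43, 56, tzinfo=tz), "state_a"),
    (datetime(2012, 5, 31, 23, 44, 10, tzinfo=tz), "state_b"),
]
mdf = pd.DataFrame(rows, columns=["date", "state"])


def try_set_index(df, keys):
    """Hypothetical helper: return (frame, error) and never modify df.

    On success the first element is a new frame indexed by `keys`;
    on failure it is the original, untouched df plus the exception.
    """
    try:
        return df.set_index(keys), None
    except (KeyError, ValueError) as exc:
        return df, exc


indexed, err = try_set_index(mdf, "date")
if err is None:
    mdf_indexed = indexed  # rebind only once the call is known to have worked
```

Because the non-inplace call returns a new object, `mdf` keeps its `date` column whether or not the call raises.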