I think I found a bug, which I stumbled upon while running into the bug fixed in #1777 (I hadn't realized I was working in an old 0.8.2 virtualenv).
Setup: Mac OS X, Python 2.7, pandas 0.8.2.
I create a DataFrame from a list of (datetime, variable_value) tuples, where the datetimes are timezone-aware. For example:
In [245]: out[1]
Out[245]:
(datetime.datetime(2012, 5, 31, 23, 43, 56, tzinfo=<bson.tz_util.FixedOffset object at 0x105f72a10>),
u'state')
I turn it into a DataFrame and rename the columns:
In [246]: mdf = pd.DataFrame(out)
In [247]: mdf
Out[247]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 27884 entries, 0 to 27883
Data columns:
0 27884 non-null values
1 27884 non-null values
dtypes: object(2)
In [249]: mdf = mdf.rename(columns={0:'date',1:'state'})
In [250]: mdf
Out[250]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 27884 entries, 0 to 27883
Data columns:
date 27884 non-null values
state 27884 non-null values
dtypes: object(2)
Here comes the bug:
In [251]: mdf.set_index('date',inplace=True)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-251-b6fac76e080b> in <module>()
----> 1 mdf.set_index('date',inplace=True)
/Users/grayson/.virtualenvs/pandas-0.8.2-dev/lib/python2.7/site-packages/pandas/core/frame.pyc in set_index(self, keys, drop, append, inplace, verify_integrity)
2324 arrays.append(level)
2325
-> 2326 index = MultiIndex.from_arrays(arrays, names=names)
2327
2328 if verify_integrity and not index.is_unique:
/Users/grayson/.virtualenvs/pandas-0.8.2-dev/lib/python2.7/site-packages/pandas/core/index.pyc in from_arrays(cls, arrays, sortorder, names)
1528 if len(arrays) == 1:
1529 name = None if names is None else names[0]
-> 1530 return Index(arrays[0], name=name)
1531
1532 cats = [Categorical.from_array(arr) for arr in arrays]
/Users/grayson/.virtualenvs/pandas-0.8.2-dev/lib/python2.7/site-packages/pandas/core/index.pyc in __new__(cls, data, dtype, copy, name)
109 if _shouldbe_timestamp(subarr):
110 from pandas.tseries.index import DatetimeIndex
--> 111 return DatetimeIndex(subarr, copy=copy, name=name)
112
113 if lib.is_period_array(subarr):
/Users/grayson/.virtualenvs/pandas-0.8.2-dev/lib/python2.7/site-packages/pandas/tseries/index.pyc in __new__(cls, data, freq, start, end, periods, copy, name, tz, verify_integrity, normalize, **kwds)
214 subarr = data.view(_NS_DTYPE)
215 else:
--> 216 subarr = tools.to_datetime(data)
217 if not np.issubdtype(subarr.dtype, np.datetime64):
218 raise TypeError('Unable to convert %s to datetime dtype'
/Users/grayson/.virtualenvs/pandas-0.8.2-dev/lib/python2.7/site-packages/pandas/tseries/tools.pyc in to_datetime(arg, errors, dayfirst, utc, box)
97 raise_=errors == 'raise',
98 utc=utc,
---> 99 dayfirst=dayfirst)
100 if com.is_datetime64_dtype(result) and box:
101 result = DatetimeIndex(result, tz='utc' if utc else None)
/Users/grayson/.virtualenvs/pandas-0.8.2-dev/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.array_to_datetime (pandas/src/tseries.c:33946)()
ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True
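One way to sidestep this particular error on 0.8.2 (a hedged sketch, not the fix from #1777) is to normalize every timestamp to naive UTC before building the frame, so pandas never sees tz-aware datetimes. The sketch uses Python 3's `datetime.timezone` as a stand-in for `bson.tz_util.FixedOffset`; that class does not exist on 2.7, where any UTC tzinfo object would serve the same role. The -5 hour offset and the helper name `to_naive_utc` are hypothetical, chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone

# Stand-in for bson.tz_util.FixedOffset (assumption): a fixed UTC offset.
# The -5 hour offset is hypothetical, chosen to make the conversion visible.
FIXED = timezone(timedelta(hours=-5))

# Records shaped like the ones in the report: (tz-aware datetime, value).
records = [
    (datetime(2012, 5, 31, 23, 43, 56, tzinfo=FIXED), u'state'),
]


def to_naive_utc(dt):
    """Convert an aware datetime to UTC, then drop its tzinfo.

    Naive datetimes are returned unchanged (assumed to already be UTC).
    """
    if dt.tzinfo is None or dt.utcoffset() is None:
        return dt
    return dt.astimezone(timezone.utc).replace(tzinfo=None)


# Normalize before handing the list to the DataFrame constructor.
normalized = [(to_naive_utc(ts), value) for ts, value in records]
print(normalized[0])
# (datetime.datetime(2012, 6, 1, 4, 43, 56), 'state')
```

The normalized list could then be passed to `pd.DataFrame` exactly as in the session above; the trade-off is that the original offset is discarded, so it would need to be stored separately if it matters.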
This error is the issue addressed in #1777, and that part is fine: when I switch over to my 0.9 branch, everything works. Again, I am not concerned with the error message itself. The problem is what happens to the frame afterwards:
In [253]: mdf
Out[253]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 27884 entries, 0 to 27883
Data columns:
state 27884 non-null values
dtypes: object(1)
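Wah! My date column has disappeared. I'm not sure whether anyone has brought this up before. One answer would be "don't use inplace=True until you know it will work", but it feels a little funky that even though set_index raises an error, my data still gets eaten: judging by the output above, the 'date' column is removed from the frame before the index construction fails, and it is not restored.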
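The behavior above looks like an ordering problem: the column is removed first, and the step that can fail runs afterwards. The toy model below illustrates that ordering without pandas. `Frame`, `build_index`, `unsafe_set_index` and `safe_set_index` are all hypothetical names, not pandas APIs, and `build_index` only mimics the 0.8.2 tz-aware `ValueError`. The safe variant does all the work that can raise before touching the frame, so an exception leaves the data intact.

```python
from datetime import datetime, timezone


class Frame:
    """Toy column store: a dict of column name -> list, plus an index."""

    def __init__(self, columns):
        self.columns = dict(columns)
        self.index = None


def build_index(values):
    # Hypothetical stand-in for the index constructor: it rejects
    # tz-aware datetimes, mimicking the ValueError raised on 0.8.2.
    if any(getattr(v, "tzinfo", None) is not None for v in values):
        raise ValueError("Tz-aware datetime cannot be converted")
    return list(values)


def unsafe_set_index(frame, key):
    # Removes the column first; if build_index raises, it is already gone.
    values = frame.columns.pop(key)
    frame.index = build_index(values)


def safe_set_index(frame, key):
    # Build the index first: this is the only step that can fail.
    index = build_index(frame.columns[key])
    # Mutate only after success, so an exception leaves the frame untouched.
    del frame.columns[key]
    frame.index = index


aware = datetime(2012, 5, 31, 23, 43, 56, tzinfo=timezone.utc)

results = {}
for setter in (unsafe_set_index, safe_set_index):
    frame = Frame({"date": [aware], "state": ["state"]})
    try:
        setter(frame, "date")
    except ValueError:
        pass  # the error is expected; we only care what is left behind
    results[setter.__name__] = sorted(frame.columns)

print(results)
# {'unsafe_set_index': ['state'], 'safe_set_index': ['date', 'state']}
```

Until this is fixed, a defensive habit on the pandas side is to skip `inplace=True` and rebind instead, i.e. `mdf = mdf.set_index('date')`: if that call raises, the assignment never happens, so `mdf` presumably still holds the original frame (assuming the non-inplace path works on a copy).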