[pandas] confused about datetime64 astype

2,170 views
Skip to first unread message

Aaron Curtis

unread,
Sep 21, 2013, 8:30:03 PM9/21/13
to pystat...@googlegroups.com
Hi all,

I'm trying to get my datetime64 astyped into a format that can be written to sqlite. If I try to write them as is I get

InterfaceError: Error binding parameter 2 - probably unsupported type.

The sql output has this string in it which is obviously not going to work: Timestamp('2013-07-15 00:00:03.374863', tz=None)

So, I tried converting to str:

In [96]: msgtbl['timestamp']=msgtbl.timestamp.astype(str)

TypeError: cannot astype a datetimelike from [datetime64[ns]] to [|S0]
> /home/aaron/pandas/pandas/core/common.py(1824)_astype_nansafe()
   1823             raise TypeError(
-> 1824                 "cannot astype a datetimelike from [%s] to [%s]" % (arr.dtype, dtype))
   1825         return arr.astype(_NS_DTYPE)


And then I tried converting to object. This was the weird one, it didn't change at all, just stayed a datetime64:

In [99]: msgtbl
Out[99]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 3519 entries, 0 to 3518
Data columns (total 3 columns):
msgType      3519  non-null values
flight_id    3519  non-null values
timestamp    3519  non-null values
dtypes: datetime64[ns](1), object(2)

In [100]: msgtbl['timestamp']=msgtbl.timestamp.astype(object)

In [101]: msgtbl
Out[101]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 3519 entries, 0 to 3518
Data columns (total 3 columns):
msgType      3519  non-null values
flight_id    3519  non-null values
timestamp    3519  non-null values
dtypes: datetime64[ns](1), object(2)

What's going on? The only astype that seems to work is int, and I can't find an easy way to make that go in as an sqlite date type column.

Aaron

Jeff Reback

unread,
Sep 21, 2013, 9:29:56 PM9/21/13
to pystat...@googlegroups.com
try

pd.write_sql

or if u want to do it like u r doing

s.apply(lamba x: x.isoformat())

Aaron Curtis

unread,
Sep 22, 2013, 1:19:05 AM9/22/13
to pystat...@googlegroups.com
Thanks, the .isoformat did the trick. Interestingly, .astype(object).astype(str) also worked. I still don't get why .astype(object) leaves you with a datetime64. Is that a bug?

Jeff Reback

unread,
Sep 22, 2013, 10:16:32 AM9/22/13
to pystat...@googlegroups.com
not sure what you started with, astype(object) has been correctly correctly since as least 0.11 IIRC

In [1]: s = Series(date_range('20130101',periods=5))

In [2]: s
Out[2]: 
0   2013-01-01 00:00:00
1   2013-01-02 00:00:00
2   2013-01-03 00:00:00
3   2013-01-04 00:00:00
4   2013-01-05 00:00:00
dtype: datetime64[ns]

In [3]: s.astype(object)
Out[3]: 
0    2013-01-01 00:00:00
1    2013-01-02 00:00:00
2    2013-01-03 00:00:00
3    2013-01-04 00:00:00
4    2013-01-05 00:00:00
dtype: object

In [4]: pd.__version__
Out[4]: '0.12.0'

In [5]: s.astype(object)[0]
Out[5]: datetime.datetime(2013, 1, 1, 0, 0)

Aaron Curtis

unread,
Sep 22, 2013, 11:39:30 AM9/22/13
to pystat...@googlegroups.com
I'm starting with a column of datetime64.

You're right, it is converting correctly to object, but the issue is that when I assign the column into a dataframe it converts it back to datetime64. Is that the expected behaviour?

But if I convert it to object and THEN to str (can't go straight to str) and write it back then this automatic conversion seems not to happen. Here's the example:

In [19]: msgtblTimeColAsObj=msgtbl.timestamp.astype(object)

In [20]: msgtbl.timestamp
Out[20]:
0    2013-07-15 00:00:03.194067
1    2013-07-15 00:00:03.374863
2    2013-07-15 00:00:03.535571
3    2013-07-15 00:00:03.716367
4    2013-07-15 00:00:03.867031
5    2013-07-15 00:00:04.007650
...
3514   2013-07-15 00:08:56.131147
3515   2013-07-15 00:08:56.281811
3516   2013-07-15 00:08:56.442519
3517   2013-07-15 00:08:56.593182
3518   2013-07-15 00:08:56.753890
Name: timestamp, Length: 3519, dtype: datetime64[ns]

In [21]: msgtblTimeColAsObj=msgtbl.timestamp.astype(object)

In [22]: msgtblTimeColAsObj
Out[22]:
0    2013-07-15 00:00:03.194067
1    2013-07-15 00:00:03.374863
2    2013-07-15 00:00:03.535571
3    2013-07-15 00:00:03.716367
4    2013-07-15 00:00:03.867031
...
3516   2013-07-15 00:08:56.442519
3517   2013-07-15 00:08:56.593182
3518   2013-07-15 00:08:56.753890
Name: timestamp, Length: 3519, dtype: object

In [23]: msgtbl.timestamp=msgtblTimeColAsObj

In [24]: msgtbl
Out[24]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 3519 entries, 0 to 3518
Data columns (total 3 columns):
msgType      3519  non-null values
flight_id    3519  non-null values
timestamp    3519  non-null values
dtypes: datetime64[ns](1), object(2)


Aaron Curtis

unread,
Sep 22, 2013, 11:41:03 AM9/22/13
to pystat...@googlegroups.com
And the the str trick:

In [25]: msgtbl.timestamp=msgtblTimeColAsObj.astype(str)

In [26]: msgtbl
Out[26]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 3519 entries, 0 to 3518
Data columns (total 3 columns):
msgType      3519  non-null values
flight_id    3519  non-null values
timestamp    3519  non-null values
dtypes: object(3)

Jeff Reback

unread,
Sep 22, 2013, 12:38:50 PM9/22/13
to pystat...@googlegroups.com
you can't have an object dtyped datetime in a frame since at least 0.10.1 IIRC - caused too many issues
a series is permissible though (mainly for the issue u are having which is output formatting)

0.13 will have new improved sql converters so this should all be a non issue in any event
Reply all
Reply to author
Forward
0 new messages