timedelta64 breaking all my code


gbadge

May 21, 2013, 2:36:56 AM
to pyd...@googlegroups.com
So I just upgraded to numpy 1.7.1 (I have since gone back to 1.6.2 to work around my current issue). All was going well until I started trying to deal with timedeltas in pandas Series/DataFrames.
I think I am talking about https://github.com/pydata/pandas/issues/2315, but I wanted to document what I am going through in case it's something new.

Here is an example.

<code>
print end_date
122   2013-05-14 23:01:42
148   2013-05-14 23:08:56
150   2013-05-14 23:10:30
151   2013-05-14 23:00:30
152   2013-05-14 23:05:33
156   2013-05-14 23:02:40
157   2013-05-14 23:02:31
158   2013-05-14 23:02:43
159   2013-05-14 23:08:43
161   2013-05-14 23:05:55
162   2013-05-14 23:10:27
163   2013-05-14 23:09:07
165   2013-05-14 23:09:32
167   2013-05-14 23:00:47
168   2013-05-14 23:15:12
print start_date
122   2013-05-14 22:34:16
148   2013-05-14 22:41:30
150   2013-05-14 22:43:39
151   2013-05-14 22:43:41
152   2013-05-14 22:43:54
156   2013-05-14 22:45:25
157   2013-05-14 22:46:08
158   2013-05-14 22:46:35
159   2013-05-14 22:46:47
161   2013-05-14 22:47:12
162   2013-05-14 22:47:33
163   2013-05-14 22:47:39
165   2013-05-14 22:48:35
167   2013-05-14 22:48:55
168   2013-05-14 22:49:14
diff =  (end_date - start_date)
print diff
122   00:27:26
148   00:27:26
150   00:26:51
151   00:16:49
152   00:21:39
156   00:17:15
157   00:16:23
158   00:16:08
159   00:21:56
161   00:18:43
162   00:22:54
163   00:21:28
165   00:20:57
167   00:11:52
168   00:25:58
</code>

Cool... everything feels fine. But now I try to get total seconds, or days, and my code explodes.

<code>
diff.apply(lambda x: x.days)
*** AttributeError: 'numpy.timedelta64' object has no attribute 'days'
</code> 

Would love some direction on how to port my old code to something that will work with 1.7.1.



 

Jeff

May 21, 2013, 7:31:14 AM
to pyd...@googlegroups.com
numpy 1.7 changed the way timedelta64 scalars work.

In 1.6.2 they are very similar to a timedelta (as in datetime.timedelta), whereas in 1.7 they are essentially a 1-element integer array.

They do the same thing but have different APIs, which are not backward compatible (gotta love numpy!).

Here's how in 1.6.2:
 
In [1]: s = pd.Series(pd.date_range('20130520 14:00:00',freq='s',periods=100))
In [2]: (s-s[5]).apply(lambda x: x.item().total_seconds())
Out[2]:
0    -5
1    -4
2    -3
3    -2
4    -1
5     0
6     1
7     2
8     3
9     4
10    5
11    6
12    7
13    8
14    9
...
85    80
86    81
87    82
88    83
89    84
90    85
91    86
92    87
93    88
94    89
95    90
96    91
97    92
98    93
99    94
Length: 100, dtype: float64
In [3]: np.__version__
Out[3]: '1.6.2'
In [4]: (s-s[5])[0].item() 
Out[4]: datetime.timedelta(-1, 86395)
In [5]: type((s-s[5])[0].item())
Out[5]: datetime.timedelta
 
And in 1.7:
 
In [18]: s = pd.Series(pd.date_range('20130520 14:00:00',freq='s',periods=100))
In [19]: (s-s[5]).apply(lambda x: x.item()/1e9)
Out[19]:
0    -5
1    -4
2    -3
3    -2
4    -1
5     0
6     1
7     2
8     3
9     4
10    5
11    6
12    7
13    8
14    9
...
85    80
86    81
87    82
88    83
89    84
90    85
91    86
92    87
93    88
94    89
95    90
96    91
97    92
98    93
99    94
Length: 100, dtype: float64
In [20]: np.__version__
Out[20]: '1.7.0'
In [21]: (s-s[5])[0].item()                   
Out[21]: -5000000000L
In [22]: type((s-s[5])[0].item())
Out[22]: long
 
 
pandas tries to hide this, but as we don't yet have a TimeDelta scalar type (analogous to Timestamp), it's a bit tricky.
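
The difference described above can be reproduced with numpy alone. This is a minimal sketch, assuming numpy >= 1.7 on a current Python; the variable names are illustrative and not from the thread. It shows that a nanosecond-resolution scalar unwraps to a plain integer, and two ways to get back to seconds.

```python
import datetime
import numpy as np

# A nanosecond-resolution scalar, like one element of a timedelta64[ns]
# Series (-5 seconds, matching the transcript above).
delta_ns = np.timedelta64(-5000000000, 'ns')

# At 'ns' resolution, item() returns a plain integer count of nanoseconds.
as_int = delta_ns.item()

# Dividing by a one-second timedelta64 gives a float number of seconds.
seconds = delta_ns / np.timedelta64(1, 's')

# Casting to microsecond resolution first makes item() return a
# datetime.timedelta, which has .days and .total_seconds().
as_timedelta = delta_ns.astype('timedelta64[us]').item()

print(as_int, seconds, as_timedelta.total_seconds())
```

Dividing by a unit timedelta64 keeps the unit explicit, which is less error-prone than remembering the 1e9 factor.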

Jeff

May 21, 2013, 7:34:27 AM
to pyd...@googlegroups.com
Here is a doc link (though I probably need to add an example of this, as it's pretty common)
 


Jeff

May 21, 2013, 8:41:27 AM
to pyd...@googlegroups.com
Here's a section from the new docs (not updated on the site yet):
 
 
Getting scalar results from a timedelta64[ns] series

In [160]: y = s - s[0]

In [161]: y
Out[161]:
0           00:00:00
1   1 days, 00:00:00
2   2 days, 00:00:00
dtype: timedelta64[ns]

In [162]: y.apply(lambda x: x.item().total_seconds())
Out[162]:
0         0
1     86400
2    172800
dtype: float64

In [163]: y.apply(lambda x: x.item().days)
Out[163]:
0    0
1    1
2    2
dtype: int64

Note: These operations are different in numpy 1.6.2 and in numpy >= 1.7. The timedelta64[ns] scalar type in 1.6.2 is much like a datetime.timedelta, while in 1.7 it is a nanosecond-based integer. A future version of pandas will make this transparent.
These are the equivalent operations to the above in numpy >= 1.7:

y.apply(lambda x: x/np.timedelta64(1,'s'))

y.apply(lambda x: x/np.timedelta64(1,'D'))
 
 
(You can divide by np.timedelta64(1,'s') rather than 1e9 in your case as well.)
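
For readers on a much later pandas than this thread covers, here is a hedged sketch of what the "transparent" handling mentioned in the note looks like. It assumes a pandas release that has the `.dt` accessor on timedelta Series and the `pd.Timedelta` scalar, neither of which existed when this was written. The Series `s` is rebuilt here as a hypothetical daily range to match the output shown in the docs excerpt.

```python
import pandas as pd

# Hypothetical input: three daily timestamps, so y matches the excerpt above.
s = pd.Series(pd.date_range('2013-05-20', periods=3, freq='D'))
y = s - s.iloc[0]  # dtype: timedelta64[ns]

# Vectorized accessors, no apply() or item() needed.
total_seconds = y.dt.total_seconds()  # float seconds per element
whole_days = y.dt.days                # integer day component per element

# Dividing by a unit Timedelta gives a float count of that unit.
as_float_days = y / pd.Timedelta(days=1)

print(total_seconds.tolist(), whole_days.tolist(), as_float_days.tolist())
```

These forms do not depend on which numpy version is installed, which is the portability problem the original question ran into.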

Dave Hirschfeld

May 22, 2013, 9:11:27 AM
to pyd...@googlegroups.com


On Wednesday, May 22, 2013 12:36:24 AM UTC+1, gbadge wrote:


Would love some direction on how to port my old code to something that will work w 1.7.1


A timedelta64[ns] array is just an int64 array holding the difference in nanoseconds, so to get the number of seconds you can view it as an int64 array and divide by 1e9, e.g.

In [61]: pd.Series(dates - dates[0])
Out[61]: 
0           00:00:00
1   1 days, 00:00:00
2   2 days, 00:00:00
3   3 days, 00:00:00
4   4 days, 00:00:00
5   5 days, 00:00:00
6   6 days, 00:00:00
7   7 days, 00:00:00
8   8 days, 00:00:00
9   9 days, 00:00:00
dtype: timedelta64[ns]

In [65]: pd.Series(dates - dates[0]).view(np.int64)/1e9
Out[65]: 
0         0
1     86400
2    172800
3    259200
4    345600
5    432000
6    518400
7    604800
8    691200
9    777600
dtype: float64
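
The same int64-view trick can be sketched at the numpy level. The post does not show how `dates` was built, so the array below is a hypothetical stand-in (ten consecutive days), and the approach only holds if the array really has nanosecond resolution.

```python
import numpy as np

# Hypothetical stand-in for `dates`: ten consecutive days at ns resolution.
dates = np.arange('2013-05-01', '2013-05-11',
                  dtype='datetime64[D]').astype('datetime64[ns]')

deltas = dates - dates[0]  # dtype: timedelta64[ns]

# Reinterpret the same 8-byte values as int64 nanoseconds, then scale.
seconds = deltas.view(np.int64) / 1e9

print(seconds)
```

One caveat: a missing value (NaT) is stored as the smallest int64, so after the view it shows up as a huge negative number rather than NaN. If the data can contain NaT, dividing by `np.timedelta64(1, 's')` is the safer route.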
