[Numpy-discussion] Implicit conversion of python datetime to numpy datetime64?


Benjamin Root

Feb 14, 2012, 11:17:53 PM
to Discussion of Numerical Python
Just a thought I had.  Right now, I can pass a list of python ints or floats into np.array() and get a numpy array with a sensible dtype.  Is there any reason why we can't do the same for python's datetime?  Right now, it is very easy for me to make a list comprehension of datetime objects using strptime(), but it is very awkward to make a numpy array out of it.

The only barrier I can think of is people who have already built code around an object dtype array of datetime objects.

Thoughts?
Ben Root

P.S. - whatever happened to arange() and linspace() for datetime64?
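
For readers following along, the workflow described in this message can be sketched as below. This is only a minimal illustration, assuming a recent NumPy release whose behaviour may differ from the 2012 development branch discussed in the thread; the sample date strings and variable names are made up for the example.

```python
import datetime

import numpy as np

# Hypothetical sample data in the "%m/%d/%y" format.
raw = ["02/03/12", "07/22/98", "12/12/12"]
parsed = [datetime.datetime.strptime(s, "%m/%d/%y") for s in raw]

# With no dtype, NumPy keeps the Python objects in an object array.
as_objects = np.array(parsed)

# An explicit datetime64 dtype requests the conversion; microseconds
# match the resolution of datetime.datetime.
as_us = np.array(parsed, dtype="datetime64[us]")

# Dropping to day resolution is then an ordinary cast.
as_days = as_us.astype("datetime64[D]")
```

The object array keeps attributes such as .year on each element, while the datetime64 arrays trade them for vectorised date arithmetic.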

Charles R Harris

Feb 15, 2012, 12:05:19 AM
to Discussion of Numerical Python

Arange works in the development branch,

In [1]: arange(0,3,1, dtype="datetime64[D]")
Out[1]: array(['1970-01-01', '1970-01-02', '1970-01-03'], dtype='datetime64[D]')

but linspace is more complicated, in that it might not be possible to subdivide an interval into reasonable datetime64 units:

In [4]: a = datetime64(0, 'D')

In [5]: b = datetime64(1, 'D')

In [6]: linspace(a, b, 5)
Out[6]: array(['1970-01-01', '1970-01-01', '1970-01-01', '1970-01-01', '1970-01-02'], dtype='datetime64[D]')

Looks like a project for somebody. There is probably a lot of work along that line to be done.

Chuck
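
One possible way around the subdivision problem shown above is to interpolate in a finer unit than the endpoints use. The sketch below is an assumption about how that could look, not code from NumPy or from this thread; `datetime_linspace` and its `unit` parameter are hypothetical names.

```python
import numpy as np


def datetime_linspace(start, stop, num, unit="us"):
    """Return `num` evenly spaced datetime64 values from start to stop.

    Hypothetical helper, not a NumPy function. `unit` should be fine
    enough to represent the intermediate points.
    """
    dtype = f"datetime64[{unit}]"
    # Express both endpoints as integer counts of `unit` since the epoch.
    start_i = np.asarray(start).astype(dtype).astype("int64")
    stop_i = np.asarray(stop).astype(dtype).astype("int64")
    # Interpolate the offset from start, which keeps the floats small,
    # then round to the nearest whole unit.
    offsets = np.rint(np.linspace(0, stop_i - start_i, num)).astype("int64")
    return (start_i + offsets).astype(dtype)


a = np.datetime64(0, "D")
b = np.datetime64(1, "D")
points = datetime_linspace(a, b, 5, unit="h")
```

With hour resolution, the one-day interval splits into five distinct points six hours apart, instead of collapsing onto the two endpoint days.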

Mark Wiebe

Feb 15, 2012, 12:12:20 AM
to Discussion of Numerical Python
On Tue, Feb 14, 2012 at 8:17 PM, Benjamin Root <ben....@ou.edu> wrote:
> Just a thought I had.  Right now, I can pass a list of python ints or floats into np.array() and get a numpy array with a sensible dtype.  Is there any reason why we can't do the same for python's datetime?  Right now, it is very easy for me to make a list comprehension of datetime objects using strptime(), but it is very awkward to make a numpy array out of it.

I would consider this a bug; it's not behaving sensibly at present. Here's what it does for me:

In [20]: np.array([datetime.datetime.strptime(date, "%m/%d/%y") for date in ["02/03/12",
    ...: "07/22/98", "12/12/12"]], dtype="M8")
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
C:\Python27\Scripts\<ipython-input-20-d3b7b5392190> in <module>()
      1 np.array([datetime.datetime.strptime(date, "%m/%d/%y") for date in ["02/03/12",
----> 2 "07/22/98", "12/12/12"]], dtype="M8")

TypeError: Cannot cast datetime.datetime object from metadata [us] to [D] according to the rule 'same_kind'

In [21]: np.array([datetime.datetime.strptime(date, "%m/%d/%y") for date in ["02/03/12",
    ...: "07/22/98", "12/12/12"]], dtype="M8[us]")
Out[21]:
array(['2012-02-02T16:00:00.000000-0800',
       '1998-07-21T17:00:00.000000-0700', '2012-12-11T16:00:00.000000-0800'], dtype='datetime64[us]')

In [22]: np.array([datetime.datetime.strptime(date, "%m/%d/%y") for date in ["02/03/12",
    ...: "07/22/98", "12/12/12"]], dtype="M8[us]").astype("M8[D]")
Out[22]: array(['2012-02-03', '1998-07-22', '2012-12-12'], dtype='datetime64[D]')


> The only barrier I can think of are those who have already built code around a object dtype array of datetime objects.
>
> Thoughts?
> Ben Root
>
> P.S. - what ever happened to arange() and linspace() for datetime64?

arange definitely works:

In [28]: np.arange('2011-03-02', '2011-04-01', dtype='M8')
Out[28]:
array(['2011-03-02', '2011-03-03', '2011-03-04', '2011-03-05',
       '2011-03-06', '2011-03-07', '2011-03-08', '2011-03-09',
       '2011-03-10', '2011-03-11', '2011-03-12', '2011-03-13',
       '2011-03-14', '2011-03-15', '2011-03-16', '2011-03-17',
       '2011-03-18', '2011-03-19', '2011-03-20', '2011-03-21',
       '2011-03-22', '2011-03-23', '2011-03-24', '2011-03-25',
       '2011-03-26', '2011-03-27', '2011-03-28', '2011-03-29',
       '2011-03-30', '2011-03-31'], dtype='datetime64[D]')

I didn't get to implementing linspace. I did look at it, but the current code didn't make it a trivial thing to put in.

-Mark
_______________________________________________
NumPy-Discussion mailing list
NumPy-Di...@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Benjamin Root

Feb 15, 2012, 12:37:26 AM
to Discussion of Numerical Python


On Tuesday, February 14, 2012, Mark Wiebe <mww...@gmail.com> wrote:
> On Tue, Feb 14, 2012 at 8:17 PM, Benjamin Root <ben....@ou.edu> wrote:
>>
>> Just a thought I had.  Right now, I can pass a list of python ints or floats into np.array() and get a numpy array with a sensible dtype.  Is there any reason why we can't do the same for python's datetime?  Right now, it is very easy for me to make a list comprehension of datetime objects using strptime(), but it is very awkward to make a numpy array out of it.
>
> I would consider this a bug, it's not behaving sensibly at present. Here's what it does for me:
>
> In [20]: np.array([datetime.datetime.strptime(date, "%m/%d/%y") for date in ["02/03/12",
>
>     ...: "07/22/98", "12/12/12"]], dtype="M8")

Well, I guess it would be nice if I didn't even have to provide the dtype (i.e., it would be inferred from the datetime type, since we aren't talking about strings).  But I hadn't noticed the above; I was just making object arrays.

Sorry, I wasn't clear about arange. I meant that it would be nice if it could take python datetimes as arguments (and timedelta for the step?), because that is much more intuitive than remembering the exact dtype code and string format.

As I see it, the numpy datetime64 type could take three types for its constructor: another datetime64, a python datetime, and the standard unambiguous datetime string.  I should be able to use these interchangeably in numpy.  The same would be true for timedelta64.

Easy interchange between python datetime and datetime64 would allow numpy to piggy-back on established functionality in the python system libraries, allowing for focus to be given to extended features.
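
As an illustration of the interchange described above, the sketch below builds the same value from the three kinds of input and wraps arange so that it accepts Python date objects. It assumes a recent NumPy release rather than the 2012 development branch, and `arange_dates` with its `step_days` parameter is a hypothetical helper, not a NumPy function.

```python
import datetime

import numpy as np

# The three kinds of input named above, all naming the same instant.
from_string = np.datetime64("2012-02-03")
from_python = np.datetime64(datetime.datetime(2012, 2, 3))
from_numpy = np.datetime64(from_string)

# Going back the other way: a microsecond-resolution value converts
# to datetime.datetime via item().
round_trip = from_python.item()


def arange_dates(start, stop, step_days=1):
    """Hypothetical wrapper: np.arange over datetime.date endpoints."""
    return np.arange(
        np.datetime64(start, "D"),
        np.datetime64(stop, "D"),
        np.timedelta64(step_days, "D"),
    )


march = arange_dates(datetime.date(2011, 3, 2), datetime.date(2011, 4, 1))
```

Converting the endpoints explicitly inside the wrapper keeps the caller free of dtype codes and string formats, which is the convenience being asked for here.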

Mark Wiebe

Feb 15, 2012, 12:54:29 AM
to Discussion of Numerical Python
Ben Walsh actually implemented this and the code is in a pull request here:

https://github.com/numpy/numpy/pull/111

This didn't go in, because the datetime properties don't exist on the arrays after you convert them to datetime64, so there could be some unintuitive consequences from that. When Martin implemented the quaternion dtype, we discussed the possibility that dtypes could expose properties that show up on the array object, and if this were implemented I think the conversion and compatibility between python datetime and datetime64 could be made quite natural.

Benjamin Root

Feb 15, 2012, 9:29:28 AM
to Discussion of Numerical Python
> Ben Walsh actually implemented this and the code is in a pull request here:
> https://github.com/numpy/numpy/pull/111
> This didn't go in, because the datetime properties don't exist on the arrays after you convert them to datetime64, so there could be some unintuitive consequences from that. When Martin implemented the quaternion dtype, we discussed the possibility that dtypes could expose properties that show up on the array object, and if this were implemented I think the conversion and compatibility between python datetime and datetime64 could be made quite natural.
> -Mark
>  

Actually, at first glance, I don't see why this shouldn't go ahead as-is.  If I know I am getting datetime64, then I should expect to lose the features of the datetime object, right?  Sure, it would be nice if it kept those attributes, but keeping them would provide an inconsistent interface between a numpy array created from datetime objects and one created from datetime64 objects (unless I misunderstood).

I will read through the pull request more closely and comment further.

Ben Root

Benjamin Root

Feb 15, 2012, 11:36:16 AM
to Discussion of Numerical Python
Ok, I did some more testing between the master branch and the pull request.  I suspect that something is interfering with the type conversion because walshb's branch pulled on top of the current master yields the same results as for the current master (see next).

If passed a datetime, date, time or timedelta object *without specifying the dtype*, you will get object arrays, which will, of course, allow one to access attributes such as .year, .month, etc.

>>> np.array([date(2000, 1, 1)])
array([2000-01-01], dtype=object)

If passed a date object with dtype='M8', or a timedelta object with dtype='m8', you will get a datetime64 (or timedelta64):

>>> np.array([date(2000, 1, 1)], dtype='M8')
array(['2000-01-01'], dtype='datetime64[D]')

>>> np.array([timedelta(0, 0, 0)], dtype='m8')
array([0], dtype='timedelta64[us]')

The exception noted before only happens when a datetime object is passed in.  As an additional note, a time object passed in with dtype 'M8' will throw a ValueError because of the decision not to support times that are without dates.  Personally, I wonder if this should instead be treated like a timedelta64 object, but I haven't thought through the consequences of that yet.
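
To make the timedelta64 idea concrete, a time of day could be read as the elapsed time since midnight. The sketch below only illustrates that interpretation, assuming a recent NumPy release; `time_to_timedelta64` is a hypothetical helper and not something NumPy provides.

```python
import datetime

import numpy as np


def time_to_timedelta64(t):
    """Hypothetical helper: elapsed time since midnight as timedelta64."""
    since_midnight = datetime.timedelta(
        hours=t.hour,
        minutes=t.minute,
        seconds=t.second,
        microseconds=t.microsecond,
    )
    # np.timedelta64 accepts a datetime.timedelta directly.
    return np.timedelta64(since_midnight)


offsets = np.array(
    [time_to_timedelta64(t) for t in (datetime.time(1, 30), datetime.time(23, 59, 59))]
)
```

One consequence to weigh is that any tzinfo attached to the time object is silently dropped by this conversion.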

I should also note a slight difference between the results from master and from v1.6.1.  In v1.6.1, creating an array with datetime objects and dtype='M8' works:

>>> np.array([datetime(2000, 1, 1)], dtype='M8')
array([2000-01-01 00:00:00], dtype=datetime64[us])

and for passing in a date object, the dtype is named something slightly different (and the string repr is different):

>>> np.array([date(2000, 1, 1)], dtype='M8')
array([2000-01-01 00:00:00], dtype=datetime64[us])

The above has a dtype of 'datetime64[us]' instead of the current 'datetime64[D]', and it displays the time part, which is not currently done (but that is likely due to the '[D]' part of the datetime).

So, where does that leave us?  Well, I do agree that there is likely a problem with possible existing code that expects to create an object array.  Maybe an implicit conversion should be held off until version 2.0?  Until then, I would be happy with better documentation of the current abilities.  The datetime64 page currently only shows how to make a datetime64 array using strings, implying that that is the only method.  Maybe the top of that page should have a section showing how to create a datetime64 (and timedelta64) array using both string and datetime (timedelta) data sources.  It should also mention the need for providing the dtype (and possibly noting that future releases may not have that requirement?).
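
A documentation section along those lines might show the string and object sources side by side. The following is a rough sketch of such an example, assuming a recent NumPy release; the variable names are illustrative and the dtype is given explicitly in every call, as the paragraph above suggests.

```python
import datetime

import numpy as np

# datetime64 arrays from ISO 8601 strings and from datetime.date objects.
from_strings = np.array(["2000-01-01", "2000-01-02"], dtype="datetime64[D]")
from_dates = np.array(
    [datetime.date(2000, 1, 1), datetime.date(2000, 1, 2)],
    dtype="datetime64[D]",
)

# timedelta64 array from datetime.timedelta objects.
from_timedeltas = np.array(
    [datetime.timedelta(days=1), datetime.timedelta(hours=12)],
    dtype="timedelta64[us]",
)

# Differences of datetime64 arrays are timedelta64 arrays.
gaps = np.diff(from_strings)
```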

Cheers!
Ben Root

P.S. - the need for linspace has come up for me multiple times.  I might try putting something together.
