pandas.Timestamp span is too short for William Shakespeare!


slonik

Jan 26, 2016, 9:07:41 AM1/26/16
to PyData
Hi Everyone,
pandas stores datetime information (pandas.Timestamp) as a 64-bit integer count of nanoseconds, which gives it a span of roughly ±292 years around 1970-01-01 (about 1677 to 2262). That may be fine for some time series (e.g. stock market quotes), but it is inadequate in many other circumstances.

If I want to list William Shakespeare's plays by the year of their creation, I run into a problem:

import pandas as pd

shakespeare_works = pd.DataFrame([
    ['1591', 'The Two Gentlemen of Verona'],
    ['1591', 'The Taming of the Shrew'],
    ['1591', 'Henry VI, Part 2'],
    ['1591', 'Henry VI, Part 3'],
    ['1592', 'Henry VI, Part 1'],
], columns=['Date', 'Title'])

dates = pd.to_datetime(shakespeare_works['Date'])  # raises:
# OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1591-01-01 00:00:00

I think it would make sense to change the pandas.Timestamp time unit from nanoseconds to microseconds. That would be consistent with Python's own datetime module, and the resulting span would cover all of recorded human history.

Alternatively, a more general approach would be to parametrize the Timestamp type by time unit, similar to what numpy.datetime64 does.
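For comparison, numpy.datetime64 already handles this: at microsecond resolution the same 64 bits span roughly ±292,000 years around the epoch, so 16th-century dates are no problem. A quick sketch:

```python
import numpy as np

# The same 64-bit integer, counted in microseconds instead of
# nanoseconds, comfortably represents 16th-century dates.
d = np.datetime64('1591-01-01', 'us')
print(d)  # 1591-01-01T00:00:00.000000
```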

--Leo

Jeff

Jan 26, 2016, 9:22:00 AM1/26/16
to PyData
see docs & suggestions here: http://pandas.pydata.org/pandas-docs/stable/timeseries.html#timeseries-oob

The ns representation was decided quite a long time ago and is a pretty good compromise from a practical perspective.

Breaking this now is basically impossible. However, there is a plan to allow multiple frequency specifications for datetimes (similar to numpy); see the related issue https://github.com/pydata/pandas/issues/6741
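One workaround described in that docs section is to use Periods instead of Timestamps: a Period at yearly frequency is stored as an integer count of years, so its range easily covers the 1590s. A minimal sketch, reusing the DataFrame from the original post:

```python
import pandas as pd

shakespeare_works = pd.DataFrame([
    ['1591', 'The Two Gentlemen of Verona'],
    ['1592', 'Henry VI, Part 1'],
], columns=['Date', 'Title'])

# Periods at yearly frequency are integer year counts, so dates far
# outside the nanosecond Timestamp range work fine.
years = pd.PeriodIndex(shakespeare_works['Date'], freq='Y')
print(years[0])  # 1591
```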

Stephan Hoyer

Jan 26, 2016, 2:24:08 PM1/26/16
to pyd...@googlegroups.com
I agree that microsecond precision datetimes would have made more sense if we were starting from scratch.

Unfortunately, we do have some specialized code for nanosecond precision, and generalizing it to work with a different precision would take some work. Also, transitioning from nanosecond precision would entail a compatibility break, so adding microsecond precision would need to be done in a backwards compatible way.

Basically, this is just a lot of work but pretty straightforward. If you care about this, contributions would be very welcome!

Stephan

