Hi Everyone,
pandas stores datetime information (pandas.Timestamp) as a 64-bit integer count of nanoseconds, which limits its span to roughly ±292 years around 1970-01-01. That may be fine for some time series (e.g. stock market quotes) but is inadequate in many other circumstances.
If I want to list William Shakespeare's plays by the year of their creation, I run into a problem:
import pandas as pd
shakespeare_works = pd.DataFrame([
    ['1591', 'The Two Gentlemen of Verona'],
    ['1591', 'The Taming of the Shrew'],
    ['1591', 'Henry VI, Part 2'],
    ['1591', 'Henry VI, Part 3'],
    ['1592', 'Henry VI, Part 1'],
], columns=['Date', 'Title'])
dates = pd.to_datetime(shakespeare_works.Date)  # ==> Error:
# OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1591-01-01 00:00:00
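As a workaround for year-only data like this, pd.Period can be used instead of Timestamp: period ordinals are stored as an int64 count of periods since 1970, so yearly periods reach far beyond the Timestamp range. A minimal sketch (the DataFrame is reproduced here to keep the snippet self-contained):

```python
import pandas as pd

shakespeare_works = pd.DataFrame([
    ['1591', 'The Two Gentlemen of Verona'],
    ['1592', 'Henry VI, Part 1'],
], columns=['Date', 'Title'])

# Yearly periods are int64 ordinals, so year 1591 poses no problem.
years = pd.PeriodIndex(shakespeare_works.Date, freq='Y')
print(years)
```

This only helps when the coarser resolution (here: one year) is acceptable, which is exactly the resolution/range trade-off the proposal below is about.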
I think it would make sense to change the pandas.Timestamp time unit from nanoseconds to microseconds. That would be consistent with Python's own datetime, and the resulting span (about ±292,000 years around the epoch) would cover all of recorded human history.
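For comparison, the standard library's microsecond-resolution datetime handles these dates without any trouble (its range is years 1 through 9999):

```python
from datetime import datetime

# Microsecond resolution is more than enough for historical dates.
d = datetime(1591, 1, 1)
print(d.year, datetime.min.year, datetime.max.year)
```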
Alternatively, a more general approach would be to parametrize the Timestamp type on a time unit, similar to what numpy.datetime64 does.
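To illustrate the parametrized approach: numpy.datetime64 stores an int64 count of a chosen unit since the epoch, so a coarser unit trades resolution for range, and the caller picks the trade-off:

```python
import numpy as np

# Year resolution: range is effectively unlimited for human history.
y = np.datetime64('1591', 'Y')
# Microsecond resolution: spans roughly +/-292,000 years, so 1591 fits too.
us = np.datetime64('1591-01-01', 'us')
print(y, us)
```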
--Leo