Poll: Is the Timestamp span (584 years) enough for you?


Michael

May 24, 2016, 9:49:29 AM
to PyData
All details about Timestamp span issue:
https://github.com/pydata/pandas/issues/7307


There's a need for a lower-resolution Timestamp with a much greater span of time,
because the Timestamp unit is nanoseconds, which limits the span to 584 years.

But we don't know how big this need is, hence this poll.

Jeff suggested trying to gather some information here, so here it is.

Basically, the possible spans are these:

s   second       +/- 2.9e12 years   [2.9e9 BC, 2.9e9 AD]
ms  millisecond  +/- 2.9e9 years    [2.9e6 BC, 2.9e6 AD]
us  microsecond  +/- 2.9e6 years    [290301 BC, 294241 AD]
ns  nanosecond   +/- 292 years      [1678 AD, 2262 AD]   (the current, and only, span)
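
For reference, the current bounds are easy to check in pandas itself; a minimal sketch (the printed boundary values are roughly 1677-09-21 and 2262-04-11):

    import pandas as pd

    # The ns-based Timestamp only covers roughly 1677-09-21 to 2262-04-11:
    print(pd.Timestamp.min)
    print(pd.Timestamp.max)

    # Anything outside that window cannot be represented:
    try:
        pd.Timestamp("3000-01-01")
    except ValueError as err:  # pandas raises OutOfBoundsDatetime, a ValueError subclass
        print(err)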


It would be great if users would add their experience with Timestamp to this thread.

If the Timestamp span is enough for you, please reply with Yes (even if you don't care about it).
Otherwise, it would be great if you could answer these questions, besides the obvious No.

1. What span do you prefer (or base frequency): s, ms, us, ns?
2. What is the typical use case? (an example or pseudocode)
3. What are the current work-arounds to overcome the Timestamp limitation?
4. Do you change Timestamps after getting data into a DataFrame? Because the main issue, apparently, is dealing with casting: for example, say you have data in M8[ms] and then add in data at a lower frequency
(see a more thorough example in Jeff's response: https://github.com/pydata/pandas/issues/7307#issuecomment-220322313 )


Michael

May 24, 2016, 9:53:37 AM
to PyData
Big No from me.
584 years is nowhere near enough.

1. us
2. Usually I generate an index with Timestamps, then calculate ephemeris and other astronomical stuff
3. I don't have a workaround, except limiting the study to 584 years
4. I never change Timestamps after getting or generating them

Goyo

May 25, 2016, 3:11:20 AM
to PyData
Yes, it's been enough for me so far.

Pietro Battiston

May 25, 2016, 4:30:19 AM
to pyd...@googlegroups.com
Yes, for me too.
Though I find it far more probable that I will sooner or later need dates
after 2262 than that I will need nanoseconds.
I would tend to think that once we add support for two different
resolutions, we want to cover both extremes and hence complement
nanoseconds with seconds - the standard measure of time, and the one
newcomers might expect more than microseconds or nanoseconds. But
then, practicality beats purity... and I don't have any practical issue
with either of the possibilities.

Pietro

Spencer Hill

May 26, 2016, 12:03:18 AM
to PyData
1. For climate data, s is more than enough, but even going from ns to us would basically resolve everything
2. Analyzing output of climate model simulations, some of which start at year 1 or 0, and others of which span several thousand years and thus extend beyond year 2262
3. (All of this is through xarray.) We check for data with years outside the [1678, 2262] range, manually overwrite its time array by adding/subtracting some offset if it falls outside that range, perform our calculations, and then remove that offset. This does not work for records that exceed both limits; see the sketch after this list.
4. n/a with our current xarray workflow
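
Roughly, the offset trick looks like this in plain pandas (the model years here are made up; the actual workflow goes through xarray):

    import pandas as pd

    # Hypothetical simulation years starting near year 1, i.e. outside
    # the supported [1678, 2262] window.
    years = pd.Series(range(1, 501))

    OFFSET = 1700  # shift the whole record into the supported window
    shifted = pd.to_datetime((years + OFFSET).astype(str), format="%Y")

    # ... time-aware calculations on `shifted` ...

    # Afterwards, subtract the offset again to recover the true model years.
    true_years = shifted.dt.year - OFFSET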

Also, maybe I'm missing something obvious, but aren't the listed +/- values and [min, max] spans inconsistent for s, ms, and us?

Thanks!

Best,
Spencer

l736x

May 26, 2016, 9:03:44 AM
to PyData
No for me.

1. microseconds would be enough.
2. I load timeseries from external sources (typically SQL or a proprietary HDF5 format) that sometimes exceed the 2262 boundary.
3. Workaround: often I have a fixed '3000-01-01' date in SQL; in that case I can replace it with a date before 2262 directly in the query. Otherwise I filter out the data beyond 2262 (see the sketch after this list).
4. I never manipulate the Timestamps.
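
The filtering part can be done with pd.to_datetime's coercion; a minimal sketch with made-up values:

    import pandas as pd

    # Hypothetical raw values from an external source, some past the ns limit.
    raw = pd.Series(["2015-03-01", "2262-01-01", "3000-01-01"])

    # errors="coerce" turns out-of-bounds dates into NaT, which can then be dropped.
    parsed = pd.to_datetime(raw, errors="coerce")
    filtered = parsed.dropna()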

Ritvik Sahajpal

May 27, 2016, 11:56:37 AM
to PyData
Thanks for this thread; really looking forward to an expanded time domain. Re your specific questions:

1. Span I prefer: microsecond or coarser (millisecond, second)
2. Typical use case involves dealing with climate model data, which often starts around 1000 AD
3. Currently, I cannot use pandas functionality in this domain

Ryan Abernathey

May 27, 2016, 11:56:37 AM
to PyData
Definite NO.

1. What span do you prefer (or base frequency): s, ms, us, ns?

ms would suffice for me

2. What is the typical use case? (an example or pseudocode)

Analyzing oceanographic and climate timeseries

3. What are the current work-arounds to overcome the Timestamp limitation?

Not using pandas for long timeseries

4. Do you change Timestamps after getting data into a DataFrame? Because the main issue, apparently, is dealing with casting: for example, say you have data in M8[ms] and then add in data at a lower frequency

I'm not sure I understand the question. I guess that means no?



Fabien

May 27, 2016, 12:15:10 PM
to pyd...@googlegroups.com
No.

1. anything other than ns
2. climate and paleoclimate data at century to millennial scale
3. workaround: my own time system with daily or monthly time steps
4. n/a

Thanks,

Fabien


Brewster Malevich

May 27, 2016, 10:10:09 PM
to PyData
1) ns doesn't cut it; us or coarser. ~2K BC is usually as far back as I go, though.

2) Pseudocode... erm. I might use this for a timestamp index in pandas. I'm very interested in the downstream effect on the xarray module, so that we can use xarray with longer paleoclimate model runs.

3) Just put years as ints. I don't use pandas for this kind of work, specifically because of this limitation. I think in the modeling world, netcdf4-python has a workaround with a specialized class... I don't know what their exact implementation is.

4) No. I've usually just not used Pandas or not used timestamps.



Joe Hamman

May 29, 2016, 3:56:32 PM
to PyData
1. Anything other than ns works. us is fine, but I don't anticipate using anything finer than s.
2. Analyzing geoscientific datasets, usually model-simulated datasets at regular timesteps. Usually using xarray first, then moving to pandas in some use cases.
3. Most of my model simulations fit within the current ns range; however, that doesn't always work. I usually use netcdf4-python's datetime object for cases where pandas timestamps don't work.
4. Sometimes I'll convert them to another object type, but I usually don't typecast them.



Carst Vaartjes

Jun 1, 2016, 7:08:41 AM
to PyData
Hi!

1. What span do you prefer (or base frequency): s, ms, us, ns?
For me, us would be good enough; in general we work with a lot of financial systems, and those tend to use 9999-12-31 as a standard end date, breaking CSV imports.

2. What is the typical use case? (an example or pseudocode)
Analysis of data from financial systems

3. What are the current work-arounds to overcome the Timestamp limitation?
Load it as a string with read_csv, then apply a procedure re-casting 9999-12-31 to 2099-12-31 to work around it.
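
A minimal sketch of that kind of re-cast (the file contents and column names are made up):

    import io
    import pandas as pd

    # Hypothetical CSV where open-ended contracts carry a 9999-12-31 end date.
    csv = io.StringIO("contract_id,end_date\n1,2020-06-30\n2,9999-12-31\n")

    # Read the date column as plain strings, re-cast the sentinel to something
    # inside the ns range, and only then convert to Timestamps.
    df = pd.read_csv(csv, dtype={"end_date": str})
    df["end_date"] = pd.to_datetime(df["end_date"].replace("9999-12-31", "2099-12-31"))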

4. Do you change Timestamps after getting data into a DataFrame? Because the main issue, apparently, is dealing with casting: for example, say you have data in M8[ms] and then add in data at a lower frequency
We do timedelta operations (adding days, etc.)

BR

Carst

Michael

Jun 6, 2016, 6:45:19 PM
to PyData
Very nice answers: 8 to 2 in favor of change (nice! So we are indeed frustrated with nanoseconds).
It would be great if more pandas users voted in this poll, though.

Paul Hobson

Jun 6, 2016, 7:05:09 PM
to pyd...@googlegroups.com
Nanoseconds don't limit my work in any way, but they are overkill. I'd be hard-pressed to find a reason to meaningfully use time steps smaller than 1 second.

(my field: water resources engineering, stormwater, hydrologic, and hydrodynamic modeling).
-p


Jason Bandlow

Jun 6, 2016, 7:53:12 PM
to pyd...@googlegroups.com
This actually just bit me, though in an easily fixable way. I was working on some website usage data and wanted to look at "total site usage time for people in cohort X" / "total site usage time". Working with Timedeltas, I got an overflow; the fix was just to work in seconds.
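
Something like this, with made-up numbers:

    import pandas as pd

    # Hypothetical session log: per-session usage time plus a cohort label.
    df = pd.DataFrame({
        "cohort": ["X", "X", "Y"],
        "usage": pd.to_timedelta(["2h", "45min", "30min"]),
    })

    # Summing timedelta64[ns] values can overflow for very large totals, so
    # work in (float) seconds instead and take the ratio there.
    usage_s = df["usage"].dt.total_seconds()
    share_x = usage_s[df["cohort"] == "X"].sum() / usage_s.sum()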


--
Jason Bandlow
Principal Data Scientist at workpop♥

Michael Aye

Jun 6, 2016, 8:46:57 PM
to PyData
1. What span do you prefer (or base frequency): s, ms, us, ns?
microseconds

2. What is the typical use case? (an example or pseudocode)
Spacecraft data analysis, for which microseconds are just right in 95% of all cases. And then with some planetary modeling, comparing measurements with models, I could easily breach the 584-year barrier.

3. What are the current work-arounds to overcome the Timestamp limitation?
I just convert to milliseconds or microseconds whenever I need to. Not too hard, but display-wise I also find the nanoseconds more of a nuisance than a help. Just look at datetime format strings as well: microseconds exist (%f), nanoseconds do not. When I have to switch between a string representation and a datetime object (or Timestamp), the lack of nanosecond formatting and parsing ability is more of a hindrance than a help.
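
For example, to round-trip through a string I drop the nanoseconds first (a rough sketch; floor() is available in recent pandas versions):

    import pandas as pd

    ts = pd.Timestamp("2016-06-06 20:46:57.123456789")  # nanosecond precision

    # strftime/strptime only go down to microseconds (%f), so drop the
    # nanoseconds before round-tripping through a string representation.
    ts_us = ts.floor("us")
    text = ts_us.strftime("%Y-%m-%d %H:%M:%S.%f")
    assert pd.Timestamp(text) == ts_us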

4. Do you change Timestamps after getting data into a DataFrame? Because the main issue, apparently, is dealing with casting: for example, say you have data in M8[ms] and then add in data at a lower frequency
See my answers in 3.

Hope that helps.
Michael (a different one)

Denis Akhiyarov

Jun 7, 2016, 9:29:42 AM
to PyData
I rarely work with datasets spanning more than a few years or sampled more frequently than a 1-minute interval.

Tom Augspurger

Jun 12, 2016, 3:55:01 PM
to PyData
For those who only follow the mailing list, Wes' perspective on this (the reasons for the original choice and the difficulty of changing it) is here.

John Mark Aiken

Jun 22, 2016, 8:58:36 AM
to PyData
I would love to switch to a lower precision. In seismology we have historical data sets going back 1000+ years, and simulation data sets that span 10,000+ years. ns precision kills pandas when trying to use timestamps at scale.

Naveen Michaud-Agrawal

Jul 5, 2016, 4:13:45 PM
to PyData

Molecular dynamics simulations - unfortunately ns is way too big. I need something in the femtosecond range ;)

(j/k - for my purposes the simulations can exist outside of historical time)

Naveen



Juan Berrio

Jul 5, 2018, 10:37:44 AM
to PyData
The date limitation in pandas is a massive roadblock for me, forecasting with environmental simulations that run for over 1000 years from the present.

I believe many other scientists work with data sets that span well over 1,000, 10,000 or 100,000 years. Think radioactive decay of certain elements and dating of certain minerals, Earth's evolution, movement/routes of celestial bodies, etc.

Precision of seconds would be enough for me, but an option covering at least 10,000,000,000 years should suffice for most scientific applications.

Thanks.