pandas.Period strftime behavior: Linux vs. Windows

Nathan Wendt

Aug 4, 2016, 11:07:13 AM
to PyData
Hello,

I have a case where output from a numerical model has year "0001" in the timestamp. Using datetime.datetime to handle this does not work. pandas.Period can take such a year as input without an issue. What I have noticed, however, is that when I format that year with the strftime method, I get different results depending on the OS. To be clear, I made sure that the same version of pandas was used in all cases; the only difference is the OS. On Windows I get what I expect:

In [1]: pandas.Period("0001/01/01").strftime("%Y")
Out [1]: u'0001'

On most Linux systems (e.g., CentOS 6.8, CentOS Linux release 7.1.1503, and RHEL Workstation release 6.8) I get:

In [2]: pandas.Period("0001/01/01").strftime("%Y")
Out [2]: u'1'

but sometimes (e.g., on RHEL Server release 6.8):

In [3]: pandas.Period("0001/01/01").strftime("%Y")
Out [3]: u'2001'

From what I understand about strftime, the results can vary from system to system depending on the underlying C library. I wanted to be sure this was intended/expected behavior before moving forward. Certainly, I can come up with a way to make the output more portable for my purposes.
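One portable option is to skip strftime for the year and format the numeric fields directly. The sketch below is only an illustration: the helper names are hypothetical, it uses Python 3 f-strings, and it assumes the object exposes integer year, month, and day attributes (pandas.Period does; datetime.date is used here as a stand-in so the snippet runs without pandas).

```python
import datetime


def padded_year(obj):
    """Return the year of `obj` as a zero-padded, four-digit string."""
    # Pure Python formatting, so the platform's C strftime is never involved.
    return f"{obj.year:04d}"


def iso_date_from_fields(obj):
    """Format year/month/day attributes as a zero-padded YYYY-MM-DD string."""
    return f"{obj.year:04d}-{obj.month:02d}-{obj.day:02d}"


# datetime.date is a stand-in for any object with .year/.month/.day.
d = datetime.date(1, 1, 1)
print(padded_year(d))           # 0001
print(iso_date_from_fields(d))  # 0001-01-01
```

The same calls should work on a pandas.Period with daily frequency, since only the attribute values are read.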

Thanks,

Skip Montanaro

Aug 5, 2016, 9:19:16 AM
to PyData
I would be careful using 1/1/1 as your input. You can't tell which field pandas.Period thinks is the year. For example:

>>> p = pandas.Period("0001-02-03")
>>> p.strftime("%Y-%m-%d")
u'2003-01-02'

That was not at all what I expected, given that I was using dashes as separators. I would have thought it would interpret the string as an ISO-8601-formatted date.
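One way to sidestep the ambiguity is to parse the string yourself and hand pandas the individual fields. The following is a rough sketch, not a statement of how pandas parses dates: parse_strict_iso and period_from_iso are hypothetical helper names, the regular expression accepts only fully padded YYYY-MM-DD input, and the keyword form of pandas.Period (year, month, day, freq) is assumed to accept year 1 on your pandas version, which I have not checked across releases.

```python
import re

# Exactly four digits, two digits, two digits, separated by hyphens.
_ISO_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})", re.ASCII)


def parse_strict_iso(text):
    """Split a fully padded YYYY-MM-DD string into (year, month, day) ints."""
    match = _ISO_DATE.fullmatch(text)
    if match is None:
        raise ValueError(f"expected YYYY-MM-DD, got {text!r}")
    year, month, day = (int(group) for group in match.groups())
    # Only a coarse range check; day-of-month validity is left to the caller.
    if not (1 <= month <= 12 and 1 <= day <= 31):
        raise ValueError(f"month or day out of range in {text!r}")
    return year, month, day


def period_from_iso(text):
    """Build a daily pandas.Period from explicit fields (assumed usage)."""
    import pandas  # imported lazily so the parser works without pandas

    year, month, day = parse_strict_iso(text)
    return pandas.Period(year=year, month=month, day=day, freq="D")


print(parse_strict_iso("0001-02-03"))  # (1, 2, 3)
```

Because the year, month, and day are passed by name, there is no guessing about which field is which.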

Nathan Wendt

Aug 7, 2016, 7:59:15 PM
to PyData
That does make sense for that case. However, with 1/1/1 you would still expect the year to be returned with 4 digits, even if the parser were confused about which field holds the year. That the number of digits returned is not always 4 for %Y is what I was curious about.

Skip Montanaro

Aug 8, 2016, 9:47:44 AM
to PyData
On Sunday, August 7, 2016 at 6:59:15 PM UTC-5, Nathan Wendt wrote:
That the number of digits returned is not always 4 for %Y is what I was curious about.

Yeah, that does seem weird. You can coax a three-digit year out of it as well:

>>> pd.Period("01/01/101").strftime("%Y")
u'101'

I would have expected "0101".

Given that Python's time.strftime() function won't accept a year < 1900, it seems clear that pandas.Period relies on its own implementation of strftime(). My guess is it's buggy, though the documentation says:

    | ``%Y``    | Year with century as a decimal |       |
    |           | number.                        |       |

so there is no guarantee that it will be four digits.
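If four digits are required regardless of what the platform does with %Y, one workaround is to substitute the padded year into the format string before calling strftime. This is a sketch under assumptions: strftime_padded_year is a hypothetical helper, it uses Python 3 syntax, and it expects an object with a year attribute and a strftime method (datetime.date stands in for pandas.Period here so the snippet runs without pandas).

```python
import datetime
import re


def strftime_padded_year(obj, fmt):
    """Call obj.strftime(fmt) with every %Y replaced by a 4-digit year."""

    def substitute(match):
        token = match.group(0)
        # Only the %Y directive is rewritten; "%%" and all other
        # directives are passed through unchanged.
        return f"{obj.year:04d}" if token == "%Y" else token

    # Scanning "%" plus one character at a time keeps an escaped "%%Y"
    # from being mistaken for the %Y directive.
    patched = re.sub(r"%.", substitute, fmt, flags=re.DOTALL)
    return obj.strftime(patched)


print(strftime_padded_year(datetime.date(2003, 1, 2), "%Y-%m-%d"))  # 2003-01-02
```

The year digits are inserted as literal text, so whatever the underlying strftime does with %Y no longer matters; the remaining directives are still handled by the object's own strftime.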

After fussing about a bit, I went ahead and opened an issue on GitHub: https://github.com/pydata/pandas/issues/13931

The worst they can do is close as "won't fix". Perhaps it will at least provoke a bit of discussion or documentation changes. In the meantime, I think your best bet is to experiment with input formats, find one which works as you expect, and format your inputs rigorously to avoid ambiguity. Given the head scratching I did while messing with hyphen-separated dates, it looks like fully padded dates in either of these formats are your best bet for now:

>>> pd.Period("01/02/2003")
Period('2003-01-02', 'D')
>>> pd.Period("2003-01-02")
Period('2003-01-02', 'D')

Skip