pandas.to_csv() date format bug?

Pierre Demartines

Nov 2, 2015, 3:34:13 PM
to PyData
Hi,

I'd like to report a possible bug in how pandas.to_csv() handles columns containing Timestamps:
- If all timestamps are normalized (truncated to midnight), the default output for the column drops the 'HH:MM:SS' part (not a bug, kind of cool in fact, but a bit inconsistent).
- If the timestamps are localized to a specific timezone (UTC in the example below), date_format is completely ignored (BUG!).

Best regards,

~Pierre
PS: This is my first post, so please let me know if there is a better way to report a suspected bug.


```python
>>> import pandas as pd
>>> import StringIO
>>>
>>> # quick read from String as csv
... str = """
... id,t,x
... a,2015-10-27 00:27:10,3.2
... b,2015-10-28 00:00:00,4
... """
>>> df = pd.read_csv(StringIO.StringIO(str), parse_dates=[1])
>>>
>>> df
  id                   t    x
0  a 2015-10-27 00:27:10  3.2
1  b 2015-10-28 00:00:00  4.0
>>>
>>> # save to csv in memory
... x = StringIO.StringIO()
>>> df.to_csv(x, header=True, index=False, sep='\t')
>>> print(x.getvalue())
id      t       x
a       2015-10-27 00:27:10     3.2
b       2015-10-28 00:00:00     4.0

>>>
>>> # now normalize (all timestamps to time 00:00:00)
... df.t = df.t.apply(lambda t: t.normalize())
... # or (same result in this case): df.loc[0,'t'] = df.loc[0,'t'].normalize()
>>>
>>> # the implicit 00:00:00 is dropped in all rows :-/
... x = StringIO.StringIO()
>>> df.to_csv(x, header=True, index=False, sep='\t')
>>> print(x.getvalue())
id      t       x
a       2015-10-27      3.2
b       2015-10-28      4.0

>>>
>>> # but not if we specify the format (good)
... x = StringIO.StringIO()
>>> df.to_csv(x, header=True, index=False, sep='\t', date_format='%Y-%m-%d %H:%M:%S')
>>> print(x.getvalue())
id      t       x
a       2015-10-27 00:00:00     3.2
b       2015-10-28 00:00:00     4.0

>>>
>>> # but now, pandas.to_csv() ignores the format if the timestamps are localized
... df.t = df.t.apply(lambda t: t.tz_localize('utc'))
>>> x = StringIO.StringIO()
>>> df.to_csv(x, header=True, index=False, sep='\t', date_format='%d,%m/%Y')
>>> print(x.getvalue())
id      t       x
a       2015-10-27 00:00:00+00:00       3.2
b       2015-10-28 00:00:00+00:00       4.0
>>>
>>> pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-63-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.16.2
nose: 1.3.7
Cython: 0.23.3
numpy: 1.9.2
scipy: 0.16.0
statsmodels: 0.6.1
IPython: 4.0.0
sphinx: 1.3.1
patsy: 0.3.0
dateutil: 2.4.2
pytz: 2015.4
bottleneck: 1.0.0
tables: 3.2.0
numexpr: 2.4.3
matplotlib: 1.4.3
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 1.0.0
xlsxwriter: 0.7.3
lxml: 3.4.4
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.5
pymysql: None
psycopg2: None
```
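
A possible workaround for the timezone case is to convert the column to strings before writing, so that to_csv() never has to format the timestamps itself. This is a minimal sketch, not a fix: it assumes a pandas version that provides the `.dt.strftime` accessor (newer than the 0.16.2 listed above, as far as I know) and Python 3's `io.StringIO`. The helper name `to_csv_with_formatted_dates` is hypothetical.

```python
import io

import pandas as pd


def to_csv_with_formatted_dates(df, column, date_format, **to_csv_kwargs):
    """Hypothetical helper: pre-format one datetime column, then write CSV text."""
    out = df.copy()  # leave the caller's frame untouched
    # strftime turns the (possibly tz-aware) timestamps into plain strings
    out[column] = out[column].dt.strftime(date_format)
    buf = io.StringIO()
    out.to_csv(buf, **to_csv_kwargs)
    return buf.getvalue()


# Same data as in the session above, already normalized and localized to UTC
df = pd.DataFrame({
    "id": ["a", "b"],
    "t": pd.to_datetime(["2015-10-27", "2015-10-28"]).tz_localize("UTC"),
    "x": [3.2, 4.0],
})

csv_text = to_csv_with_formatted_dates(
    df, "t", "%d,%m/%Y", index=False, sep="\t"
)
print(csv_text)
```

Because the column is written as strings, the date_format argument of to_csv() is not involved at all, which sidesteps the question of whether it is honored for localized timestamps.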

Stephan Hoyer

Nov 2, 2015, 4:06:36 PM
to pyd...@googlegroups.com
Hi Pierre,

We usually handle bug reports through GitHub issues -- it makes it easier to have a back and forth without spamming everyone on this list. So it would be great if you could repost there:

Cheers,
Stephan
