read_csv() and date_parser


Will Furnass

Mar 12, 2012, 9:55:15 AM
to pystatsmodels
I'm having problems with read_csv() and its date_parser parameter:

In [7]: nwl = pandas.read_csv("NWL-Rainton-Q-T-2010-01-31-2011-11-03.csv",
   ...:     names=['time', 'Q', 'NTU'], index_col=0, skiprows=[0],
   ...:     parse_dates=True, na_values=['NA'])

In [8]: nwl[32:38]
Out[8]:
                         Q  NTU
time
2010-01-31 23:30:00  16.61  0.1
2010-01-31 23:45:00  16.46  0.1
2010-01-02 00:00:00  16.60  0.1
2010-01-02 00:15:00  16.48  0.1
2010-01-02 00:30:00  16.54  0.1
2010-01-02 00:45:00  16.42  NaN

The first two timestamps above have been interpreted as day first
(31 January), but the ones that follow as month first (2 January
rather than 1 February).
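
A mixed result like this is what per-value guessing tends to produce: 31/01 can only be read day first, while 01/02 is valid either way. Here is a minimal sketch using only the standard library, assuming the file's timestamps look like "31/01/2010 23:45" (the form shown in the index summary further down). The helper names guess_parse and strict_parse are hypothetical and are not part of pandas or dateutil:

```python
from datetime import datetime

DAY_FIRST = "%d/%m/%Y %H:%M"
MONTH_FIRST = "%m/%d/%Y %H:%M"

def guess_parse(text):
    """Try month first, fall back to day first: mimics a per-value guess."""
    try:
        return datetime.strptime(text, MONTH_FIRST)
    except ValueError:
        return datetime.strptime(text, DAY_FIRST)

def strict_parse(text):
    """Always read the string as day first."""
    return datetime.strptime(text, DAY_FIRST)

samples = ["31/01/2010 23:45", "01/02/2010 00:00"]
guessed = [guess_parse(s) for s in samples]   # second value becomes 2 January
strict = [strict_parse(s) for s in samples]   # second value becomes 1 February
```

With one fixed format, the two timestamps stay in chronological order; with the guess, the second one jumps back to early January.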

I tried using a custom date parser:

In [9]: nwl = pandas.read_csv("NWL-Rainton-Q-T-2010-01-31-2011-11-03.csv",
   ...:     names=['time', 'Q', 'NTU'], index_col=0, skiprows=[0],
   ...:     parse_dates=True,
   ...:     date_parser=lambda d: dateutil.parser.parse(d, day_first=True),
   ...:     na_values=['NA'])

In [12]: nwl
Out[12]:
<class 'pandas.core.frame.DataFrame'>
Index: 61507 entries, 31/01/2010 15:30 to 03/11/2011 09:15
Data columns:
Q 61239 non-null values
NTU 56930 non-null values
dtypes: float64(2)

but then found that something had gone wrong: I can plot neither the
whole DataFrame nor an individual Series:

In [13]: nwl.plot()
...
/usr/local/lib/python2.7/dist-packages/matplotlib/lines.pyc in recache(self, always)
    424 x = ma.asarray(xconv, float)
    425 else:
--> 426 x = np.asarray(xconv, float)
    427 x = x.ravel()
    428 else:

/usr/lib/pymodules/python2.7/numpy/core/numeric.pyc in asarray(a, dtype, order)
    282
    283 """
--> 284 return array(a, dtype, copy=False, order=order)
    285
    286 def asanyarray(a, dtype=None, order=None):

ValueError: setting an array element with a sequence.

Any thoughts?

Interestingly, a plain plt.plot call fails with the same error:

In [32]: plt.plot(nwl.Q.index.values, nwl.Q.values)

Cheers,

Will

Will Furnass

Mar 12, 2012, 12:23:24 PM
to pystatsmodels


On Mar 12, 1:55 pm, Will Furnass <willfurn...@gmail.com> wrote:
> I'm having problems with read_csv() and it's date_parser parameter:
...
> In [9]: nwl = pandas.read_csv("NWL-Rainton-Q-
> T-2010-01-31-2011-11-03.csv", names=['time', 'Q', 'NTU'], index_col=0,
> skiprows=[0], parse_dates=True, date_parser = lambda d:
> dateutil.parser.parse(d, day_first=True), na_values=['NA'])

The problem here seems to be that the custom date_parser function is
not called, so the 'time' column values are not converted from strings
when the DataFrame's index is created. I'm pretty sure that this is the
case, as dateutil.parser.parse should throw a TypeError if passed a
'day_first' kwarg: the parameter name is 'dayfirst', with no underscore.
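
One way this symptom can arise is a loader that catches whatever the parser raises and falls back to the raw strings. The sketch below only illustrates that idea and is not the pandas source: parse is a hypothetical stand-in for a parser that accepts a dayfirst keyword, and convert_lenient is a hypothetical loader step.

```python
from datetime import datetime

def parse(text, dayfirst=False):
    """Hypothetical stand-in for a parser with a `dayfirst` keyword."""
    fmt = "%d/%m/%Y %H:%M" if dayfirst else "%m/%d/%Y %H:%M"
    return datetime.strptime(text, fmt)

def convert_lenient(values, parser):
    """Swallow any parser error and hand back the raw strings unchanged."""
    try:
        return [parser(v) for v in values]
    except Exception:
        return list(values)

raw = ["31/01/2010 15:30", "03/11/2011 09:15"]
bad_parser = lambda d: parse(d, day_first=True)   # misspelled keyword
good_parser = lambda d: parse(d, dayfirst=True)   # correct keyword

# Calling the misspelled parser directly fails loudly with a TypeError...
try:
    bad_parser(raw[0])
    direct_error = None
except TypeError as exc:
    direct_error = exc

# ...but behind a lenient loader the failure is invisible: strings come back.
lenient_result = convert_lenient(raw, bad_parser)
parsed = convert_lenient(raw, good_parser)
```

Checking the type of the first index element after loading (str versus a datetime) is a quick way to tell which of the two outcomes occurred.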

Will

Wes McKinney

Mar 14, 2012, 9:25:48 PM
to pystat...@googlegroups.com

Hi Will,

I agree that an exception should be raised if a date_parser is passed
and it fails. Made these changes here and added a test case similar to
the example you described:

https://github.com/pydata/pandas/commit/f32efa82746b2c5aa8583c22680bf3b489be2153
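
For readers who want the general shape of "raise if a user-supplied date parser fails", here is a rough sketch. It is not the code in the commit above: apply_date_parser and day_first are hypothetical names, and only the standard library is used.

```python
from datetime import datetime

def apply_date_parser(values, date_parser):
    """Convert values with a user-supplied parser; fail loudly on error."""
    converted = []
    for value in values:
        try:
            converted.append(date_parser(value))
        except Exception as exc:
            # Re-raise with the offending value so the cause is easy to find.
            raise ValueError(
                "date_parser failed on %r: %s" % (value, exc)
            ) from exc
    return converted

def day_first(text):
    """Parse a 'dd/mm/yyyy HH:MM' string."""
    return datetime.strptime(text, "%d/%m/%Y %H:%M")

index = apply_date_parser(["31/01/2010 15:30", "03/11/2011 09:15"], day_first)
```

Surfacing the error at load time means a misspelled keyword shows up immediately, instead of later as a confusing plotting failure.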

thanks,
Wes
