read_csv creating datetime index

Martin De Kauwe

Jul 12, 2012, 1:01:13 AM

to pyd...@googlegroups.com

I want to merge two columns (the year and the day of year) and create a datetime index for my dataframe, but I can't seem to work out how to do it.

My data (a CSV file) looks like this:

YEAR, DOY, a

2001, 1, 10.

2001, 2, 11.

2001, 3, 67.

So I want to create the dataframe that looks something like

index YEAR DOY a

2001-01-01 00:00:00 2001 1 10.

2001-01-02 00:00:00 2001 2 11.

2001-01-03 00:00:00 2001 3 67.

import pandas
import datetime as dt
from cStringIO import StringIO

def date_converter(x):
    print x
    #return dt.datetime.strptime(str(2001) + ' ' + str(1), '%Y %j')

data = "YEAR, DOY, a\n2001, 1, 10.\n2001, 2, 11.\n2001, 3, 67."

df = pandas.read_csv(StringIO(data), sep=",", parse_dates=True,
                     index_col=[0,1], date_parser=date_converter)

However, x is only the first part, i.e. the year, and I can't seem to get both the year and the DOY.

thanks.
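For reference, the conversion being asked for can be sketched with the standard library alone. This is a minimal illustration, not pandas code: the helper name `year_doy_to_datetime` and the sample `rows` are hypothetical, and it assumes the day of year is 1-based, as in the data above.

```python
import datetime as dt

def year_doy_to_datetime(year, doy):
    """Return midnight on day `doy` (1-based) of `year`."""
    # Start at 1 January and step forward doy - 1 days.
    return dt.datetime(int(year), 1, 1) + dt.timedelta(days=int(doy) - 1)

# The (YEAR, DOY) pairs from the sample data, as strings read from a file.
rows = [("2001", "1"), ("2001", "2"), ("2001", "3")]
index = [year_doy_to_datetime(y, d) for y, d in rows]

for stamp in index:
    print(stamp)
```

The difficulty in the question is only that the converter has to receive both fields at once, so the two columns must be combined before parsing.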

Chang She

Jul 12, 2012, 7:30:20 AM

to pyd...@googlegroups.com

You don't actually need a custom converter:

In [9]: df = pandas.read_csv(StringIO(data),
                             parse_dates=[[0, 1]],
                             index_col=0,
                             keep_date_col=True)

In [10]: df
Out[10]:
            YEAR   DOY   a
YEAR_ DOY
2001-01-12  2001     1  10
2001-02-12  2001     2  11
2001-03-12  2001     3  67

"parse_dates=[[0, 1]]" creates a new column and prepends it to the data, then "index_col=0" uses it as the index. By default YEAR and DOY will be discarded unless "keep_date_col=True" is passed.

I made a github issue to add more documentation for this (https://github.com/pydata/pandas/issues/1612). If you'd like, you're welcome to add additional comments.
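The column-combining behaviour described above can be illustrated without pandas. The sketch below is only an approximation of that description, not pandas' implementation: the function `combine_columns` and its `keep` flag are hypothetical stand-ins for `parse_dates=[[0, 1]]` and `keep_date_col`, and it strips the blank after each comma, which the session above does not.

```python
import csv
import io

data = "YEAR, DOY, a\n2001, 1, 10.\n2001, 2, 11.\n2001, 3, 67."

def combine_columns(text, cols=(0, 1), keep=False):
    """Prepend a column joining the fields in `cols`; drop the originals unless keep=True."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    header = next(reader)
    # Positions of the columns that survive alongside the combined one.
    rest = [i for i in range(len(header)) if keep or i not in cols]
    new_header = ["_".join(header[i] for i in cols)] + [header[i] for i in rest]
    rows = [[" ".join(row[i] for i in cols)] + [row[i] for i in rest]
            for row in reader]
    return new_header, rows

header, rows = combine_columns(data)
header_kept, rows_kept = combine_columns(data, keep=True)
print(header, rows[0])
print(header_kept, rows_kept[0])
```

The combined field ("2001 1", "2001 2", ...) is what a date parser then has to interpret.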

--
Chang She

Lambda Foundry

Martin De Kauwe

Jul 12, 2012, 7:36:57 AM

to pyd...@googlegroups.com

thanks. Is this feature only in a very new version of pandas?

I seem to have...

pandas.version

however when I run your example...

TypeError: read_csv() got an unexpected keyword argument 'keep_date_col'

Chang She

Jul 12, 2012, 7:37:10 AM

to pyd...@googlegroups.com

Actually, you do need the custom parser:

In [24]: def date_converter(x):
   ....:     return dt.datetime.strptime(x, '%Y %j')
   ....:

In [25]: df = pandas.read_csv(StringIO(data), parse_dates=[[0, 1]], index_col=0, keep_date_col=True, date_parser=date_converter)

In [26]: df
Out[26]:
            YEAR   DOY   a
YEAR_ DOY
2001-01-01  2001     1  10
2001-01-02  2001     2  11
2001-01-03  2001     3  67

Sorry to mislead
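The index from the first attempt (2001-01-12, 2001-02-12, 2001-03-12) advances by a month per row, which is consistent with the second field being read as a month rather than a day of year. The sketch below uses only the standard library, not pandas, to show how the same strings parse under the two readings. The variable names are illustrative.

```python
import datetime as dt

combined = ["2001 1", "2001 2", "2001 3"]

# Second field read as a month (%m) versus a day of year (%j).
as_month = [dt.datetime.strptime(s, "%Y %m").date() for s in combined]
as_doy = [dt.datetime.strptime(s, "%Y %j").date() for s in combined]

for text, m, j in zip(combined, as_month, as_doy):
    print(text, "-> month:", m, "| day of year:", j)
```

An explicit '%Y %j' format removes the ambiguity, which is why the custom converter is needed here.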

Chang She

Jul 12, 2012, 7:38:00 AM

to pyd...@googlegroups.com

Yeah, this is for pandas 0.8
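Where read_csv does not accept these keywords, one alternative is to build the index after loading. This is a minimal sketch and not the approach shown in this thread. It assumes Python 3 (io.StringIO instead of cStringIO) and a recent pandas in which pd.to_datetime accepts a format argument. The index name "date" is an arbitrary choice.

```python
import io

import pandas as pd

data = "YEAR, DOY, a\n2001, 1, 10.\n2001, 2, 11.\n2001, 3, 67."

# skipinitialspace drops the blank after each comma, so the columns are
# named "YEAR", "DOY" and "a" rather than "YEAR", " DOY" and " a".
df = pd.read_csv(io.StringIO(data), skipinitialspace=True)

# Build "2001 1", "2001 2", ... and parse as year plus day of year.
stamps = pd.to_datetime(
    df["YEAR"].astype(str) + " " + df["DOY"].astype(str),
    format="%Y %j",
)

# Use the parsed dates as the index; YEAR and DOY stay as ordinary columns.
df = df.set_index(stamps.rename("date"))
print(df)
```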
