read_csv creating datetime index


Martin De Kauwe

Jul 12, 2012, 1:01:13 AM
to pyd...@googlegroups.com
I want to merge two columns (the year and the day of year) into a datetime index for my dataframe, but I can't work out how to do it.

My data (a csv file) looks like this:

YEAR, DOY, a
2001, 1, 10.
2001, 2, 11.
2001, 3, 67.

So I want to create the dataframe that looks something like

index YEAR DOY a
2001-01-01 00:00:00  2001 1 10.
2001-01-02 00:00:00  2001 2 11.
2001-01-03 00:00:00  2001 3 67.

import pandas 
import datetime as dt
from cStringIO import StringIO

def date_converter(x):
    print x
    #return dt.datetime.strptime(str(2001) + ' ' + str(1), '%Y %j')
  
data = "YEAR, DOY, a\n2001, 1, 10.\n2001, 2, 11.\n2001, 3, 67."
df = pandas.read_csv(StringIO(data), sep=",", parse_dates=True, 
                                index_col=[0,1], date_parser=date_converter)

However, x is only the first part, i.e. the year, and I can't seem to get both the year and the DOY passed to the converter.

thanks.

Chang She

Jul 12, 2012, 7:30:20 AM
to pyd...@googlegroups.com
You don't actually need a custom converter:

In [9]: df = pandas.read_csv(StringIO(data),
   ...:                      parse_dates=[[0, 1]],
   ...:                      index_col=0,
   ...:                      keep_date_col=True)

In [10]: df
Out[10]: 
            YEAR   DOY   a
YEAR_ DOY                 
2001-01-12  2001     1  10
2001-02-12  2001     2  11
2001-03-12  2001     3  67



"parse_dates=[[0, 1]]" creates a new column and prepends it to the data, then "index_col=0" uses it as the index. By default YEAR and DOY will be discarded unless "keep_date_col=True" is passed.

I made a github issue to add more documentation for this (https://github.com/pydata/pandas/issues/1612). If you'd like, you're welcome to add additional comments.

--
Chang She

Martin De Kauwe

Jul 12, 2012, 7:36:57 AM
to pyd...@googlegroups.com
thanks. Is this feature only in a very new version of pandas?

I seem to have...

pandas.version
<module 'pandas.version' from '/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/pandas-0.7.2.dev_81dcf10-py2.6-macosx-10.6-x86_64.egg/pandas/version.pyc'>

however when I run your example...

TypeError: read_csv() got an unexpected keyword argument 'keep_date_col'

Chang She

Jul 12, 2012, 7:37:10 AM
to pyd...@googlegroups.com
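Actually, you do need the custom parser. Without it, the DOY value is read as the month, which is why my earlier output showed 2001-01-12, 2001-02-12 and 2001-03-12: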

In [24]: def date_converter(x):
   ....:     return dt.datetime.strptime(x, '%Y %j')
   ....: 

In [25]: df = pandas.read_csv(StringIO(data), parse_dates=[[0, 1]], index_col=0, keep_date_col=True, date_parser=date_converter)

In [26]: df
Out[26]: 
            YEAR   DOY   a
YEAR_ DOY                 
2001-01-01  2001     1  10
2001-01-02  2001     2  11
2001-01-03  2001     3  67

Sorry to mislead
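
For reference, the `%j` directive is what makes the custom parser work: it tells `strptime` to read the second field as a day of year rather than a month. A minimal standard-library sketch, where the helper name `year_doy_to_date` is hypothetical and not from the thread:

```python
import datetime as dt

def year_doy_to_date(year, doy):
    # Hypothetical helper: build a "YEAR DOY" string and parse it with
    # %Y (four-digit year) and %j (day of year, 1-366).
    return dt.datetime.strptime("%d %d" % (year, doy), "%Y %j")

print(year_doy_to_date(2001, 1))   # 2001-01-01 00:00:00
print(year_doy_to_date(2001, 32))  # 2001-02-01 00:00:00
```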

Chang She

Jul 12, 2012, 7:38:00 AM
to pyd...@googlegroups.com
Yeah, this is for pandas 0.8
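
On pandas versions where the `keep_date_col` or `date_parser` arguments are not available (older releases, as in the error above, or newer ones where they may be deprecated), the same index can be built after loading instead. This is only a sketch, assuming Python 3 and a reasonably recent pandas; `skipinitialspace=True` and the index name `date` are my own choices, not something from the thread:

```python
import io
import pandas as pd

data = "YEAR, DOY, a\n2001, 1, 10.\n2001, 2, 11.\n2001, 3, 67."

# skipinitialspace strips the blank after each comma, so the columns
# come out as "YEAR", "DOY", "a" rather than "YEAR", " DOY", " a".
df = pd.read_csv(io.StringIO(data), skipinitialspace=True)

# Combine year and day of year into strings such as "2001001",
# then parse them with %Y (year) followed by %j (day of year).
stamp = (df["YEAR"] * 1000 + df["DOY"]).astype(str)
df.index = pd.DatetimeIndex(pd.to_datetime(stamp, format="%Y%j"), name="date")

print(df)
```

The YEAR and DOY columns stay in the frame, which matches what `keep_date_col=True` does in the examples above.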