1 loops, best of 3: 393 ms per loop

That's the timing I get when I put the date_parser into read_csv like so:
%timeit pd.io.parsers.read_csv('2013030100_RDR.TAB', nrows=1e4, skipinitialspace=True, names=rdrreader.headers, skiprows=1, na_values=['-9999.0'], parse_dates=[[0, 1]], index_col=0, date_parser=parse)
and, interestingly, it's a little bit faster when I do the parsing afterwards:
%%timeit
df = pd.io.parsers.read_csv('2013030100_RDR.TAB', nrows=1e4, skipinitialspace=True, names=rdrreader.headers, skiprows=1, na_values=['-9999.0'])
df['date_utc'] = df.date + ' ' + df.utc
df['time'] = df.date_utc.map(parse)
1 loops, best of 3: 370 ms per loop
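One more variant that may be worth timing: parsing the combined column with pd.to_datetime and an explicit format string instead of mapping a generic parser over every row. A fixed format lets pandas skip per-value format inference, which is usually much faster. A minimal, self-contained sketch with synthetic data (the column names mirror the ones above, but the CSV here is made up, not your RDR.TAB):

```python
import io
import pandas as pd

# Synthetic stand-in for the RDR.TAB layout: a date column and a utc time column.
csv_data = io.StringIO(
    "date,utc,value\n"
    "2013-03-01,00:00:01,1.0\n"
    "2013-03-01,00:00:02,2.0\n"
    "2013-03-01,00:00:03,-9999.0\n"
)

df = pd.read_csv(csv_data, na_values=["-9999.0"])

# Combine the two string columns, then parse once with an explicit format.
# Compared to df.date_utc.map(parse), this avoids re-inferring the format
# for every single row.
df["time"] = pd.to_datetime(df.date + " " + df.utc, format="%Y-%m-%d %H:%M:%S")

print(df["time"].dtype)  # datetime64[ns]
```

The format string above is an assumption about how the date and utc columns look; it would need to match the actual RDR.TAB layout.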
This gets the read time for a 5.5-million-line file, including date parsing, down to 3.4 minutes. Not bad; I'd obviously like it to be faster, but if that's as fast as it gets, I'll take it.
I would like to try it with Cython; it would be a good learning project, but I have no clue whether it would actually improve things. What do you guys think?
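Before reaching for Cython, it might be worth micro-benchmarking the parsing step alone, since that's where the time goes. A hedged sketch on synthetic timestamps (absolute numbers will differ from the real 5.5M-line file, and `parse` here is assumed to be dateutil's generic parser):

```python
import time
import pandas as pd
from dateutil.parser import parse  # assumed to be the `parse` used above

# Synthetic timestamp strings in a fixed, known format.
stamps = pd.Series(["2013-03-01 00:00:%02d" % (i % 60) for i in range(10000)])

t0 = time.perf_counter()
slow = stamps.map(parse)  # per-row generic parsing, as in the snippet above
t1 = time.perf_counter()
fast = pd.to_datetime(stamps, format="%Y-%m-%d %H:%M:%S")  # vectorized, fixed format
t2 = time.perf_counter()

print("map(parse): %.3fs, to_datetime(format=...): %.3fs" % (t1 - t0, t2 - t1))
```

If the vectorized call already dominates, hand-written Cython would mostly be re-implementing what pandas does internally, so the timing comparison is a cheap way to see whether there's anything left to win.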