hi Eric,
have you tried out the parsing functions in pandas?
http://pandas.sourceforge.net/io.html
They take a chunksize parameter so you can process the file piece by
piece if you wish. Depends on what you mean by "custom parsing"-- if
you have a set of strings to be recognized as NA, you can pass those
to the parser. If read_csv or read_table can't handle your data, I'd
be interested to know exactly why so I can improve them.
Performance should be a lot better than np.genfromtxt, about 3-4x
faster in many cases.
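For reference, a minimal sketch of reading piece by piece with custom NA
strings. The inline data, the column names, and the "MISSING"/"UNKNOWN"
markers are made up for illustration; chunksize and na_values are the
read_csv parameters referred to above.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a large file on disk.
raw = io.StringIO(
    "name,score\n"
    "a,1\n"
    "b,MISSING\n"
    "c,3\n"
    "d,UNKNOWN\n"
    "e,5\n"
)

# With chunksize set, read_csv yields DataFrames of at most that many rows
# instead of loading the whole file at once.
chunks = []
for chunk in pd.read_csv(raw, chunksize=2, na_values=["MISSING", "UNKNOWN"]):
    # Per-chunk processing would go here; this sketch just collects them.
    chunks.append(chunk)

df = pd.concat(chunks, ignore_index=True)

# The custom markers were parsed as NaN.
n_missing = int(df["score"].isna().sum())
```

In real use you would pass a file path instead of the StringIO buffer and
reduce each chunk as you go rather than keeping them all in memory.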
best,
Wes
Wouldn't you rather just apply post-processing to the DataFrame? For
example, you could easily do:
    mapping = {'HELLO': 0, 'BONJOUR': 1}
    df[col] = df[col].map(mapping)
mapping could also be any function.
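A self-contained version of the same idea, as a sketch: the DataFrame and
the "greeting" column are invented for the example, and str.lower stands in
for whatever custom function you need.

```python
import pandas as pd

# Hypothetical frame, as if it had just come back from read_csv.
df = pd.DataFrame({"greeting": ["HELLO", "BONJOUR", "HELLO"]})
col = "greeting"

# Dict mapping: each value is looked up in the dict.
# Values missing from the dict would come back as NaN.
mapping = {"HELLO": 0, "BONJOUR": 1}
coded = df[col].map(mapping)

# Function mapping: the callable is applied to each value.
lowered = df[col].map(str.lower)
```

Assigning the result back with df[col] = ... replaces the column in place,
as in the two-line version above.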
I do see the use case for a converters argument-- should be
straightforward to add:
https://github.com/wesm/pandas/issues/343
- Wes