read_csv() and pre-filtering rows

Seth P

Mar 6, 2017, 12:26:42 PM

to PyData

I would like to use read_csv() to construct a DataFrame from only part of a very large CSV file, filtering the data rows with a predicate function, as with filter(). For example, the CSV may contain data rows of the form yyyy-mm-dd,first_name,last_name,score1,score2, and I would want to keep only the rows for which lambda row: row.startswith('2014') is true. I've looked at read_csv(iterator=True), but that would still read large chunks, and I'd need to filter and concatenate the results, which seems slower and more wasteful of memory than necessary. Is there a way to apply a filtering predicate to the input before read_csv() parses it into a DataFrame?
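To make the question concrete, here is a rough sketch of the kind of pre-filtering I have in mind. The helper names filtered_lines and read_csv_filtered and the has_header flag are hypothetical, made up for illustration, and are not part of pandas. The sketch assumes every record sits on a single line (no quoted fields with embedded newlines), and it still holds the matching lines in memory as text before read_csv() parses them.

```python
import io


def filtered_lines(lines, predicate, has_header=True):
    """Yield the header line (if any), then only the data lines passing predicate.

    Hypothetical helper: works on any iterable of text lines, e.g. an open file.
    """
    it = iter(lines)
    if has_header:
        first = next(it, None)
        if first is not None:
            yield first  # always keep the header so column names survive
    for line in it:
        if predicate(line):
            yield line


def read_csv_filtered(path, predicate, has_header=True, **kwargs):
    """Build a DataFrame from only the lines of `path` that pass predicate.

    Hypothetical wrapper: extra keyword arguments are passed to pandas.read_csv.
    """
    import pandas as pd  # imported here so filtered_lines has no pandas dependency

    with open(path, newline="") as f:
        # Only the matching lines are kept; the rest of the file is streamed past.
        buf = io.StringIO("".join(filtered_lines(f, predicate, has_header)))
    return pd.read_csv(buf, **kwargs)
```

With the example above, and assuming the file has no header row, the call would look something like read_csv_filtered('scores.csv', lambda row: row.startswith('2014'), has_header=False, header=None), where scores.csv is a placeholder file name.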
