read_csv() and pre-filtering rows


Seth P

Mar 6, 2017, 12:26:42 PM
to PyData
I would like to use read_csv() to build a DataFrame from only part of a very large CSV file, filtering the data rows with a predicate function, as with filter(). For example, the CSV may contain data rows of the form yyyy-mm-dd,first_name,last_name,score1,score2, and I would want to keep only the rows matching lambda row: row.startswith('2014'). I've looked at read_csv(iterator=True), but that would still parse large chunks of rows I don't need, and I'd then have to filter each chunk and concatenate the results, which seems slower and more memory-hungry than necessary. Is there a way to apply a filtering predicate to the input lines before read_csv() parses them into a DataFrame?
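
To make the question concrete, here is a minimal sketch of one possible workaround, assuming the rows that pass the filter fit in memory as text: filter the raw lines in plain Python, then hand only the survivors to read_csv() through an in-memory buffer. The helper name read_csv_filtered, its has_header flag, and the sample data are made up for illustration; this is not a built-in read_csv() feature.

```python
import io

import pandas as pd


def read_csv_filtered(lines, predicate, has_header=False, **read_csv_kwargs):
    """Build a DataFrame from only the lines for which predicate(line) is true.

    Hypothetical helper, not part of pandas. `lines` is any iterable of
    text lines, e.g. an open file object.
    """
    lines = iter(lines)
    buf = io.StringIO()
    if has_header:
        # Always keep the header line, regardless of the predicate.
        buf.write(next(lines, ""))
    # Only lines that pass the predicate are copied into the buffer,
    # so rejected rows are never parsed by pandas.
    buf.writelines(line for line in lines if predicate(line))
    buf.seek(0)
    return pd.read_csv(buf, **read_csv_kwargs)


# Small stand-in for the large file described above (invented data).
sample = io.StringIO(
    "2013-12-31,Ann,Lee,1,2\n"
    "2014-01-01,Bob,Ray,3,4\n"
    "2014-06-15,Cy,Fox,5,6\n"
    "2015-02-02,Di,Orr,7,8\n"
)

df = read_csv_filtered(
    sample,
    lambda row: row.startswith("2014"),
    header=None,
    names=["date", "first_name", "last_name", "score1", "score2"],
)
```

With a real file, an open file object can be passed in place of sample. Note that the predicate sees raw text lines, so this only works when each record sits on a single line (no quoted fields containing newlines).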