pandas read_csv options memory_map and low_memory


Michael Aye

Nov 13, 2013, 10:29:39 PM11/13/13
to pystat...@googlegroups.com
Hi!

I can't find documentation for these options, or for how and when to use them.
The reason I ask is that I am unable to parse a 2.7 GB CSV file. I let it run for 15 minutes on a 96 GB machine; I got this warning after 2 or 3 minutes, but it never finished:

In [2]: df = pd.read_csv(fname, parse_dates=[1])
/u/paige/maye/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas-0.12.0_1100_g0c30665-py2.7-linux-x86_64.egg/pandas/io/parsers.py:1033: DtypeWarning: Columns (15,18,19) have mixed types. Specify dtype option on import or set low_memory=False.
  data = self._reader.read(nrows)


Playing with low_memory=False and memory_map unfortunately produces core dump crashes, which is why I am wondering how to use them properly.
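
For reference, both are ordinary keyword arguments to read_csv; a minimal sketch of how they are passed (the file name is a placeholder) would be:

    import pandas as pd

    # Sketch only: low_memory=False asks the parser to read the whole file
    # before inferring dtypes, and memory_map=True memory-maps the file
    # instead of streaming it. "big_file.csv" is a placeholder name.
    df = pd.read_csv("big_file.csv", parse_dates=[1],
                     low_memory=False, memory_map=True)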

Cheers,
Michael
 

Jeff Reback

Nov 14, 2013, 8:59:33 AM11/14/13
to pystat...@googlegroups.com
Try reading chunk-by-chunk (then concat at the end); it's much more memory efficient.
These options are not documented AFAIK, and are only pseudo-public in any event.
http://pandas.pydata.org/pandas-docs/dev/io.html#iterating-through-files-chunk-by-chunk
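
A rough sketch of what I mean (file name and chunk size are placeholders):

    import pandas as pd

    # Read the file in pieces, then stitch the pieces together at the end.
    chunks = []
    for chunk in pd.read_csv("big_file.csv", parse_dates=[1],
                             chunksize=1000000):
        chunks.append(chunk)
    df = pd.concat(chunks, ignore_index=True)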

See where it hangs... (and what the CSV looks like at that point).

Wes McKinney

Nov 14, 2013, 1:24:50 PM11/14/13
to pystat...@googlegroups.com
Michael-- any chance I could have a look at this file offline?

- Wes

Tommy Guy

Nov 14, 2013, 7:45:55 PM11/14/13
to pystat...@googlegroups.com
Hi Michael,

That warning is triggered when the parser infers different dtypes in different chunks of the file. For instance, it probably thinks a column is int until it sees a float. You have three options:
1) Ignore the warning (but be aware that your data has mixed types)
2) Explicitly specify dtype for those columns on import (see the sketch below)
3) After the file loads, set the type of those columns explicitly
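
For options 2 and 3, roughly (the column names here are made up; substitute the ones flagged in the warning, i.e. columns 15, 18 and 19):

    import pandas as pd

    # Option 2: tell the parser up front what those columns should be.
    df = pd.read_csv("big_file.csv", parse_dates=[1],
                     dtype={"col15": object, "col18": object, "col19": object})

    # Option 3: load first, then coerce the columns afterwards.
    # df["col15"] = df["col15"].astype(float)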

Tommy

Michael Aye

Nov 21, 2013, 2:27:57 PM11/21/13
to pystat...@googlegroups.com
Weren't we doing bottom-posting here? I forgot, sorry.

Anywho, I got it running with chunks of 1e6 rows, but it still takes 15-20 minutes. I wonder if Wes could improve it even more; not sure if that's realistic.

Michael

Wes McKinney

Nov 21, 2013, 3:25:24 PM11/21/13
to pystat...@googlegroups.com
I got the data file, thanks. I'll have a look-see when I can to see why it's taking so long to parse.