Pandas read_csv dtype=str fails on empty csv file


matthew....@performgroup.com

Mar 10, 2016, 7:30:57 AM
to PyData
Hi,

I have a csv file that is generated daily. It contains a combination of fields that, for argument's sake, I want to treat as strings. I do this like so:

df = pd.read_csv(f, dtype=str)

This works great when the csv file has data rows. However, when the daily csv file is empty apart from the header row (which can happen), pandas throws an exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/pandas/io/parsers.py", line 498, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python2.7/site-packages/pandas/io/parsers.py", line 285, in _read
    return parser.read()
  File "/usr/local/lib/python2.7/site-packages/pandas/io/parsers.py", line 747, in read
    ret = self._engine.read(nrows)
  File "/usr/local/lib/python2.7/site-packages/pandas/io/parsers.py", line 1204, in read
    dtype=self.kwds.get('dtype'))
  File "/usr/local/lib/python2.7/site-packages/pandas/io/parsers.py", line 2259, in _get_empty_meta
    for k, v in compat.iteritems(dtype))
  File "/usr/local/lib/python2.7/site-packages/pandas/compat/__init__.py", line 134, in iteritems
    func = obj.items
AttributeError: type object 'str' has no attribute 'items'
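
The last two frames of the traceback suggest what goes wrong: the code path that handles an empty result iterates over dtype as if it were a dict, and the bare type str has no items attribute. Below is a minimal stdlib-only sketch of that mechanism. The function empty_column_dtypes is a hypothetical stand-in for illustration, not the actual pandas function.

```python
def empty_column_dtypes(dtype):
    """Hypothetical stand-in for the failing step: treat dtype as a mapping."""
    # Mirrors `for k, v in compat.iteritems(dtype)` from the traceback.
    return dict((k, v) for k, v in dtype.items())


# A dict has .items(), so this path works.
print(empty_column_dtypes({'col1': str}))

# The bare type `str` has no .items, which raises the same AttributeError.
try:
    empty_column_dtypes(str)
except AttributeError as exc:
    print('AttributeError: %s' % exc)
```

This would also be consistent with the dict form of dtype working on the same input.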

You can easily replicate this:

import pandas as pd
from StringIO import StringIO

# Header row only, no data rows
csv = StringIO('col1,col2,col3')
df = pd.read_csv(csv, dtype=str)

I notice that if I pass a dict into dtype it actually works fine:

import pandas as pd
from StringIO import StringIO

csv = StringIO('col1,col2,col3')
df = pd.read_csv(csv, dtype={'col1': str})

Is this a known issue? Or is it something silly I'm doing?

Any thoughts would be appreciated as it would be much simpler for me to treat all columns as strings, rather than identifying specific ones.
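
Since the dict form of dtype works, one possible workaround is to build that dict from the header row, so every column is still read as a string without naming columns by hand. This is only a sketch: read_csv_all_str is a hypothetical helper, it assumes Python 3 (io.StringIO) and that the CSV content is available as text, and it has not been checked against pandas 0.17.1.

```python
import csv
import io

import pandas as pd


def read_csv_all_str(text):
    """Read CSV text with every column as str, via a per-column dtype dict."""
    # Read only the header row with the stdlib csv module.
    header = next(csv.reader(io.StringIO(text)), None)
    if header is None:
        # Completely empty input: there are no columns to name.
        return pd.DataFrame()
    # Build the dict form of dtype, which the post reports as working.
    dtype_map = dict((name, str) for name in header)
    return pd.read_csv(io.StringIO(text), dtype=dtype_map)


# Header-only input, as in the reproduction above.
df = read_csv_all_str('col1,col2,col3')
print(list(df.columns))
```

The header is parsed separately so that the dtype dict covers whatever columns the daily file happens to contain.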

Thanks.

Just in case this is of use:

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Darwin
OS-release: 15.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8

pandas: 0.17.1
nose: 1.3.7
pip: 8.0.2
setuptools: 18.7.1
Cython: None
numpy: 1.10.1
scipy: None
statsmodels: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.4.1
html5lib: 0.9999999
httplib2: 0.9.2
apiclient: 1.4.2
sqlalchemy: 1.0.9
pymysql: None
psycopg2: 2.6.1 (dt dec pq3 ext lo64)
Jinja2: None

