read_csv() fails on a file that contains '\x00'

Zongyuan Gu

May 15, 2012, 3:10:36 AM
to PyData
Hi everyone,
I have a CSV file that contains '\x00' bytes. When I try to read it with
read_csv(), it fails with the following message:
----------------------------------------------------------------------------------
Traceback (most recent call last):
  File "<ipython console>", line 1, in <module>
  File "C:\Python27\lib\site-packages\pandas-0.8.0.dev-py2.7-win32.egg\pandas\io\parsers.py", line 204, in read_csv
    return _read(TextParser, filepath_or_buffer, kwds)
  File "C:\Python27\lib\site-packages\pandas-0.8.0.dev-py2.7-win32.egg\pandas\io\parsers.py", line 174, in _read
    return parser.get_chunk()
  File "C:\Python27\lib\site-packages\pandas-0.8.0.dev-py2.7-win32.egg\pandas\io\parsers.py", line 651, in get_chunk
    content = self._get_lines(rows)
  File "C:\Python27\lib\site-packages\pandas-0.8.0.dev-py2.7-win32.egg\pandas\io\parsers.py", line 807, in _get_lines
    new_rows.append(next(source))
  File "C:\Python27\lib\site-packages\pandas-0.8.0.dev-py2.7-win32.egg\pandas\core\common.py", line 837, in next
    row = self.reader.next()
Error: line contains NULL byte

--------------------------------------------------------------------------------
It seems this can be worked around with the following code:
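# Read the raw bytes, strip every NUL byte, and write a cleaned copy.
with open('my.csv', 'rb') as fi:
    data = fi.read()
with open('mynew.csv', 'wb') as fo:
    # bytes literals, so this also runs on Python 3
    fo.write(data.replace(b'\x00', b''))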
----------------------------------------------------------------------------------
But is there any other way to solve the '\x00' problem?
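
One possible variation on the workaround above skips the second file and keeps the cleaned data in memory. This is only a sketch: the helper name open_without_nuls is made up for illustration, and it assumes the file is small enough to read in one go and that the NUL bytes are stray junk rather than part of a multi-byte encoding.

```python
import io


def open_without_nuls(path):
    """Return an in-memory binary buffer holding the file's bytes minus NULs.

    Hypothetical helper: it reads the whole file at once, so it is only
    suitable for files that fit comfortably in memory.
    """
    with open(path, "rb") as f:
        data = f.read()
    # Drop every NUL byte; all other bytes are left untouched.
    return io.BytesIO(data.replace(b"\x00", b""))
```

Since read_csv() accepts file-like objects as well as paths, the returned buffer could be passed to it directly, e.g. pd.read_csv(open_without_nuls('my.csv')). If the file is actually UTF-16 encoded, stripping NUL bytes would damage any non-ASCII characters, so this only makes sense for genuinely stray NULs.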

Chih-Cheng Liang

Dec 28, 2015, 12:52:41 PM
to PyData, guzon...@gmail.com
If the '\x00' bytes are there because the file is UTF-16 encoded, try a recent version of pandas and read the file as utf_16_le:

import pandas as pd
pd.read_csv('my.csv',delimiter=",", encoding='utf_16_le', engine="python")
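
Whether an encoding argument is the right fix depends on why the NUL bytes are there. The sketch below is a rough heuristic, not a reliable detector: the function name guess_utf16 and the 50% threshold are assumptions for illustration. It checks for a UTF-16 byte-order mark first, then looks at whether NUL bytes sit mostly in every second position, which is what mostly-ASCII text looks like when stored as UTF-16.

```python
import codecs


def guess_utf16(raw):
    """Return a codec name if `raw` (bytes) looks like UTF-16, else None.

    Hypothetical heuristic: it can be fooled by binary data, by text that
    is mostly non-ASCII, and by UTF-32 files (whose BOM starts the same way).
    """
    # A byte-order mark settles it; the 'utf_16' codec consumes the BOM.
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf_16"
    half = len(raw) // 2
    if half == 0:
        return None
    even_nuls = raw[0::2].count(b"\x00")  # first byte of each 2-byte unit
    odd_nuls = raw[1::2].count(b"\x00")   # second byte of each 2-byte unit
    # ASCII characters in UTF-16 LE are stored as (char, 0x00).
    if odd_nuls > even_nuls and odd_nuls >= half / 2:
        return "utf_16_le"
    # ASCII characters in UTF-16 BE are stored as (0x00, char).
    if even_nuls > odd_nuls and even_nuls >= half / 2:
        return "utf_16_be"
    return None
```

Feeding it the first few kilobytes of the file (opened in 'rb' mode) should be enough. If it returns a codec name, that name can be passed as the encoding argument to read_csv(). If it returns None, the NUL bytes are more likely stray and stripping them is the safer route.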

On Tuesday, May 15, 2012 at 3:10:36 PM UTC+8, Zongyuan Gu wrote: