using file_contents with open_workbook() requires binary open() on Windows


Randy Syring

Nov 23, 2010, 3:18:36 PM
to python-excel
I found a previous post that described my problem:

http://groups.google.com/group/python-excel/browse_thread/thread/46ded074f9fe7523/0504522c8e6174f0

The traceback was:

XLRDError: Unsupported format, or corrupt file: Expected BOF record;
found '\xd0\xcf\x11\xe0\xa1\xb1'

But it worked fine on linux. Based on the post above, I was able to
figure out that I needed to open the file in binary mode before
passing the result to file_contents:

data = open(path.join(excel_path, 'employee_import_template_01.xls'), 'rb').read()
book = xlrd.open_workbook(file_contents=data)

Just wanted to post in case someone else might benefit from that info.

John Machin

Nov 23, 2010, 5:02:21 PM
to python...@googlegroups.com

Hi Randy,

I'm glad you were able to recover from that. Note that in text mode,
len(data) would be 6; the 7th byte of the 8-byte signature at the start
of the file is 0x1A, which is interpreted as EOF by Windows in text mode.
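
To make the truncation concrete, here is a minimal sketch that simulates (it does not reproduce) a Windows text-mode read by cutting the data at the first 0x1A byte. The helper name simulate_text_mode_read is made up for illustration, the sketch assumes Python 3 bytes, and it ignores the newline translation that real text mode also performs.

```python
# The 8-byte OLE2 compound-document signature at the start of a .xls file.
OLE2_SIGNATURE = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def simulate_text_mode_read(raw):
    """Hypothetical helper: cut the data at the first Ctrl-Z (0x1A) byte.

    This only imitates the EOF behaviour described above; it is not how
    the C runtime is implemented.
    """
    eof_pos = raw.find(b"\x1a")
    return raw if eof_pos == -1 else raw[:eof_pos]


# Pretend the rest of the workbook follows the signature.
truncated = simulate_text_mode_read(OLE2_SIGNATURE + b"rest of the workbook")

print(len(truncated))                   # 6
print(truncated == OLE2_SIGNATURE[:6])  # True
```

Only the first six signature bytes survive, which matches the bytes shown in the "Expected BOF record" error.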

It's always a good idea to open binary files (like .XLS and .CSV)
explicitly with mode='rb' irrespective of what operating system you
first run your code on.

Is there any particular reason why you open the file, read the contents,
and then pass the contents to open_workbook? If you are not inspecting
the contents yourself (virus checking??), then you are doing too much
work, and stopping open_workbook from taking advantage of memory-mapping
the file ... just pass the path to the file as the first arg. The
file_contents arg is for use e.g. on the web where the receiver gets a
bytestring and doesn't want/need to write it to disk.
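
The two calling styles can be sketched roughly as follows. The helper names open_workbook_kwargs and load_workbook are hypothetical; the sketch assumes Python 3 (where a path is a str and file contents are bytes) and relies only on the filename and file_contents parameters of xlrd.open_workbook().

```python
def open_workbook_kwargs(source):
    """Hypothetical helper: choose open_workbook() arguments for a source.

    bytes are treated as in-memory file contents; anything else is
    treated as a path on disk.
    """
    if isinstance(source, bytes):
        return {"file_contents": source}
    return {"filename": source}


def load_workbook(source):
    """Open a workbook from either a path or a bytestring."""
    # xlrd is third-party; it is imported here so that the helper above
    # can be used and tested without it installed.
    import xlrd
    return xlrd.open_workbook(**open_workbook_kwargs(source))


print(open_workbook_kwargs("book.xls"))      # {'filename': 'book.xls'}
print(sorted(open_workbook_kwargs(b"\xd0"))) # ['file_contents']
```

Passing the path leaves the file handling to open_workbook(); passing bytes is the route for data that never touches the disk.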

Cheers,
John

Randy Syring

Nov 24, 2010, 5:24:51 PM
to python-excel
On Nov 23, 5:02 pm, John Machin <sjmac...@lexicon.net> wrote:
> Is there any particular reason why you open the file, read the contents,
> and then pass the contents to open_workbook? If you are not inspecting
> the contents yourself (virus checking??), then you are doing too much
> work, and stopping open_workbook from taking advantage of memory-mapping
> the file ... just pass the path to the file as the first arg. The
> file_contents arg is for use e.g. on the web where the receiver gets a
> bytestring and doesn't want/need to write it to disk.

John,

Thanks for your reply. Actually, you hit the nail on the head. I am
using open_workbook() in a web application where I get a file-like
object that does not have a path on the file system, so sending the
contents is easier for me. I was having problems during testing because,
in order to unit-test my function, I had to pass in file contents. To
get the file contents, I was opening the file myself and reading it,
and that's where I hit the issue with not opening it in binary
mode. I will use 'rb' by default in the future.
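
For anyone in the same situation, here is a rough sketch of the pattern, with io.BytesIO standing in for the web upload. The helper name read_upload_bytes and the fake payload are made up for illustration; a real test would read an actual .xls fixture opened with 'rb'.

```python
import io


def read_upload_bytes(upload):
    """Hypothetical helper: return a binary file-like object's contents.

    Raises TypeError if the object yields text, which is the symptom of
    a file opened without 'b' in its mode.
    """
    data = upload.read()
    if not isinstance(data, bytes):
        raise TypeError("upload must be opened in binary mode")
    return data


# Stand-in for an uploaded file: the OLE2 signature plus padding.
# This is NOT a valid workbook, just enough bytes to show the flow.
fake_upload = io.BytesIO(b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1" + b"\x00" * 8)
data = read_upload_bytes(fake_upload)

print(len(data))  # 16
```

With a real workbook, the resulting bytes would then be handed to xlrd.open_workbook(file_contents=data).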