Reading an usual excel file gives me a file size warning

3,719 views
Skip to first unread message

Jan Kraus

unread,
Dec 20, 2011, 3:49:38 PM12/20/11
to python-excel
I am trying to read a file, which easily opens in excel and open
office. If i re-save from openoffice, the error disapears. Resaving to
any format in excel does not help.

WARNING *** file size (3358637) not 512 + multiple of sector size
(512)
Traceback (most recent call last):
File "X:\96-subversion\student\penicka\src\Harmonicke.py", line 5,
in <module>
wb = open_workbook('Data\\testEON-8.xls');
File "C:\Python27\lib\site-packages\xlrd\__init__.py", line 457, in
open_workbook
bk.get_sheets()
File "C:\Python27\lib\site-packages\xlrd\__init__.py", line 1007, in
get_sheets
self.get_sheet(sheetno)
File "C:\Python27\lib\site-packages\xlrd\__init__.py", line 998, in
get_sheet
sh.read(self)
File "C:\Python27\lib\site-packages\xlrd\sheet.py", line 615, in
read
rowx, colx, xf_index, d = local_unpack('<HHHd', data)
struct.error: unpack requires a string argument of length 14

PS: I am new to this group, is there a way to attach the respective
file somehow?

John Machin

unread,
Dec 20, 2011, 4:31:14 PM12/20/11
to python...@googlegroups.com
"Easily" does not mean "correctly" or "successfully".

Both the file-size warning and the subsequent error indicate that the file has been truncated or mangled somehow. What software created this file?

Send me a copy of the original file via private e-mail, and I'll have a look at it.

John Machin

unread,
Dec 26, 2011, 4:19:55 PM12/26/11
to python...@googlegroups.com
Thanks for sending a sample file off-line. For the benefit of others reading this: Background is that the file was created using in-house software developed by another section of the company that Jan works for.

The error is caused by xlrd being strict about data layout unless and until I'm given reasonable grounds for it being otherwise. The problem here is that the NUMBER record (which is about the highest-frequency record type, and one of minimal complexity) is expected to be exactly 14 bytes, comprising a row index, a column index, an XF index,and an 8-byte IEEE 754 floating point number. That's it. 3 * 2 + 8 == 14. However this file has an extra 4 bytes of binary zeroes pointlessly tacked on the end of each NUMBER record.

You can work around this by applying the same "fix" that I'm about to make: change the offending line of code to read

rowx, colx, xf_index, d = local_unpack('<HHHd', data[:14]])

You may wish to tell the authors of the software that their output files are larger than need be (e.g. 3.3 Mb instead of 2.7 Mb).

You wrote: "Resaving to any format in excel does not help" -- however I found that using Excel 2003 or Excel 2007 to save it as an XLS file was OK.

The warning about the file size is quiet valid. However what would otherwise cause a train-smash is averted by the data blocks in the OLE2 structure being written last (i.e. after all the index blocks, an unusual but valid option), and xlrd doesn't attempt to access the "missing" padding at the end of the last data block.

Cheers,
John
Reply all
Reply to author
Forward
0 new messages