AssertionError when opening .xls file


frank h.

Nov 3, 2008, 9:02:32 AM
to python-excel
Hello,
I'm on xlrd trunk (0.7.0a7) trying to open an Excel file like this:

>>> xlrd.open_workbook(XLFILE)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.5/site-packages/xlrd/__init__.py", line 377, in open_workbook
    formatting_info=formatting_info,
  File "/Library/Python/2.5/site-packages/xlrd/__init__.py", line 810, in biff2_8_load
    cd = compdoc.CompDoc(filestr, logfile=self.logfile)
  File "/Library/Python/2.5/site-packages/xlrd/compdoc.py", line 194, in __init__
    sscs_dir.tot_size, name="SSCS")
  File "/Library/Python/2.5/site-packages/xlrd/compdoc.py", line 250, in _get_stream
    assert s == EOCSID
AssertionError

Not sure what is wrong. Any insight on that?
Thanks,
-frank

John Machin

Nov 3, 2008, 3:59:30 PM
to python...@googlegroups.com
On 4/11/2008 01:02, frank h. wrote:
> Hello,
> I'm on xlrd trunk (0.7.0a7) trying to open an Excel file like this:
>
>>>> xlrd.open_workbook(XLFILE)
> Traceback (most recent call last):
[snip]

> File "/Library/Python/2.5/site-packages/xlrd/compdoc.py", line 194,
> in __init__
> sscs_dir.tot_size, name="SSCS")
> File "/Library/Python/2.5/site-packages/xlrd/compdoc.py", line 250,
> in _get_stream
> assert s == EOCSID
> AssertionError
>
> not sure what is wrong, any insight on that?

Hi Frank,

Either my understanding of what a valid OLE2 Compound Document file
should look like is wrong in a corner case, or your file is corrupt, or
somewhere in the middle (your file is idiosyncratic but not
irredeemable, so xlrd should emit a warning and keep going).

Please answer the following questions:
* What platform?
* Python 2.5.x ... what is x?
* What software created the file?
* Is the file one of many from the same source?
* Do you have this problem with all files from that source, or only this
one file?
* Before raising AssertionError, did xlrd output any warning messages?
* What is the outcome when you try to open this file with Excel?
OpenOffice.org Calc? Gnumeric? [error message? data missing? what versions?]

If possible, e-mail me a zipped-up copy of the smallest file that has
this problem (if more than say 1 MB, make it available for downloading).
I promise not to disclose its contents. If you need a more formal
non-disclosure agreement, please supply the text. Please *don't* upload
the file to this group's file section. If you can't send me a file, I'll
need to send you an OLE dump script; consequently debugging (and testing
a fix, if one is possible) could become a slow tennis match.

Cheers,
John


Frank Hoffsümmer

Nov 3, 2008, 5:37:07 PM
to python...@googlegroups.com
Wow, thanks John, I'll answer below


On Mon, Nov 3, 2008 at 9:59 PM, John Machin <sjma...@lexicon.net> wrote:

> * What platform?

OS X 10.5.5

> * Python 2.5.x ... what is x?

x = 1

> * What software created the file?

Unknown. It's a report generated by an online statistics service we are using.

> * Is the file one of many from the same source?

Yes.

> * Do you have this problem with all files from that source, or only this
> one file?

All of them.

> * Before raising AssertionError, did xlrd output any warning messages?

No.

> * What is the outcome when you try to open this file with Excel?
> OpenOffice.org Calc? Gnumeric? [error message? data missing? what versions?]

Opening works fine with Excel 2003 on Win XP, with Excel 2008 and Numbers '08 on Mac OS X, and with Spreadsheet from OpenOffice 2.4 on Ubuntu 8.10.

> If possible, e-mail me a zipped-up copy of the smallest file that has
> this problem (if more than say 1 MB, make it available for downloading).
> I promise not to disclose its contents. If you need a more formal
> non-disclosure agreement, please supply the text. Please *don't* upload
> the file to this group's file section. If you can't send me a file, I'll
> need to send you an OLE dump script; consequently debugging (and testing
> a fix, if one is possible) could become a slow tennis match.


will do, the files are small (<100Kb), no NDA needed.

THANKS!
-frank
 

Chris Withers

Nov 3, 2008, 5:50:58 PM
to python...@googlegroups.com
Frank Hoffsümmer wrote:
>
> will do, the files are small (<100Kb), no NDA needed.

Try opening the file in a text editor to check it's really binary and
not just html served with a .xls file extension.

This is an annoying trick often used by web apps where a decent library
such as xlwt is not available.
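
The same check can be done without a text editor by looking at the first bytes of the file. What follows is only a rough heuristic sketch, not part of xlrd: the names `looks_like_html` and `sniff_file` and the list of prefixes are made up for illustration, and the code is written for Python 3.

```python
def looks_like_html(head: bytes) -> bool:
    """Heuristically decide whether the leading bytes of a file look like markup text."""
    # Drop a UTF-8 byte-order mark if present, then leading whitespace.
    if head.startswith(b"\xef\xbb\xbf"):
        head = head[3:]
    text = head.lstrip().lower()
    # Prefixes chosen for this sketch; real-world exports may start differently.
    return text.startswith((b"<!doctype", b"<html", b"<?xml", b"<table", b"<meta"))


def sniff_file(path: str, nbytes: int = 512) -> bool:
    """Read the first nbytes of the file at path and apply the heuristic."""
    with open(path, "rb") as f:
        return looks_like_html(f.read(nbytes))
```

A real .xls file starts with binary bytes rather than a tag, so `looks_like_html` should return False for it.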

cheers,

Chris

--
Simplistix - Content Management, Zope & Python Consulting
- http://www.simplistix.co.uk

John Machin

Nov 3, 2008, 6:12:02 PM
to python...@googlegroups.com
On 4/11/2008 09:50, Chris Withers wrote:
> Frank Hoffsümmer wrote:
>> will do, the files are small (<100Kb), no NDA needed.
>
> Try opening the file in a text editor to check it's really binary and
> not just html served with a .xls file extension.

If that were the problem, it would have given quite a different message
("Unsupported format, or corrupt file: <further details>"). To get to
the point where it raised that assertion error, it has passed several
hurdles:

* first 8 bytes of file contain the OLE2 Compound Document magic cookie
* has correct little-endian flag
* sector sizes not ludicrous
* first part of master sector allocation table not noticeably stuffed
* etc
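
The first few hurdles listed above can be sketched roughly as follows. This is not xlrd's code: the function name `check_ole2_header`, the messages, and the sanity limits on the sector sizes are assumptions made for this illustration. The field offsets follow the published compound document header layout (signature at offset 0, byte-order marker at 28, sector-size exponents at 30 and 32). Written for Python 3.

```python
import struct

# The 8-byte OLE2 Compound Document signature (the "magic cookie").
OLE2_SIGNATURE = b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1"


def check_ole2_header(header: bytes) -> list:
    """Return a list of problems found in the 512-byte header; empty if none."""
    if len(header) < 512:
        return ["shorter than a 512-byte compound document header"]
    problems = []
    if header[:8] != OLE2_SIGNATURE:
        problems.append("missing OLE2 signature")
    # Little-endian files store the marker 0xFFFE as the bytes FE FF.
    if header[28:30] != b"\xFE\xFF":
        problems.append("byte-order marker is not little-endian")
    # Sector sizes are stored as powers of two (e.g. 9 means 512 bytes).
    sector_shift, short_sector_shift = struct.unpack("<HH", header[30:34])
    if not 7 <= sector_shift <= 20:  # limits are arbitrary for this sketch
        problems.append("implausible sector size")
    if short_sector_shift > sector_shift:
        problems.append("short sectors larger than normal sectors")
    return problems
```

An HTML file renamed to .xls fails at the very first check, which is why it would produce a different error than the assertion seen here.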

John Machin

Nov 4, 2008, 4:24:43 AM
to python...@googlegroups.com
On 4/11/2008 09:37, Frank Hoffsümmer wrote:

> On Mon, Nov 3, 2008 at 9:59 PM, John Machin wrote:

>> If possible, e-mail me a zipped-up copy of the smallest file that has
>> this problem (if more than say 1 MB, make it available for downloading).

> will do, the files are small (<100Kb), no NDA needed.

Here's an update for the group:

The problem was that the unknown creating software was writing -1 (FREESID,
i.e. a free sector) instead of -2 (EOCSID, i.e. an end-of-chain marker)
for the first_SID when the SSCS was empty. Not having EOCSID caused an
assertion failure in _get_stream.

Solution: Avoid calling _get_stream in any case when the SSCS appears to
be empty (i.e. size == 0 and first_SID is negative).
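
For readers unfamiliar with sector chains, here is a toy model of the failure and of the guard. It is a simplified sketch, not the actual change made in xlrd: `read_chain`, `read_stream_tolerant`, and the list-based allocation table are hypothetical stand-ins, with only the names FREESID and EOCSID and their values taken from the description above. Written for Python 3.

```python
FREESID = -1  # sector is free
EOCSID = -2   # end-of-chain marker


def read_chain(sat, sectors, first_sid):
    """Follow a sector chain through the allocation table and join the data."""
    parts = []
    seen = set()
    sid = first_sid
    while sid >= 0:
        if sid in seen:
            raise ValueError("cycle in sector chain")
        seen.add(sid)
        parts.append(sectors[sid])
        sid = sat[sid]  # the table maps each sector to the next one
    # A well-formed chain ends with EOCSID; ending on FREESID fails here.
    assert sid == EOCSID
    return b"".join(parts)


def read_stream_tolerant(sat, sectors, first_sid, size):
    """Skip the chain walk when the stream appears to be empty."""
    if size == 0 and first_sid < 0:
        return b""
    return read_chain(sat, sectors, first_sid)[:size]
```

With `first_sid = FREESID` and `size = 0`, `read_chain` raises AssertionError while `read_stream_tolerant` returns an empty byte string.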

svn updated. Frank happy. Case closed.

Cheers,
John
