Can't read an Excel spreadsheet I read a couple days ago...

510 views
Skip to first unread message

Skip Montanaro

unread,
Apr 29, 2011, 10:37:06 AM4/29/11
to python-excel
I have an Excel spreadsheet full of parameters which I edit with
OpenOffice 3 on Solaris 10. It's later read using xlrd to generate
some config files. Up until today this worked just fine. Now I get

Traceback (most recent call last):
File "../scripts/generate_config_xml.py", line 128, in ?
sys.exit(main(sys.argv[1:]))
File "../scripts/generate_config_xml.py", line 95, in main
paramsd = get_params_dict(params_file)
File "../scripts/generate_config_xml.py", line 38, in
get_params_dict
params_wb = xlrd.open_workbook(params_file)
File "/opt/app/g++lib6/python-2.4/lib/python2.4/site-packages/
xlrd-0.6.1-py2.4.egg/xlrd/__init__.py", line 370, in open_workbook
File "/opt/app/g++lib6/python-2.4/lib/python2.4/site-packages/
xlrd-0.6.1-py2.4.egg/xlrd/__init__.py", line 1323, in getbof
xlrd.biffh.XLRDError: Expected BOF record; found 0x3f3c

Poking around with Google I saw some suggestions that it was in XML
format. Pulled it up in Emacs. Looked like binary to me.

Here's what I've done so far:

* Removed the column I added this morning. Didn't help.

* Tried making a trivial edit then saving in the usual way (format
shows as Microsoft Excel 97/2000/XP in the Save As dialog).

* Tried saving it as Microsoft Excel 95. This did not help

* Sent it to a friend with real honest-to-goodness Excel. Had him
make a trivial edit, save, then send it back. He said it opened
in "compatibility mode". This also did not help.

* Checked versions of the code and programs. Nothing has changed.

* Tried my config generator with a different version of the file
(one checked in on a different branch in Subversion). This
worked fine.

I know we're not up-to-date on xlrd, but honestly, this worked
yesterday. I can't think of what might have changed to cause this
breakage. Any ideas?

Thanks,

Skip Montanaro
sk...@pobox.com

John Machin

unread,
Apr 29, 2011, 4:09:10 PM4/29/11
to python...@googlegroups.com

On Saturday, 30 April 2011 00:37:06 UTC+10, Skip Montanaro wrote:

Hi Skip,


I have an Excel spreadsheet full of parameters which I edit with
OpenOffice 3 on Solaris 10.  It's later read using xlrd to generate
some config files.  Up until today this worked just fine.  Now I get

    Traceback (most recent call last):
[snip]
      File "../scripts/generate_config_xml.py", line 38, in
get_params_dict
params_wb = xlrd.open_workbook(params_file)
      File "/opt/app/g++lib6/python-2.4/lib/python2.4/site-packages/
xlrd-0.6.1-py2.4.egg/xlrd/__init__.py", line 370, in open_workbook
      File "/opt/app/g++lib6/python-2.4/lib/python2.4/site-packages/
xlrd-0.6.1-py2.4.egg/xlrd/__init__.py", line 1323, in getbof
    xlrd.biffh.XLRDError: Expected BOF record; found 0x3f3c

What is happening is that your file is an XML file. xlrd doesn't care about the file extension, it examines the contents. At the stage where it borks, it is testing for its last hope, the pre-OLE raw format. This happens:

| >>> data = "<?xml etc etc ?>"
| >>> import struct
| >>> hex(struct.unpack("<H", data[:2])[0])
| '0x3f3c'


Poking around with Google I saw some suggestions that it was in XML
format.  Pulled it up in Emacs.  Looked like binary to me.

Here's what I've done so far:

    * Removed the column I added this morning.  Didn't help.

    * Tried making a trivial edit then saving in the usual way (format
      shows as Microsoft Excel 97/2000/XP in the Save As dialog).

    * Tried saving it as Microsoft Excel 95.  This did not help

    * Sent it to a friend with real honest-to-goodness Excel.  Had him
      make a trivial edit, save, then send it back.  He said it opened
      in "compatibility mode".  This also did not help.

    * Checked versions of the code and programs.  Nothing has changed.

    * Tried my config generator with a different version of the file
      (one checked in on a different branch in Subversion).  This
      worked fine.

I know we're not up-to-date on xlrd, but honestly, this worked
yesterday.  I can't think of what might have changed to cause this
breakage.  Any ideas?

Check to see whether your manual operations and your script are referring to different paths. Consider putting some debug code in your script just in front of the call to xlrd.open_workbook:

print "params_file:", repr(params_file)
print "realpath:", repr(os.path.realpath(params_file))
f = open(params_file, 'rb')
print "first 20 bytes:", repr(f.read(20))
f.close()

Cheers,
John
 
 

sk...@pobox.com

unread,
Apr 29, 2011, 5:59:02 PM4/29/11
to python...@googlegroups.com

John> Check to see whether your manual operations and your script are
John> referring to different paths. Consider putting some debug code in
John> your script just in front of the call to xlrd.open_workbook:

John> print "params_file:", repr(params_file)
John> print "realpath:", repr(os.path.realpath(params_file))
John> f = open(params_file, 'rb')
John> print "first 20 bytes:", repr(f.read(20))
John> f.close()

Seems pretty unlikely, since I name the spreadsheet file on the command
line. Still, I will add such code if I continue to have problems. I spent
the day working on other stuff and just came back to this. I figured I'd
run it once more to verify the problem, then add your code. Faith and
Begora! It ran fine. I am completely flummoxed at this point. I verified
that I was working with the same file indirectly. First, I ran it against a
new file which expected the new column. It complained that the column was
missing. I went back to OO, added the column, ran again, and it worked.

Skip

Ian Kelly

unread,
Apr 29, 2011, 6:06:19 PM4/29/11
to python...@googlegroups.com
On Fri, Apr 29, 2011 at 8:37 AM, Skip Montanaro <sk...@pobox.com> wrote:
> Poking around with Google I saw some suggestions that it was in XML
> format.  Pulled it up in Emacs.  Looked like binary to me.

XLSX is actually a collection of XML files in a ZIP archive, so it
would indeed look like binary if you opened it up in Emacs. I have no
idea whether this is at all related to your problem, though.

sk...@pobox.com

unread,
Apr 29, 2011, 8:32:19 PM4/29/11
to python...@googlegroups.com
>> Poking around with Google I saw some suggestions that it was in XML
>> format.  Pulled it up in Emacs.  Looked like binary to me.

Ian> XLSX is actually a collection of XML files in a ZIP archive, so it
Ian> would indeed look like binary if you opened it up in Emacs. I have
Ian> no idea whether this is at all related to your problem, though.

Right, but the version of OO I run can't generate XLSX files. (As I recall
it can read them.) The most recent format it can write is Excel 97/2000/XP
workbooks I believe. I will try an unzip -t if this happens again (and I
remember), but at the moment at least the file in question is no such beast:

% unzip -t params.xls
Archive: params.xls
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
note: params.xls may be a plain executable, not an archive
unzip: cannot find zipfile directory in one of params.xls or
params.xls.zip, and cannot find params.xls.ZIP, period.

OTOH, a .xlsx file I have laying around:

% unzip -t Putin_parameters\ sept\ 2010\ to\ publish.xlsx
Archive: Putin_parameters sept 2010 to publish.xlsx
testing: [Content_Types].xml OK
testing: _rels/.rels OK
testing: xl/_rels/workbook.xml.rels OK
testing: xl/workbook.xml OK
testing: xl/theme/theme1.xml OK
testing: xl/worksheets/_rels/sheet1.xml.rels OK
testing: xl/worksheets/sheet2.xml OK
testing: xl/worksheets/sheet3.xml OK
testing: xl/worksheets/sheet1.xml OK
testing: xl/sharedStrings.xml OK
testing: xl/styles.xml OK
testing: xl/printerSettings/printerSettings1.bin OK
testing: docProps/core.xml OK
testing: docProps/app.xml OK
No errors detected in compressed data of Putin_parameters sept 2010 to publish.xlsx.

I have NeoOffice on my Mac which can write XLSX files. It's possible that I
inadvertently edited it there and saved in that format, however I see no
evidence in the Recent Documents list that I've visited this particular
spreadsheet using that tool.

Thanks for all the feedback. I will have more diagnostic weapons in my
arsenal if this happens again.

Skip

John Machin

unread,
Apr 29, 2011, 9:41:49 PM4/29/11
to python...@googlegroups.com


On Saturday, 30 April 2011 10:32:19 UTC+10, Skip Montanaro wrote:
   >> Poking around with Google I saw some suggestions that it was in XML
    >> format.  Pulled it up in Emacs.  Looked like binary to me.

    Ian> XLSX is actually a collection of XML files in a ZIP archive, so it
    Ian> would indeed look like binary if you opened it up in Emacs.  I have
    Ian> no idea whether this is at all related to your problem, though.


Believe me -- the file that caused the exception started with "<?".

XLSX files are ZIP files.

ZIP files start with "PK" (as in "Phil Katz").

XLS files start with "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1"

John Machin

unread,
Apr 29, 2011, 9:51:13 PM4/29/11
to python...@googlegroups.com


On Saturday, 30 April 2011 07:59:02 UTC+10, Skip Montanaro wrote:

    John> Check to see whether your manual operations and your script are
    John> referring to different paths. Consider putting some debug code in
    John> your script just in front of the call to xlrd.open_workbook:

    John> print "params_file:", repr(params_file)
    John> print "realpath:", repr(os.path.realpath(params_file))
    John> f = open(params_file, 'rb')
    John> print "first 20 bytes:", repr(f.read(20))
    John> f.close()

Seems pretty unlikely, since I name the spreadsheet file on the command
line.

How does "name the spreadsheet file on the command line" guarantee that it is not named incorrectly?
 

 Still, I will add such code if I continue to have problems.  I spent
the day working on other stuff and just came back to this.  I figured I'd
run it once more to verify the problem, then add your code.  Faith and
Begora!  It ran fine.  I am completely flummoxed at this point.  I verified
that I was working with the same file indirectly.  First, I ran it against a
new file which expected the new column.  It complained that the column was
missing.  I went back to OO, added the column, ran again, and it worked.

I don't see how that verifies anything. After all the manual manoeuvres that you mentioned, you are likely to have all sorts of files everywhere.

Consider the possibility that you have foo.xls and foo.xml and that you inadvertently typed foo.xml on the command line.


Reply all
Reply to author
Forward
0 new messages