Thanks for sending the files off-list. I understand that you are busy so
I'll answer some of the questions for you ...
>
> Answer some questions:
>
> 1a: What happens when you try to read these so-called XLS files with
> xlrd? E.g.
>
> PYTHON_DIR/python PYTHON_DIR/scripts/runxlrd.py ov socalled.xls
*** Open failed: <class 'xlrd.biffh.XLRDError'>: Unsupported format, or
corrupt file: Expected BOF record; found '\xef\xbb\xbfMIME-'
So it starts with a UTF-8 BOM followed by "MIME-" ... it surely isn't an XLS file.
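With thousands of these to get through, it may be worth sorting the files by their first few bytes before deciding what to do with each one. A minimal sketch in Python 3 follows. The names sniff and sniff_file are made up for illustration; the OLE2 signature is the one used by the compound-document container that real XLS files normally live in, and "MIME-Version:" is what the file above starts with once the BOM is removed.

```python
UTF8_BOM = b"\xef\xbb\xbf"
# Signature of the OLE2 compound-document container used by ordinary XLS files.
OLE2_SIGNATURE = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def sniff(head):
    """Classify a file from its leading bytes: 'ole2', 'mhtml' or 'unknown'."""
    if head.startswith(OLE2_SIGNATURE):
        return "ole2"
    # Drop a UTF-8 BOM, if present, before looking for the MIME header.
    if head.startswith(UTF8_BOM):
        head = head[len(UTF8_BOM):]
    if head.upper().startswith(b"MIME-VERSION:"):
        return "mhtml"
    return "unknown"


def sniff_file(path):
    """Read the first 100 bytes of the file at path and classify them."""
    with open(path, "rb") as f:
        return sniff(f.read(100))
```

Anything that comes back as "unknown" deserves a look by hand rather than a guess.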
> 1b: What do you get when you run this:
>
> python -c "print(repr(open('socalled.xls', 'rb').read(100)))"
>
> 1c: What do you see when you open socalled.xls with a text editor?
MIME-Version: 1.0
X-Document-Type: Workbook
Content-Type: multipart/related;
boundary="----=_NextPart_e4f52aae_7e2e_4240_b977_5481d71ccffa"
This document is a Single File Web Page, also known as a Web Archive file.
If you are seeing this message, your browser or editor doesn't support
Web Archive files. Please download a browser that supports Web Archive,
such as Microsoft Internet Explorer.
------=_NextPart_e4f52aae_7e2e_4240_b977_5481d71ccffa
Content-Location:
file:///C:/e4f52aae_7e2e_4240_b977_5481d71ccffa/Workbook.html
Content-Transfer-Encoding: quoted-printable
Content-Type: text/html; charset="us-ascii"
<html xmlns:v=3D"urn:schemas-microsoft-com:vml"
xmlns:o=3D"urn:schemas-microsoft-com:office:office"
xmlns:x=3D"urn:schemas-microsoft-com:office:excel"
xmlns=3D"http://www.w3.org/TR/REC-html40">
<head>
<meta name=3D"Excel Workbook Frameset">
<meta name=3DProgId content=3DExcel.Sheet>
<link rel=3DFile-List href=3D"Worksheets/filelist.xml">
<!--[if gte mso 9]><xml>
<x:ExcelWorkbook>
<x:ExcelWorksheets>
<x:ExcelWorksheet>
<x:Name>Document_and_Entity_Informatio</x:Name>
<x:WorksheetSource HRef=3D"Worksheets/Sheet01.html"/>
</x:ExcelWorksheet>
[snipped]
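Since the file is an ordinary MIME multipart message, Python's standard email package can take it apart without Excel. The sketch below is Python 3 and pulls the worksheet names out of the quoted-printable HTML parts. The function name list_sheet_names is invented for illustration, and the regular expression assumes the names always appear as <x:Name>...</x:Name>, as they do in the file above; I haven't checked that against the whole batch.

```python
import email
import re

UTF8_BOM = b"\xef\xbb\xbf"


def list_sheet_names(raw):
    """Return the worksheet names declared in the HTML parts of an MHTML workbook."""
    # The BOM would stop the first header line from being recognised.
    if raw.startswith(UTF8_BOM):
        raw = raw[len(UTF8_BOM):]
    msg = email.message_from_bytes(raw)
    names = []
    for part in msg.walk():
        if part.get_content_type() != "text/html":
            continue
        # decode=True undoes the quoted-printable encoding ("=3D" becomes "=").
        html = part.get_payload(decode=True).decode("ascii", "replace")
        names.extend(re.findall(r"<x:Name>(.*?)</x:Name>", html))
    return names
```

Getting at the cell data this way would mean parsing each Worksheets/SheetNN.html part as well, which is a good deal more work than letting Excel do the conversion.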
> 2: Who produces these files with what software? Are they under your
> control / amenable to suggestions / totally hostile? Why uuencode?
>
> 3: What version of Excel are you opening them with? What exactly is
> the "annoying message"? If you then try to "save as", what type of
> file does it default to?
Single File Web Page (*.mht, *.mhtml)
>
> 4: As in (3), but with OpenOffice.org's Calc and Gnumeric.
>
> 5: Which version of what OS are you running?
Windows
>
>> I have been looking around for a number of days trying to figure out
>> the best strategy to handle these. I have determined that if I open a
>> file, handle the message and then copy the sheets (by selecting them,
>> right clicking and choosing copy to a new book) and then saving the
>> new workbook with a different name the problem disappears.
>
> Instead of copying the sheets etc why don't you just save them as XLS
> files with the same name in a different folder?
>
>> I am a hacker more than a programmer. Thus I am trying to figure out
>> if I should use one of the libraries created by John Machin or use
>> the COM interface. One issue that concerns me is that I have over
>> 5,000 of these files to process right now and expect to handle over
>> 50K a year.
>>
>> My pseudo-code(?) would be
>
> [snip]
>
>> I would appreciate any observations anyone would have to share about
>> this strategy.
I'd suggest giving the file an .mhtml extension, then using Excel COM to
open the file and save it as an XLS file. You won't need to worry about
the initial "annoying" dialogue box but you'll get another when you "save
as xls"; this time muttering that some style info will be lost. I believe
there is an option on the "save as" method that allows suppression of such
a dialogue box.
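A rough, untested sketch of that in Python 3 with pywin32 follows. Note that it turns the prompts off through the Excel application's DisplayAlerts property rather than through an argument to "save as"; that is a swap on my part. The names convert_all and xls_target are made up, 56 is the xlExcel8 file-format constant of Excel 2007 and later (older versions would need xlWorkbookNormal, -4143), and the .mhtml copies go into a temporary folder so the originals are left alone.

```python
import os
import shutil
import tempfile

XL_EXCEL8 = 56  # xlExcel8 (Excel 2007+); assumption: older Excel needs -4143


def xls_target(src_path, out_dir):
    """Build the output path: same base name as the source, in out_dir, .xls."""
    stem = os.path.splitext(os.path.basename(src_path))[0]
    return os.path.join(out_dir, stem + ".xls")


def convert_all(src_paths, out_dir):
    """Open each so-called XLS file in Excel via COM and re-save it as real XLS."""
    import win32com.client  # pywin32; needs Windows with Excel installed

    out_dir = os.path.abspath(out_dir)
    work_dir = tempfile.mkdtemp()
    excel = win32com.client.Dispatch("Excel.Application")
    excel.DisplayAlerts = False  # suppress the format and "style lost" prompts
    try:
        for src in src_paths:
            # Copy under an .mhtml name so the extension matches the content.
            stem = os.path.splitext(os.path.basename(src))[0]
            mhtml = os.path.join(work_dir, stem + ".mhtml")
            shutil.copyfile(src, mhtml)
            wb = excel.Workbooks.Open(mhtml)
            try:
                wb.SaveAs(xls_target(src, out_dir), XL_EXCEL8)
            finally:
                wb.Close(False)  # False: don't save changes to the copy
    finally:
        excel.Quit()
        shutil.rmtree(work_dir, ignore_errors=True)
```

One Excel instance is reused for the whole batch, which should matter at 5,000 files; try it on a handful first and check the output with xlrd before letting it loose on the lot.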
HTH,
John