Strategy for Copying Worksheets into a new book

174 views
Skip to first unread message

PYNEWBIE

unread,
Mar 4, 2011, 10:30:28 PM3/4/11
to python-excel
I have a problem in that I am receiving some uuencoded files that are
supposed to be Excel (xls) files. I have a utility that decodes them
and saves them. However when I try to open them manually I get this
annoying message that the files appear to be a different format than
what is specified by the extension. If I click ok as to open them
anyway they open fine. Because these files are going to be used by a
number of people I want to clean them up so that they don't have to go
through that extra step to open the files.

I have been looking around for a number of days trying to figure out
the best strategy to handle these. I have determined that if I open a
file, handle the message and then copy the sheets (by selecting them,
right clicking and choosing copy to a new book) and then saving the
new workbook with a different name the problem disappears.

I am a hacker more than a programmer. Thus I am trying to figure out
if I should use the one of the libraries created by John Machin or use
the com interface. One issue that concerns me is that I have over
5,000 of these files to process right now and expect to handle over
50K a year.

My psuedo-code(?) would be

get list of workbooks (probably use glob)
for workbook in list
open the file
create a new_workbook
for worksheet in workbook
new_workbook.append(worksheet)
save new_workbook


I would appreciate any observations anyone would have to share about
this strategy.

Thanks

John Machin

unread,
Mar 5, 2011, 1:32:06 AM3/5/11
to python-excel


On Mar 5, 2:30 pm, PYNEWBIE <academiced...@gmail.com> wrote:
> I have a problem in that I am receiving some uuencoded files that are
> supposed to be Excel (xls) files.  I have a utility that decodes them
> and saves them.  However when I try to open them manually I get this
> annoying message that the files appear to be a different format than
> what is specified by the extension.  If I click ok as to open them
> anyway they open fine.  Because these files are going to be used by a
> number of people I want to clean them up so that they don't have to go
> through that extra step to open the files.

Evidently either the supposition that they are "Excel (xls)" files is
incorrect (totally different format) or they are XLS files but they
are somewhat mutated. Your strategy appears to be trying to solve the
problem without knowing what the problem really is.

Finding out:

If possible, send me some sample uuencoded files, and your uudecoding
tool.

Answer some questions:

1a: What happens when you try to read these so-called XLS files with
xlrd? E.g.

PYTHON_DIR/python PYTHON_DIR/scripts/runxlrd.py ov socalled.xls

1b: What do you get when you run this:

python -c "print repr(open('socalled.xls', 'rb').read(100))"

1c: What do you see when you open socalled.xls with a text editor?

2: Who produces these files with what software? Are they under your
control / amenable to suggestions / totally hostile? Why uuencode?

3: What version of Excel are you opening them with? What exactly is
the "annoying message"? If you then try to "save as", what type of
file does it default to?

4: As in (3), but with OpenOffice.org's Calc and Gnumeric.

5: Which version of what OS are you running?

> I have been looking around for a number of days trying to figure out
> the best strategy to handle these.  I have determined that if I open a
> file, handle the message and then copy the sheets (by selecting them,
> right clicking and choosing copy to a new book) and then saving the
> new workbook with a different name the problem disappears.

Instead of copying the sheets etc why don't you just save them as XLS
files with the same name in a different folder?

> I am a hacker more than a programmer.  Thus I am trying to figure out
> if I should use the one of the libraries created by John Machin or use
> the com interface.  One issue that concerns me is that I have over
> 5,000 of these files to process right now and expect to handle over
> 50K a year.
>
> My psuedo-code(?) would be

[snip]

> I would appreciate any observations anyone would have to share about
> this strategy.

It's possible that the solution is as simple as renaming the files to
have an extension compatible with their contents and/or repairing any
mild damage caused by uu(en|de)coding ... let's find out.

Cheers,
John

John Machin

unread,
Mar 5, 2011, 4:10:07 PM3/5/11
to python...@googlegroups.com
On Sat, March 5, 2011 5:32 pm, John Machin wrote:
>
>
> On Mar 5, 2:30�pm, PYNEWBIE <academiced...@gmail.com> wrote:
>> I have a problem in that I am receiving some uuencoded files that are
>> supposed to be Excel (xls) files. �I have a utility that decodes them
>> and saves them. �However when I try to open them manually I get this
>> annoying message that the files appear to be a different format than
>> what is specified by the extension. �If I click ok as to open them
>> anyway they open fine. �Because these files are going to be used by a
>> number of people I want to clean them up so that they don't have to go
>> through that extra step to open the files.
>
> Evidently either the supposition that they are "Excel (xls)" files is
> incorrect (totally different format) or they are XLS files but they
> are somewhat mutated. Your strategy appears to be trying to solve the
> problem without knowing what the problem really is.
>
> Finding out:
>
> If possible, send me some sample uuencoded files, and your uudecoding
> tool.

Thanks for sending the files off-list. I understand that you are busy so
I'll answer some of the questions for you ...

>
> Answer some questions:
>
> 1a: What happens when you try to read these so-called XLS files with
> xlrd? E.g.
>
> PYTHON_DIR/python PYTHON_DIR/scripts/runxlrd.py ov socalled.xls

*** Open failed: <class 'xlrd.biffh.XLRDError'>: Unsupported format, or
corrupt file: Expected BOF record; found '\xef\xbb\xbfMIME-'

So it's got a UTF-8 BOM followed by "MIME-" ... surely isn't an XLS file.

> 1b: What do you get when you run this:
>
> python -c "print repr(open('socalled.xls', 'rb').read(100))"
>
> 1c: What do you see when you open socalled.xls with a text editor?

MIME-Version: 1.0
X-Document-Type: Workbook
Content-Type: multipart/related;
boundary="----=_NextPart_e4f52aae_7e2e_4240_b977_5481d71ccffa"

This document is a Single File Web Page, also known as a Web Archive file.
If you are seeing this message, your browser or editor doesn't support
Web Archive files. Please download a browser that supports Web Archive,
such as Microsoft Internet Explorer.

------=_NextPart_e4f52aae_7e2e_4240_b977_5481d71ccffa
Content-Location:
file:///C:/e4f52aae_7e2e_4240_b977_5481d71ccffa/Workbook.html
Content-Transfer-Encoding: quoted-printable
Content-Type: text/html; charset="us-ascii"

<html xmlns:v=3D"urn:schemas-microsoft-com:vml"
xmlns:o=3D"urn:schemas-microsoft-com:office:office"
xmlns:x=3D"urn:schemas-microsoft-com:office:excel"
xmlns=3D"http://www.w3.org/TR/REC-html40">
<head>
<meta name=3D"Excel Workbook Frameset">

<meta name=3DProgId content=3DExcel.Sheet>
<link rel=3DFile-List href=3D"Worksheets/filelist.xml">

<!--[if gte mso 9]><xml>
<x:ExcelWorkbook>
<x:ExcelWorksheets>
<x:ExcelWorksheet>
<x:Name>Document_and_Entity_Informatio</x:Name>
<x:WorksheetSource HRef=3D"Worksheets/Sheet01.html"/>
</x:ExcelWorksheet>
[snipped]


> 2: Who produces these files with what software? Are they under your
> control / amenable to suggestions / totally hostile? Why uuencode?
>
> 3: What version of Excel are you opening them with? What exactly is
> the "annoying message"? If you then try to "save as", what type of
> file does it default to?

Single File Web Page (*.mht, *.mhtml)

>
> 4: As in (3), but with OpenOffice.org's Calc and Gnumeric.
>
> 5: Which version of what OS are you running?

Windows

>
>> I have been looking around for a number of days trying to figure out
>> the best strategy to handle these. �I have determined that if I open a
>> file, handle the message and then copy the sheets (by selecting them,
>> right clicking and choosing copy to a new book) and then saving the
>> new workbook with a different name the problem disappears.
>
> Instead of copying the sheets etc why don't you just save them as XLS
> files with the same name in a different folder?
>
>> I am a hacker more than a programmer. �Thus I am trying to figure out
>> if I should use the one of the libraries created by John Machin or use
>> the com interface. �One issue that concerns me is that I have over
>> 5,000 of these files to process right now and expect to handle over
>> 50K a year.
>>
>> My psuedo-code(?) would be
>
> [snip]
>
>> I would appreciate any observations anyone would have to share about
>> this strategy.

I'd suggest naming the file with .mhtml extension then using Excel COM to
open the file and save it as an XLS file. You won't need to worry about
the initial "annoying" dialogue box but you'll get another when you "save
as xls"; this time muttering that some style info will be lost. I believe
there is a option on the "save as" method that allows suppression of such
a dialogue box.

HTH,
John

PYNEWBIE

unread,
Mar 5, 2011, 8:19:38 PM3/5/11
to python-excel
In the interest of giving back I am going to post my solution. John
took the time to research my issue and suggested that it was not
directly addressable with one of his libraries. That saved me
significant time. His suggestion is documented above. I did not want
to go directly down the path he suggested because basically it
intimidated me - I am not well trained. Thus I mucked around for a
couple of hours and figured out that the following would almost work.
Almost because I am getting a message box as I am saving the file and
while I am sure there is a way to handle the message box from within
Python I have not figured it out.

import win32com.client
xl=win32com.client.Dispatch('Excel.Application')
xl.visible=0
# I have noticed that if I don't set visible to 0 I can't SaveAs,
though I might be doing something else wrong
srce=xl.Workbooks.Open(r'c:
\testdd.xls') #
testdd is my file that is evidently really not an excel file
srce.SaveAs(r'c:
\newtttxt2.xls',FileFormat=1)
# this is when the message box pops up

as soon as I figure out how to handle the messsage box I will repost
this.

Thanks again for taking the time John, it certainly saved me from a
couple (or more) hours of beating my head against my monitor.

Cheers

Burch

PYNEWBIE

unread,
Mar 6, 2011, 9:54:31 AM3/6/11
to python-excel

Okay I got some help on SO and now it works.

So to tie it all together. I have some files that look and open like
Excel files but they are not. I need to save them as Excel files to
make further processing easier and to distribute them w/o the user
having to deal with the message boxes that occur because of their
structure. I can use win32com to do this

>>> import win32com.client
>>> xl=win32com.client.Dispatch('Excel.Application')
>>> xl.Visible=0
>>> import glob
>>> excelf=glob.glob(r'c:\mydir\*.xls')
>>> xl.DisplayAlerts=False #this is the piece that was missing, I can't seem to handle this using CheckCompatibility= False
>>> for wkbk in excelf[0:10]:
srce=xl.Workbooks.Open(wkbk)
srce.SaveAs('c:\\new_dir\\'+wkbk.split('\\')[-1],FileFormat=1)


Reply all
Reply to author
Forward
0 new messages