Script to convert from XLSX to XLS

Michael

Jun 15, 2009, 12:29:41 PM
to python-excel
Recently I needed to quickly convert XLSX workbooks to XLS workbooks
so I could then interact with them via xlrd. Here it is; hopefully it
will be useful to someone. :) Note that pywin32 is required to
interact with Excel 2007, so unfortunately this script will work only
on Windows with Excel 2007 installed.

This script is to be executed in the directory of XLSX workbooks
pending conversion.


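import glob
import os
import time
import win32com.client

XL_EXCEL8 = 56  # xlExcel8 (XlFileFormat): the Excel 97-2003 .xls format

xlsx_files = glob.glob('*.xlsx')

if not xlsx_files:
    raise RuntimeError('No XLSX files to convert.')

xlApp = win32com.client.Dispatch('Excel.Application')
# Suppress prompts (e.g. the compatibility warning Excel may show when
# saving to the older format); existing .xls files are overwritten silently.
xlApp.DisplayAlerts = False

try:
    for xlsx_name in xlsx_files:
        # Excel does not use Python's current directory, so pass absolute paths.
        src = os.path.abspath(xlsx_name)
        dst = os.path.splitext(src)[0] + '.xls'
        xlWb = xlApp.Workbooks.Open(src)
        xlWb.SaveAs(dst, FileFormat=XL_EXCEL8)
        xlWb.Close(False)  # close without saving again
finally:
    xlApp.Quit()  # always shut Excel down, even if a conversion fails

# Delete or comment out the following lines if you want to preserve the
# original XLSX files.

time.sleep(2)  # give Excel time to quit, otherwise files may be locked
for xlsx_name in xlsx_files:
    os.unlink(xlsx_name)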
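If you want to sanity-check the output before deleting anything, the file signatures are enough to tell the two containers apart: Excel 97-2003 (BIFF8) .xls files are OLE2 compound documents whose first eight bytes are D0 CF 11 E0 A1 B1 1A E1, while an .xlsx file is a ZIP archive starting with "PK\x03\x04". The helper below is a sketch, not part of the original script; sniff_workbook_format is a hypothetical name, and a ZIP signature alone does not prove the file is a workbook.

```python
# Hypothetical helper (not part of the original script): identify a
# workbook's container format from its leading bytes.
OLE2_MAGIC = b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1'  # OLE2 compound document (.xls, BIFF8)
ZIP_MAGIC = b'PK\x03\x04'                          # ZIP local file header (.xlsx)


def sniff_workbook_format(path):
    """Return 'xls', 'xlsx' or 'unknown' based on the file signature."""
    with open(path, 'rb') as f:
        head = f.read(8)
    if head.startswith(OLE2_MAGIC):
        return 'xls'
    if head.startswith(ZIP_MAGIC):
        return 'xlsx'
    return 'unknown'
```

For example, calling sniff_workbook_format on each converted file should return 'xls' before you let the script unlink the originals.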

vasudevram

Jun 18, 2009, 8:45:48 PM
to python-excel
Interesting approach.

For a possibly limited but cross-platform way (i.e. one that doesn't
require Windows or pywin32) to do the same conversion from .XLSX
to .XLS files, it is also possible to use an XML parser, such as a
SAX-capable parser, to read the content (*) of the .XLSX files, and
then write the same content to .XLS files using, I guess, the Python
xlwt library, which is mentioned in other messages in this group. (I
have not used xlwt yet, which is why I said "I guess", though I have
used its counterpart for reading, xlrd, in my xtopdf toolkit.)

(*) Conditions apply - see below.

This alternative method is possible because .XLSX format files are a
kind of XML. There is a recipe for how to extract the text-only
content (i.e. numbers and strings, no formatting or images or charts -
this is the condition mentioned above) of .XLSX files, using SAX, in
the Python Cookbook 2nd Edition. I had tried out that recipe some time
ago (it worked fine, though I had to tweak it a bit), and used it to
convert the (text-only) content of .XLSX files to PDF, as part of my
xtopdf toolkit. That code is not in the xtopdf release yet, but will
be after some time. If I can dig up the (standalone) code I wrote for
that conversion, I'll post a link to it here in a few days. But
basically, it's really easy to read .XLSX content with Python using
SAX, since there are clearly defined XML elements for tables, rows and
cells. In fact, that means you can also read the .XLSX content using
any language that has a SAX XML parser, not just Python.

- Vasudev Ram
Biz site: www.dancingbison.com
xtopdf: fast and easy PDF creation from other file formats:
www.dancingbison.com/products.html
Blog (on software innovation): jugad2.blogspot.com



John Machin

Jun 19, 2009, 4:25:00 AM
to python...@googlegroups.com
On 19/06/2009 10:45 AM, vasudevram wrote:
>
>
> On Jun 15, 9:29 pm, Michael <selmo2...@gmail.com> wrote:
>> Recently I needed to quickly convert XLSX workbooks to XLS workbooks
>> so I could then interact with them via xlrd. Here it is, hopefully it
>> will be useful to someone. :) Note that pywin32 is required to
>> interact with Excel 2007, so unfortunately this script will work only
>> on Windows with Excel 2007 installed.

[snip]

>
> Interesting approach.
>
> For a possibly limited but cross-platform way (i.e. don't need to be
> on Windows or use pywin32) to do the same conversion from .XLSX
> to .XLS files, it is also possible to use an XML parser, such as a SAX-
> capable parser (to read the content (*) of the .XLSX files, and then
> write the same content to .XLS files using, I guess, the Python xlwt
> library, which is mentioned in other messages in this group.). (I have
> not used xlwt (yet), which is why I said "I guess", though I have used
> its counterpart for reading, xlrd, in my xtopdf toolkit.)
>
> (*) Conditions apply - see below.
>
> This alternative method is possible because .XLSX format files are a
> kind of XML. There is a recipe for how to extract the text-only
> content (i.e. numbers and strings, no formatting or images or charts -
> this is the condition mentioned above) of .XLSX files, using SAX, in
> the Python Cookbook 2nd Edition.

XLSX files were introduced by Excel 2007 (i.e. v12). An XLSX file is a
ZIP file containing a bundle of XML documents. The Python Cookbook 2nd
Edition was published in 2005. The recipe (12.7) to which you refer
relates to the XML files that can be produced by Excel 2003 (v11) and
Excel XP (v10), using the "XML Spreadsheet" option of "Save as". The two
formats are XMLly and Microsofty but otherwise rather dissimilar.
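The "ZIP file containing a bundle of XML documents" is easy to see with the standard zipfile module. This is a sketch: describe_xlsx is a hypothetical helper name, and the xl/worksheets/ location is the usual convention rather than a guarantee (the authoritative sheet-to-part mapping is in xl/workbook.xml and its relationships file).

```python
import zipfile


def describe_xlsx(path):
    """Print the parts of an XLSX package; return the worksheet part names."""
    if not zipfile.is_zipfile(path):
        raise ValueError('%s is not a ZIP archive, so it is not XLSX' % path)
    with zipfile.ZipFile(path) as zf:
        names = zf.namelist()
    for name in names:
        print(name)  # typically [Content_Types].xml, _rels/.rels, xl/workbook.xml, ...
    # Worksheet parts conventionally live under xl/worksheets/ (sheet1.xml, ...);
    # .rels files in that folder end in '.rels' and are therefore excluded.
    return sorted(n for n in names
                  if n.startswith('xl/worksheets/') and n.endswith('.xml'))
```

An Excel XP/2003 "XML Spreadsheet" file, by contrast, fails the is_zipfile check because it is a single plain XML document.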

> I had tried out that recipe some time
> ago (it worked fine, though I had to tweak it a bit), and used it to
> convert the (text-only) content of .XLSX files to PDF, as part of my
> xtopdf toolkit. That code is not in the xtopdf release yet, but will
> be after some time. If I can dig up the (standalone) code I wrote for
> that conversion, I'll post a link to it here in a few days. But
> basically, it's really easy to read .XLSX content with Python using
> SAX, since there are clearly defined XML elements for tables, rows and
> cells. In fact, that means you can also read the .XLSX content using
> any language that has a SAX XML parser, not just Python.

Any parser within reason can be used, not just SAX. We have an XLSX
parser (using ElementTree) in the queue to be plugged into xlrd. It
handles the basics i.e. open_workbook(..., formatting_info=0).
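For illustration only, reading cell values out of an XLSX package with ElementTree can look roughly like the sketch below. It is not the xlrd parser mentioned above; read_sheet_values is a hypothetical helper, it assumes the sheet lives at xl/worksheets/sheet1.xml, and, like formatting_info=0, it ignores formatting entirely, so dates come back as plain serial numbers.

```python
import zipfile
import xml.etree.ElementTree as ET

# Main SpreadsheetML namespace used by XLSX worksheet and shared-string parts.
NS = {'m': 'http://schemas.openxmlformats.org/spreadsheetml/2006/main'}


def _joined_text(elem, paths):
    """Concatenate the text of every element matched by the given paths."""
    parts = []
    for path in paths:
        parts.extend(t.text or '' for t in elem.findall(path, NS))
    return ''.join(parts)


def read_sheet_values(path, sheet_part='xl/worksheets/sheet1.xml'):
    """Return {cell_ref: value} for one worksheet; formatting is ignored."""
    with zipfile.ZipFile(path) as zf:
        shared = []
        if 'xl/sharedStrings.xml' in zf.namelist():
            sst = ET.fromstring(zf.read('xl/sharedStrings.xml'))
            # A shared string is a plain <t> or a list of rich-text runs <r><t>.
            shared = [_joined_text(si, ['m:t', 'm:r/m:t'])
                      for si in sst.findall('m:si', NS)]
        sheet = ET.fromstring(zf.read(sheet_part))

    values = {}
    for c in sheet.iterfind('m:sheetData/m:row/m:c', NS):
        ref = c.get('r')
        kind = c.get('t', 'n')  # a missing t attribute means "number"
        v = c.find('m:v', NS)
        if kind == 'inlineStr':
            values[ref] = _joined_text(c, ['m:is/m:t', 'm:is/m:r/m:t'])
        elif v is None:
            continue  # e.g. a formatted but empty cell
        elif kind == 's':
            values[ref] = shared[int(v.text)]  # index into the shared-string table
        elif kind == 'b':
            values[ref] = v.text == '1'
        elif kind == 'n':
            values[ref] = float(v.text)  # dates are stored as serial numbers too
        else:
            values[ref] = v.text  # 'str' (formula result), 'e' (error), ...
    return values
```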

Eduardo Silva

Jun 19, 2009, 11:53:55 AM
to python...@googlegroups.com
[My comments] I recall once doing a batch conversion of xls to xlsx with a tool provided as a free download on Microsoft's site. I can't recall whether they had one for the reverse direction, xlsx to xls.

vasudevram

Jun 20, 2009, 3:19:17 PM
to python-excel


On Jun 19, 1:25 pm, John Machin <sjmac...@lexicon.net> wrote:
> [snip]
>
> XLSX files were introduced by Excel 2007 (i.e. v12). An XLSX file is a
> ZIP file containing a bundle of XML documents. The Python Cookbook 2nd
> Edition was published in 2005. The recipe (12.7) to which you refer
> relates to the XML files that can be produced by Excel 2003 (v11) and
> Excel XP (v10), using the "XML Spreadsheet" option of "Save as". The two
> formats are XMLly and Microsofty but otherwise rather dissimilar.

OK, you must be right. Thanks for the info. And sorry,
comp.lang.python group, for unintentionally writing something
incorrect. It's a while now since I used that recipe, so I don't
remember the full details of what I did, and don't have my code handy
right now. But I do have the Python Cookbook 2nd edition with me now,
and just looked it up - you're right about the recipe - it's 12.7.

Also, there is no reference to the .XLSX extension in the recipe, and
it does say that the Excel XML format the recipe can read is called
XMLSS by Microsoft (for XML Spread Sheet, I guess, as you say above;
not XLSX).
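To show the SAX approach against the format the recipe actually targets, here is a minimal handler for XMLSS (Excel XP/2003 "XML Spreadsheet") documents. It is a sketch, not the Cookbook recipe: XmlssHandler and read_xmlss are hypothetical names, it assumes the usual urn:schemas-microsoft-com:office:spreadsheet namespace with Workbook/Worksheet/Table/Row/Cell/Data elements, it returns every value as text, and it ignores ss:Index, so sparse rows are not padded.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

# Namespace used by Excel XP/2003 "XML Spreadsheet" (XMLSS) documents.
SS = 'urn:schemas-microsoft-com:office:spreadsheet'


class XmlssHandler(ContentHandler):
    """Collect {sheet name: [[cell text, ...], ...]} from an XMLSS document."""

    def __init__(self):
        super().__init__()
        self.sheets = {}
        self._rows = None   # row list of the current <Worksheet>
        self._row = None    # cell list of the current <Row>
        self._text = None   # text chunks while inside <Data>

    def startElementNS(self, name, qname, attrs):
        uri, local = name
        if uri != SS:
            return  # e.g. html:Font runs inside formatted <Data>; text still collected
        if local == 'Worksheet':
            self._rows = self.sheets.setdefault(attrs.get((SS, 'Name'), ''), [])
        elif local == 'Row':
            self._row = []
        elif local == 'Data':
            self._text = []

    def endElementNS(self, name, qname):
        uri, local = name
        if uri != SS:
            return
        if local == 'Data':
            self._row.append(''.join(self._text))
            self._text = None
        elif local == 'Row':
            self._rows.append(self._row)
            self._row = None

    def characters(self, content):
        # SAX may split one text node into several calls, so accumulate.
        if self._text is not None:
            self._text.append(content)


def read_xmlss(source):
    """source: a filename, a binary file object, or the document as bytes."""
    if isinstance(source, bytes):
        source = io.BytesIO(source)
    handler = XmlssHandler()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)  # report (uri, localname) pairs
    parser.setContentHandler(handler)
    parser.parse(source)
    return handler.sheets
```

Because the handler only ever holds the current row, memory use stays small even for large sheets, which is the usual reason to pick SAX.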

> [snip]
>
> Any parser within reason can be used, not just SAX.

You're right again, of course. Since the recipe mentions and uses SAX,
I also said SAX, but there's no reason another kind of XML parser,
such as a DOM parser, can't be used, as long as it meets the needs.
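As an example of a non-SAX parser, XMLSS data can also be read with the standard library's xml.dom.minidom. This is a sketch: read_xmlss_dom is a hypothetical name, it assumes the standard XMLSS namespace, keeps values as text and ignores ss:Index. The trade-off against SAX is that the whole document is held in memory as a tree, which is simpler to navigate but heavier for big workbooks.

```python
from xml.dom import Node, minidom

# Namespace used by Excel XP/2003 "XML Spreadsheet" (XMLSS) documents.
SS = 'urn:schemas-microsoft-com:office:spreadsheet'


def _text_of(node):
    """All text beneath node, including text inside formatted child runs."""
    parts = []
    for child in node.childNodes:
        if child.nodeType == Node.TEXT_NODE:
            parts.append(child.data)
        elif child.nodeType == Node.ELEMENT_NODE:
            parts.append(_text_of(child))
    return ''.join(parts)


def read_xmlss_dom(source):
    """source: a filename, a file object, or the document as bytes."""
    if isinstance(source, bytes):
        doc = minidom.parseString(source)
    else:
        doc = minidom.parse(source)
    sheets = {}
    for ws in doc.getElementsByTagNameNS(SS, 'Worksheet'):
        rows = []
        for row in ws.getElementsByTagNameNS(SS, 'Row'):
            rows.append([_text_of(d) for d in row.getElementsByTagNameNS(SS, 'Data')])
        sheets[ws.getAttributeNS(SS, 'Name')] = rows
    doc.unlink()  # break the DOM's internal reference cycles
    return sheets
```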

> We have an XLSX parser (using ElementTree) in the queue to be plugged into xlrd. It
> handles the basics i.e. open_workbook(..., formatting_info=0).

Great news. Maybe I'll use that then, for converting XLSX to PDF, when
I release the next version of xtopdf, which will be sometime in the
near future - most of the code for it is written, what's left is
mainly more testing, refactoring and documentation.

- Vasudev

vasudevram

Jun 20, 2009, 3:36:40 PM
to python-excel


On Jun 21, 12:19 am, vasudevram <vasudev...@gmail.com> wrote:
>
> And sorry, comp.lang.python group,

Oops, I meant python-excel group :(