Status of XLRD reading .xlsx (Excel 2007)

Darryl Wallace

Jun 24, 2009, 4:51:51 PM
to python-excel
I know this has been asked in the past, but is support for
reading .xlsx (Excel 2007) format closer to being complete?

The reason I ask is that the included README.html mentions that
support is scheduled for v0.7.1, which is the current version. I tried
to read a simple Excel 2007 file (under Ubuntu Linux, Python 2.5.4)
and was greeted with the following error:
---
>>> book = xlrd.open_workbook("myexcel2007book.xlsx")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "xlrd/__init__.py", line 429, in open_workbook
    biff_version = bk.getbof(XL_WORKBOOK_GLOBALS)
  File "xlrd/__init__.py", line 1545, in getbof
    bof_error('Expected BOF record; found %r' % self.mem[savpos:savpos+8])
  File "xlrd/__init__.py", line 1539, in bof_error
    raise XLRDError('Unsupported format, or corrupt file: ' + msg)
xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; found 'PK\x03\x04\x14\x00\x06\x00'
---
So my guess is that it's not ready and that's fine. I was just
interested in the status.
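
For reference, the 'PK\x03\x04' in the error message is the zip local-file-header signature, which is why a BIFF reader cannot find a BOF record there. A minimal sketch of checking the container type before handing a file to a reader follows. The helper names are hypothetical, the code is written for a current Python 3 rather than the 2.5.4 used above, and it assumes the usual case where an .xls file is an OLE2 compound document:

```python
# Leading bytes that identify the two container formats.
ZIP_MAGIC = b"PK\x03\x04"                         # zip local file header (.xlsx)
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # OLE2 compound document (most .xls)


def sniff_container(head):
    """Classify a file from its first bytes (hypothetical helper)."""
    if head.startswith(ZIP_MAGIC):
        return "zip"
    if head.startswith(OLE2_MAGIC):
        return "ole2"
    return "unknown"


def sniff_file(path):
    """Read the first 8 bytes of a file and classify them."""
    with open(path, "rb") as f:
        return sniff_container(f.read(8))


# The 8 bytes reported in the traceback above.
print(sniff_container(b"PK\x03\x04\x14\x00\x06\x00"))  # zip
```

A "zip" result only says the file is a zip archive; it does not prove the archive is a valid workbook.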

Regards,
Darryl

Daniel Burke

Jun 24, 2009, 9:16:12 PM
to python...@googlegroups.com
xlsx files are zip archives with xml files in them, you can read them
with your favorite DOM parser if you're impatient.
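
A rough sketch of that approach, using only the standard library (`zipfile` plus `xml.dom.minidom`), might look like the following. It is written for a current Python 3. The function names and the default `sheet_path` are assumptions for illustration: the first sheet is usually stored at `xl/worksheets/sheet1.xml`, but a robust reader would look the path up through `xl/workbook.xml`. The demo archive is a stand-in built in memory, not a complete workbook.

```python
import io
import zipfile
from xml.dom import minidom

# SpreadsheetML main namespace, as defined by ECMA-376.
NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def raw_cell_values(xlsx, sheet_path="xl/worksheets/sheet1.xml"):
    """Return {cell reference: raw <v> text} for one worksheet part.

    sheet_path is an assumed default, not something guaranteed by the format.
    """
    with zipfile.ZipFile(xlsx) as archive:
        dom = minidom.parseString(archive.read(sheet_path))
    values = {}
    for cell in dom.getElementsByTagNameNS(NS, "c"):
        v_nodes = cell.getElementsByTagNameNS(NS, "v")
        if v_nodes and v_nodes[0].firstChild is not None:
            values[cell.getAttribute("r")] = v_nodes[0].firstChild.data
    return values


def make_demo_xlsx():
    """Build a tiny stand-in archive in memory (not a complete workbook)."""
    sheet = (
        '<worksheet xmlns="%s"><sheetData><row r="1">'
        '<c r="A1"><v>1.5</v></c><c r="B1"><v>42</v></c>'
        '</row></sheetData></worksheet>' % NS
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("xl/worksheets/sheet1.xml", sheet)
    buf.seek(0)
    return buf


print(raw_cell_values(make_demo_xlsx()))  # {'A1': '1.5', 'B1': '42'}
```

The values come back as raw strings. For string cells the raw value is typically an index into `xl/sharedStrings.xml` rather than the text itself, and dates, styles and formulas need further handling.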

regards,

dan
--
"It's your privilege as an artist to inflict the pain of creativity on
yourself." --Programming Perl 3rd Edition, end of first chapter.

John Machin

Jun 24, 2009, 10:29:39 PM
to python...@googlegroups.com
On 25/06/2009 6:51 AM, Darryl Wallace wrote:

Hi Darryl,

> I know this has been asked in the past, but is support for
> reading .xlsx (Excel 2007) format closer to being complete?

The current intention is this:
Basic support will be in the next release, whenever that is, unless
something happens that causes it not to be. It is intended to support
on_demand=True but not formatting_info=True. Support for *any* version
of Excel is unlikely ever to be "complete".

> The reason I ask is because the included README.html mentions that
> support is scheduled for v0.7.1 which is the current version.

s/is/was/

I apologise for the slackness of the documentation team :-)

> I tried
> to read a simple excel 2007 (under ubuntu linux, python 2.5.4) file
> and was greeted with the following error:
> ---
> >>> book = xlrd.open_workbook("myexcel2007book.xlsx")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "xlrd/__init__.py", line 429, in open_workbook
>     biff_version = bk.getbof(XL_WORKBOOK_GLOBALS)
>   File "xlrd/__init__.py", line 1545, in getbof
>     bof_error('Expected BOF record; found %r' % self.mem[savpos:savpos+8])
>   File "xlrd/__init__.py", line 1539, in bof_error
>     raise XLRDError('Unsupported format, or corrupt file: ' + msg)
> xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; found 'PK\x03\x04\x14\x00\x06\x00'
> ---
> So my guess is that it's not ready and that's fine. I was just
> interested in the status.

If you have some non-simple XLSX files that you think may test the
capabilities of the development team, please send them. Of particular
interest would be files created by software other than Excel itself. As
with previous Excel versions, Microsoft documentation will say "you must
do X" but Excel will support reading non-X. This has already occurred
with the docs saying you must use the shared string table; C# code
supplied by an MS write-your-own-XLSX workshop doesn't comply but Excel
accepts the resultant file silently.
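
To illustrate what tolerating such files involves, here is a minimal sketch of a cell reader that accepts both string styles. It assumes the non-compliant files store text as inline strings (`t="inlineStr"`), which is a guess about that C# code and not something verified here. The helper names are hypothetical, the code is for a current Python 3, and this is not how xlrd itself does it:

```python
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace, as defined by ECMA-376.
MAIN = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def q(tag):
    """Qualify a tag name with the SpreadsheetML namespace."""
    return "{%s}%s" % (MAIN, tag)


def cell_text(cell, shared_strings):
    """Return the value of a <c> element, whichever string style it uses."""
    kind = cell.get("t")
    if kind == "s":
        # <v> holds an index into the shared string table.
        return shared_strings[int(cell.find(q("v")).text)]
    if kind == "inlineStr":
        # The text lives inside the cell itself, under <is>.
        inline = cell.find(q("is"))
        if inline is None:
            return ""
        return "".join(t.text or "" for t in inline.iter(q("t")))
    # Anything else: hand back the raw <v> text, if there is any.
    value = cell.find(q("v"))
    return value.text if value is not None else None


SHEET = (
    '<sheetData xmlns="%s">'
    '<c r="A1" t="s"><v>1</v></c>'
    '<c r="B1" t="inlineStr"><is><t>inline text</t></is></c>'
    '<c r="C1"><v>3.14</v></c>'
    '</sheetData>' % MAIN
)
shared = ["first", "second"]  # stand-in for the parsed shared string table

for cell in ET.fromstring(SHEET):
    print(cell.get("r"), cell_text(cell, shared))
```

The sketch simply concatenates every `<t>` run under `<is>`, which is a simplification for rich text.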

Cheers,
John

John Machin

Jun 24, 2009, 10:32:52 PM
to python...@googlegroups.com
On 25/06/2009 11:16 AM, Daniel Burke wrote:
> xlsx files are zip archives with xml files in them, you can read them
> with your favorite DOM parser if you're impatient.

Yes, dead easy, might want to have a quick flick through the docs though:

http://www.ecma-international.org/publications/standards/Ecma-376.htm

BTW, you need the 1st edition (2006) for Excel 2007, not the 2nd edition (2008).

Darryl Wallace

Jun 25, 2009, 12:54:35 PM
to python-excel
On Jun 24, 9:16 pm, Daniel Burke <dan.p.bu...@gmail.com> wrote:
> xlsx files are zip archives with xml files in them, you can read them
> with your favorite DOM parser if you're impatient.

Well.. thanks for the tip and your interesting contribution to this
thread? I understand they are zip archives with xml files in them. I
am, however, not impatient. I was simply asking for clarification.

Darryl Wallace

Jun 25, 2009, 12:58:54 PM
to python-excel
> The current intention is this:
> Basic support will be in the next release, whenever that is, unless
> something happens that causes it not to be. It is intended to support
> on_demand=True but not formatting_info=True. Support for *any* version
> of Excel is unlikely ever to be "complete".

Thanks for the update!

> > The reason I ask is because the included README.html mentions that
> > support is scheduled for v0.7.1 which is the current version.
>
> s/is/was/
>
> I apologise for the slackness of the documentation team :-)

No problem.

> If you have some non-simple XLSX files that you think may test the
> capabilities of the development team, please send them. Of particular
> interest would be files created by software other than Excel itself. As
> with previous Excel versions, Microsoft documentation will say "you must
> do X" but Excel will support reading non-X. This has already occurred
> with the docs saying you must use the shared string table; C# code
> supplied by an MS write-your-own-XLSX workshop doesn't comply but Excel
> accepts the resultant file silently.

Most Excel files I need to read in are simple, basic data tables.
The main thing I'm interested in is simply the ability to have larger
data tables (more than 256 columns, etc.).
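
To put numbers on that: an .xls (BIFF8) sheet is limited to 256 columns (A to IV) and 65,536 rows, while an Excel 2007 .xlsx sheet allows 16,384 columns (A to XFD) and 1,048,576 rows. A small sketch for checking which format a table fits in, with hypothetical helper names and written for a current Python 3:

```python
XLS_MAX_COLS = 256      # columns A..IV
XLSX_MAX_COLS = 16384   # columns A..XFD


def column_letters(index):
    """Convert a 0-based column index to its A1-style letters."""
    if index < 0:
        raise ValueError("column index must be non-negative")
    letters = ""
    index += 1  # switch to 1-based for bijective base-26 arithmetic
    while index:
        index, rem = divmod(index - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def fits_in_xls(n_cols):
    """True if a table with n_cols columns fits in a single .xls sheet."""
    return n_cols <= XLS_MAX_COLS


print(column_letters(XLS_MAX_COLS - 1))   # IV
print(column_letters(XLSX_MAX_COLS - 1))  # XFD
print(fits_in_xls(300))                   # False
```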

Thanks for your work on this library. It's proven to be extremely
useful.

Darryl