xlrd 2 released

Chris Withers

Dec 11, 2020, 5:21:56 AM
to python...@googlegroups.com

Hi All,

It's with some trepidation that I write that a new major release of xlrd is out.

The 2.x series has explicitly removed support for anything other than xls files, since reading xls files is the one thing left that xlrd does that no other Python library I'm aware of has tackled.

I've done this for two reasons:

- pandas still has xlrd as its default engine for Excel files, but the xlsx reading in xlrd has become unreliable in Python 3.9.

- There are still xls files knocking around out there, and reading them with Python is still a thing people may need to do. I'm aware that one of the sources of these is lab equipment, and in the middle of a pandemic, I want to feel I've done what I can to help people do what they need to do.
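Since xlrd 2.x opens only .xls, callers going through pandas may want to choose the `read_excel` engine explicitly rather than rely on defaults. A minimal sketch, assuming you want to decide by file content rather than extension: legacy .xls workbooks live inside an OLE2 compound document, while .xlsx is a zip archive, and the two have distinct leading bytes. The helper names `sniff_excel_format` and `pick_engine` are hypothetical, not part of xlrd or pandas, and the check is only a heuristic (other formats such as .doc or .xlsb share these signatures).

```python
# Signatures of the two container formats an Excel reader may be handed.
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # .xls (BIFF records inside an OLE2 compound document)
ZIP_MAGIC = b"PK\x03\x04"                         # .xlsx/.xlsm (zip archive of XML parts)


def sniff_excel_format(header: bytes) -> str:
    """Classify leading bytes as 'xls', 'xlsx' or 'unknown' (heuristic only)."""
    if header.startswith(OLE2_MAGIC):
        return "xls"
    if header.startswith(ZIP_MAGIC):
        return "xlsx"
    return "unknown"


def pick_engine(path) -> str:
    """Hypothetical helper: choose a pandas read_excel engine from file content."""
    with open(path, "rb") as f:
        kind = sniff_excel_format(f.read(len(OLE2_MAGIC)))
    if kind == "xls":
        return "xlrd"      # xlrd 2.x reads only .xls
    if kind == "xlsx":
        return "openpyxl"  # openpyxl reads .xlsx/.xlsm
    raise ValueError(f"{path!r} does not start with a known Excel signature")
```

Usage would look like `pandas.read_excel(path, engine=pick_engine(path))`, so a mislabelled file fails loudly instead of being handed to the wrong reader.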

If you have a problem opening a .xls file with xlrd 2.0.0 or above, you will need to make a pull request containing a sample file that reproduces the problem, and state that you have the authority to share it and are happy for the sample file to become part of the open source, public xlrd repository.

cheers,

Chris

Christian Fobel

Jan 16, 2021, 8:12:02 AM
to python-excel
Hi Chris,

Thank you for your hard work on xlrd! I fully respect your decision to disable parts of the library that you have deemed unsafe or unreliable, and am sad to see that you have received some rather accusatory responses. I will happily update my code to use openpyxl.

Out of professional interest, what security issue(s) or unreliability led to disabling xlsx support? Based on your comment that "xlsx reading in xlrd has become unreliable in Python 3.9", was an issue introduced specifically in Python 3.9? Perhaps related to XML parsing? I would really appreciate any insight you could provide, since it may uncover vulnerabilities affecting code in some of my other projects.

Thanks again for your work to support the community. I can appreciate that work on open source projects can be a relatively thankless undertaking.

Cheers,
Christian

Chris Withers

Jan 16, 2021, 10:05:45 AM
to python...@googlegroups.com

xlsx files are made up of a zip file wrapping an xml file.

Both XML and zip have well-documented security issues, which xlrd was not doing a good job of handling. In particular, it appeared that defusedxml and xlrd did not work together on Python 3.9, which led people to uninstall defusedxml as a solution. That is absolutely insane, but then so is sticking with xlrd 1.2 when you could move to openpyxl, and yet here we are:

https://stackoverflow.com/a/65255334/216229

https://stackoverflow.com/a/65251042/216229

That, and the spam emailed directly to me since the 2.0 release, do not encourage me to invest time in people who want to use Excel files from Python.
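To make the zip half of that concrete: an .xlsx file is a zip archive holding several XML parts (for example `xl/workbook.xml` plus one part per worksheet), and a classic zip risk is a decompression bomb whose members inflate to far more than their compressed size. Below is a minimal standard-library sketch that inspects the archive's central directory before anything is decompressed. The limits and the function name `zip_problems` are hypothetical choices for illustration; this is not how xlrd or openpyxl handle the problem.

```python
import io
import zipfile

# Illustrative limits only (assumptions, not values used by xlrd or openpyxl).
MAX_TOTAL_UNCOMPRESSED = 200 * 1024 * 1024  # 200 MiB summed over all members
MAX_RATIO = 100                             # uncompressed/compressed, per member


def zip_problems(source, max_total=MAX_TOTAL_UNCOMPRESSED, max_ratio=MAX_RATIO):
    """Return reasons to distrust a zip (such as an .xlsx) before inflating it.

    `source` is a path or a binary file-like object. An empty list means no
    declared size looked suspicious.
    """
    problems = []
    total = 0
    with zipfile.ZipFile(source) as zf:
        # infolist() only reads the central directory; nothing is decompressed.
        for info in zf.infolist():
            total += info.file_size
            ratio = info.file_size / max(info.compress_size, 1)
            if ratio > max_ratio:
                problems.append(f"{info.filename}: compression ratio {ratio:.0f}:1")
    if total > max_total:
        problems.append(f"declared uncompressed size {total} exceeds {max_total}")
    return problems


if __name__ == "__main__":
    # Build a tiny in-memory archive and check it.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("xl/workbook.xml", "<workbook/>")
    print(zip_problems(buf))
```

The declared sizes come from the archive itself and can be forged, so a stricter reader would also count bytes while inflating each member and stop once a limit is reached.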

Charlie Clark

Jan 16, 2021, 12:02:13 PM
to python...@googlegroups.com
On 16 Jan 2021, at 16:05, Chris Withers wrote:

> xlsx files are made up of a zip file wrapping an xml file.
>
> Both xml and zip have well documented security issues, which xlrd was
> not doing a good job of handling. In particular, it appeared that
> defusedxml and xlrd did not work on Python 3.9, which lead people to
> uninstall defusedxml as a solution, which is absolutely insane, but
> then so is sticking with xlrd 1.2 when you could move to openpyxl, and
> yet here we are:

FWIW, I'm not sure about zip files, but openpyxl does use defusedxml where
required. Support for this came as the result of a security review.

At the low level, the XML parsing in xlrd and openpyxl is very
similar, but the openpyxl code is far better tested and easier to
maintain. AFAIK there is no longer any reason to use xlrd to read XLSX
files: openpyxl has better support for the format, a richer API and is
even slightly faster.
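To illustrate what that low-level XML step looks like, here is a minimal sketch that opens the zip and reads the sheet names from `xl/workbook.xml`, parsing with defusedxml's `fromstring` when it is installed. Assumptions: the workbook part sits at the conventional `xl/workbook.xml` path (strictly, its location is declared in `_rels/.rels`), and the helper name `sheet_names` is hypothetical. The stdlib fallback exists only so the sketch runs anywhere; for untrusted files it is exactly the trade-off this thread warns against. In practice, `openpyxl.load_workbook(path, read_only=True).sheetnames` does this for you.

```python
import io
import zipfile

try:
    # defusedxml's parser rejects entity-expansion and external-entity tricks.
    from defusedxml.ElementTree import fromstring
except ImportError:
    # Fallback only so the sketch runs without defusedxml; prefer installing it.
    from xml.etree.ElementTree import fromstring

# SpreadsheetML main namespace used by workbook.xml.
NS = {"main": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def sheet_names(source):
    """Return sheet names in workbook order.

    `source` may be a path, a binary file-like object, or the raw bytes of an
    .xlsx file.
    """
    if isinstance(source, (bytes, bytearray)):
        source = io.BytesIO(source)
    with zipfile.ZipFile(source) as zf:
        root = fromstring(zf.read("xl/workbook.xml"))
    # <workbook><sheets><sheet name="..." .../></sheets></workbook>
    return [sheet.get("name") for sheet in root.findall("main:sheets/main:sheet", NS)]
```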

Charlie

--
Charlie Clark
Managing Director
Clark Consulting & Research
German Office
Kronenstr. 27a
Düsseldorf
D- 40217
Tel: +49-211-600-3657
Mobile: +49-178-782-6226