ANN: openpyxl 3.0.4 released


Charlie Clark
Jun 24, 2020, 1:38:22 PM
to openpyxl-users, python-excel

Hiya everyone,

Just a brief announcement that openpyxl 3.0.4 has been released. This is
a bugfix release that fixes a couple of minor bugs. People with
workbooks containing lots of merged cells (and I mean many thousands)
should see some notable performance improvements when reading such
files. Otherwise, we've finally introduced an API for working with tables
in existing workbooks. Special thanks to Shekhar Gyanwali for working on
this and sticking with it, despite my numerous suggestions! ;-)
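
A minimal sketch of listing the tables in an existing workbook. It assumes the dict-like `Worksheet.tables` mapping (table name to `Table` object) that the openpyxl documentation describes for this release, and it loads the workbook in the default mode because read-only worksheets are not assumed to expose tables. `build_demo_workbook` and `list_tables` are hypothetical helper names used only for illustration.

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook
from openpyxl.worksheet.table import Table


def build_demo_workbook():
    """Hypothetical helper: an in-memory .xlsx containing one table."""
    wb = Workbook()
    ws = wb.active
    ws.append(["Name", "Score"])  # header row; table headers must be strings
    ws.append(["alpha", 1])
    ws.append(["beta", 2])
    ws.add_table(Table(displayName="Scores", ref="A1:B3"))
    buf = BytesIO()
    wb.save(buf)
    buf.seek(0)  # rewind so the buffer can be read back
    return buf


def list_tables(source):
    """Hypothetical helper: map every table name in the workbook to its range."""
    wb = load_workbook(source)  # default mode, not read_only
    found = {}
    for ws in wb.worksheets:
        # ws.tables is dict-like; values() yields the Table objects
        for table in ws.tables.values():
            found[table.name] = table.ref
    return found
```

Calling `list_tables(build_demo_workbook())` reads the saved file back and reports each table by name with its cell range, which is the kind of access to tables in existing files that this release announces.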

Full details of the changes:

https://openpyxl.readthedocs.io/en/latest/changes.html

PS. Because of Atlassian's decision to follow fashion and drop support
for Mercurial (who knows what they'll decide to drop next?), the
repository will be moving. This is a pity because there are a lot of
things to like about Bitbucket; we've had ten good years with them, and
I'd like to thank them for that.

Heptapod, https://foss.heptapod.net/openpyxl/openpyxl, will be the new
home for the repository. The test migration worked very well, with all
bugs and pull requests copied successfully. The workflow for
contributors is slightly different because private forks are not
available. There might be the odd hiccup during the migration, and we'll
see how it goes, but first of all my thanks go to Heptapod for providing
a home for open-source projects that, for whatever reason, want to use
Mercurial.

Charlie

--
Charlie Clark
Managing Director
Clark Consulting & Research
German Office
Kronenstr. 27a
Düsseldorf
D-40217
Tel: +49-211-600-3657
Mobile: +49-178-782-6226

Deac-33 Lancaster
Jun 24, 2020, 5:09:41 PM
to python...@googlegroups.com

Charlie,

Thanks much for the update; I've just upgraded my openpyxl.

I don't want to impose on your good nature, so if you have no time for or interest in my problem, please just delete this message.
But for two and a half months I've been struggling with a serious performance problem with openpyxl and have had no help from the forums. It surprises me that no one has even responded; I can't be the only one having this problem.

If you care to even read it, here is my post to several fora:

Like many folks, I need to read both .xls files (using xlrd) and .xlsx files (using openpyxl), in both cases files of about 30,000 rows. In both cases I'm just copying all the Excel data I read and writing it out to a .csv file, with no other processing, so it's purely input/output.

But the .xlsx file operations are over 200 times slower than the .xls ones. For example, reading a 30,000-row .xlsx file takes 2 minutes, compared with half a second for an .xls file with xlrd. We have thousands of files to process, so the time per file matters.

Is openpyxl really that much slower, or do I need to do something, like release some resource at the end of each row?

BTW, I have already made several big improvements by using read_only=True and reading a row at a time instead of cell by cell, as shown in the following code segment. Thanks to blog.davep.org for that tip: https://blog.davep.org/2018/06/02/a_little_speed_issue_with_openpyxl.html

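    import openpyxl

    wb = openpyxl.load_workbook("excel_file.xlsx", data_only=True, read_only=True)
    sheet = wb.active
    for row in sheet.rows:  # a row at a time instead of cell by cell
        for cell in row:
            cell_from_excel = cell.value
    wb.close()  # read-only mode keeps the file open until closed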

I had hoped that openpyxl 3.0.4 might improve the performance, but the 43,000-row file still took 2:09 minutes, compared with 0.5 seconds for xlrd on a similarly sized .xls file.
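
For comparison, a minimal sketch of the same .xlsx-to-.csv copy that hands whole rows of plain values straight to Python's csv module. It was not posted in the thread, and its speed relative to xlrd has not been measured here. It assumes the `iter_rows(values_only=True)` option of openpyxl 3.0 and that only the active sheet is needed; `xlsx_to_csv` is a hypothetical helper name.

```python
import csv

from openpyxl import load_workbook


def xlsx_to_csv(xlsx_path, csv_path):
    """Hypothetical helper: copy the active sheet of xlsx_path to csv_path.

    Returns the number of rows written.
    """
    # read_only streams the worksheet instead of loading it all into memory;
    # data_only returns cached formula results rather than formula strings.
    wb = load_workbook(xlsx_path, read_only=True, data_only=True)
    try:
        ws = wb.active
        rows_written = 0
        with open(csv_path, "w", newline="", encoding="utf-8") as out:
            writer = csv.writer(out)
            # values_only=True yields tuples of cell values, not cell objects
            for row in ws.iter_rows(values_only=True):
                writer.writerow(row)  # csv writes None as an empty field
                rows_written += 1
        return rows_written
    finally:
        # read-only workbooks keep the source file open until closed
        wb.close()
```

Calling `xlsx_to_csv("excel_file.xlsx", "excel_file.csv")` would write one CSV line per worksheet row. The only resource to release is the workbook itself, once per file, rather than anything per row.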

Thanks for any insight you may be able to bring to this.

And thanks very much if you got this far in my message; sorry to impose.
🤓 

Best wishes,
-Deac Lancaster 
 😎

derek
Jun 25, 2020, 2:04:32 AM
to python-excel

The https://foss.heptapod.net/openpyxl/openpyxl page is returning a 404. Is the site running yet?

Charlie Clark
Jun 25, 2020, 3:27:28 AM
to python-excel

On 25 Jun 2020, at 8:04, derek wrote:

> The https://foss.heptapod.net/openpyxl/openpyxl page is responding
> with a 404 - is the site running yet?

It's not public yet. We're going to do a final import this week,
and then I'll put a notice on the Bitbucket repo saying that it has moved.