Is there a way to create merged cells with optimized writer?

1,059 views
Skip to first unread message

jean-jacque...@nextis-consulting.com

unread,
Mar 25, 2015, 5:33:21 AM3/25/15
to openpyx...@googlegroups.com
Hi all,

I am trying to "translate" a Python script using xlwt to using openpyxl.

This script read a large csv file (>400 Mb), sort data then write to a .xls file with a lot of formatting and merged cells, specially cells merged across rows on single columns.
Sometimes I get memory issues, and input file is growing. Can openpyxl be less memory consuming than xlwt?

I have seen there is a DumpWorksheet class intended for reducing memory, but merge_cells method is not implemented?

Or did I miss something?

Thanks for your help!


CONFIDENTIALITE : Ce message et les éventuelles pièces attachées sont confidentiels. Si vous n'êtes pas dans la liste des destinataires, veuillez informer l'expéditeur  immédiatement et ne pas divulguer le contenu à une tierce personne, ne pas l'utiliser pour quelque raison que ce soit, ne pas stocker ou copier l'information qu'il contient sur un quelconque support. Tous droits réservés.

CONFIDENTIALITY : This e-mail and any attachments are confidential and may be privileged. If you are not a named recipient, please notify the sender immediately and do not disclose the contents to another person, use it for any purpose or store or copy the information in any medium. All rights reserved.

Charlie Clark

unread,
Mar 25, 2015, 5:58:27 AM3/25/15
to openpyx...@googlegroups.com
Am .03.2015, 10:33 Uhr, schrieb
<jean-jacque...@nextis-consulting.com>:

> Hi all,

> I am trying to "translate" a Python script using xlwt to using openpyxl.
> This script read a large csv file (>400 Mb), sort data then write to a
> .xls
> file with a lot of formatting and merged cells, specially cells merged
> across rows on single columns.

> Sometimes I get memory issues, and input file is growing. Can openpyxl be
> less memory consuming than xlwt?

The implementations are completely different but it is more than possible
that openpyxl uses more memory. Worksheets with a lot of unique strings
tend to use a lot of memory.

> I have seen there is a DumpWorksheet class intended for reducing memory,
> but merge_cells method is not implemented?

Correct. I don't it's been required so far.

> Or did I miss something?

It shouldn't be difficult to implement merged cells for the write-only
mode but the API needs thinking through. As it stands cells that have been
merged are removed from a worksheet apart from the top-left one,
write-only worksheets do not allow any changes once cells have been added.

Charlie
--
Charlie Clark
Managing Director
Clark Consulting & Research
German Office
Kronenstr. 27a
Düsseldorf
D- 40217
Tel: +49-211-600-3657
Mobile: +49-178-782-6226

jean-jacque...@nextis-consulting.com

unread,
Mar 25, 2015, 7:25:38 AM3/25/15
to openpyx...@googlegroups.com, jean-jacque...@nextis-consulting.com
Thank you for your answer.

I made it work with openpyxl with almost same results, and what I have seen on a reduced test file (only 100 Mb) is:
  • peak memory usage: 783 Mb with openpyxl vs. 502 Mb with xlwt
  • resulting file size: 10 Mb with openpyxl (.xlsx) vs. 33 Mb with xlwt (.xls)
  • openpyxl is also slower to save the Excel file.
So it seems that openpyxl is not reducing my memory risk; too bad because I was also interested in decreasing Excel file size (for later transfers across the net) and in some abilities that are not in xlwt...

I have to look for another solution...
Have a good day!




Charlie Clark

unread,
Mar 25, 2015, 7:30:07 AM3/25/15
to openpyx...@googlegroups.com
Am .03.2015, 12:25 Uhr, schrieb
<jean-jacque...@nextis-consulting.com>:

> Thank you for your answer.
> I made it work with openpyxl with almost same results, and what I have
> seen
> on a reduced test file (only 100 Mb) is:
> - peak memory usage: 783 Mb with openpyxl vs. 502 Mb with xlwt

This will be a lot less in write-only mode but you won't be able to merge
cells.

> - resulting file size: 10 Mb with openpyxl (.xlsx) vs. 33 Mb with xlwt
> (.xls)

xlsx files are zip archives and XML compresses well.

> - openpyxl is also slower to save the Excel file.

This is almost unavoidable due to the overhead imposed by XML. If you have
lxml installed it should be pretty fast.

> So it seems that openpyxl is not reducing my memory risk; too bad
> because I
> was also interested in decreasing Excel file size (for later transfers
> across the net) and in some abilities that are not in xlwt...

You can't really make the file any smaller but as long as you're only
writing files you might want to look at xlsxwriter which is fast and
memory efficient.

jean-jacque...@nextis-consulting.com

unread,
Mar 25, 2015, 10:41:52 AM3/25/15
to openpyx...@googlegroups.com
> So it seems that openpyxl is not reducing my memory risk; too bad  
> because I
> was also interested in decreasing Excel file size (for later transfers
> across the net) and in some abilities that are not in xlwt...

You can't really make the file any smaller but as long as you're only  
writing files you might want to look at xlsxwriter which is fast and  
memory efficient.

Charlie
--
Charlie Clark
Managing Director
Clark Consulting & Research

OK, I had a look at XlsxWriter, it is said to be faster (but time is not my main concern) but to know if it uses less memory I shall have to try it... Maybe later.
XlsxWriter "constant_memory" mode is the same as in OpenPyXl; write rows sequentially and flush memory after each row; so I guess I would have the same issue with merged cells on several rows.

What would be nice in my case would be to flush memory after each sheet is written... What if I save Excel file after each sheet, close and reopen it to add another sheet, would it load all previous sheets in memory if I don't modify them?

Thanks again,

JJB

Charlie Clark

unread,
Mar 25, 2015, 11:21:17 AM3/25/15
to openpyx...@googlegroups.com
Am .03.2015, 15:41 Uhr, schrieb
<jean-jacque...@nextis-consulting.com>:

> OK, I had a look at XlsxWriter, it is said to be faster (but time is not
> my main concern) but to know if it uses less memory I shall have to try
> it... Maybe later.

Depending upon your data there's not much in it now if lxml is installed.

> XlsxWriter "constant_memory" mode is the same as in OpenPyXl; write rows
> sequentially and flush memory after each row; so I guess I would have the
> same issue with merged cells on several rows.

Constant memory in xlsxwriter also writes strings straight to the
worksheet instead of keeping them in a dictionary which is then stored
separately. This can be a memory issue if you have lots of text.

> What would be nice in my case would be to flush memory after each *sheet*
> is written... What if I save Excel file after each sheet, close and
> reopen
> it to add another sheet, would it load all previous sheets in memory if I
> don't modify them?

In write-only mode each row is written to file once it's completed. This
uses temporary files which are closed when the workbook is finished. In
standard mode everything is resident in memory. So far having three modes:
standard, read-only, write-only manages to cover most situations. I think
PHPExcel has various caching strategies for editing files where memory is
tight but to be honest I think this adds a lot of complexity for little
gain. Cell objects already use slots and are about as small as you can
make them in Python without losing a lot of the flexibility. I've thought
about possibly using lazy loading of worksheets in standard mode but this
again would add complexity. Using any kind of file to buffer the data
might make things quite a bit slower for little possible gain. And you
also have to make sure that nothing trashes those files.

Without seeing any code it's hard to recommend what's best. If you can
come up with code for merging cells so that it can be run *before* those
cells are written then we might consider this in for write-only mode. It's
probably possible to write the merged cells stuff anyway but I'm loathe to
have a worksheet with cells that are merged but may also have residual
values in them. We've got a question about this with the ECMA committee
responsible for the file format because merged cells are not well
specified.

Charlie
--
Charlie Clark
Managing Director
Clark Consulting & Research

jean-jacque...@nextis-consulting.com

unread,
Mar 25, 2015, 12:45:17 PM3/25/15
to openpyx...@googlegroups.com
Here is what I try to achieve:

Images intégrées 1

I have a tree data structure containing values and styles; currently I write leaves column W, then branches columns U and V, then trunk columns A to S.

I could count how many lines must be merged in columns A to S, and in columns U and V, depending on column W content (not merged), before writing row 5. 
So what then? How should I create row 5 to append it to a DumpWorksheet? A list of WriteOnlyCells with or without values, with or without an attribute "top left cell of merged range A5:A8"?
And for row 6 to 8? Empty cells for cols A to S? 
And how to apply style (borders, color) to all merged cells?

This works with standard mode:

# ------------------------------------
def wsWriteMerge(leWS, iStartRow0, iEndRow0, iStartCol0, iEndCol0, laValeur, leStyle):
   
"Replace xlwt ws.write_merge using openpyxl syntax"
       

    leWS
.merge_cells(start_row=iStartRow0+1, end_row=iEndRow0+1, start_column=iStartCol0+1, end_column=iEndCol0+1)


    leWS
.cell(row=iStartRow0+1, column=iStartCol0+1).value = laValeur


   
for ii in range(iStartRow0+1, iEndRow0+1+1):
       
for jj in range(iStartCol0+1, iEndCol0+1+1):
                leWS
.cell(row=ii, column=jj).style = leStyle
   

# ------------------------------------



Best regards,

JJB

Charlie Clark

unread,
Mar 25, 2015, 1:46:33 PM3/25/15
to openpyx...@googlegroups.com
Am .03.2015, 17:45 Uhr, schrieb
<jean-jacque...@nextis-consulting.com>:

> I have a tree data structure containing values and styles; currently I
> write leaves column W, then branches columns U and V, then trunk columns
> A
> to S.

To use write-only mode you'll have to flatten the tree so that you can
pass rows to be serialised. I suspect that using a temporary sqlite
database might be the best way to do this to keep memory use down. Or
maybe Pandas?

Jean-Jacques Benedetti

unread,
Mar 25, 2015, 3:18:05 PM3/25/15
to openpyx...@googlegroups.com


2015-03-25 18:46 GMT+01:00 Charlie Clark <charli...@clark-consulting.eu>:

Am .03.2015, 17:45 Uhr, schrieb <jean-jacques.benedetti@nextis-consulting.com>:

I have a tree data structure containing values and styles; currently I
write leaves column W, then branches columns U and V, then trunk columns A
to S.

To use write-only mode you'll have to flatten the tree so that you can pass rows to be serialised. I suspect that using a temporary sqlite database might be the best way to do this to keep memory use down. Or maybe Pandas?

Charlie
--
Charlie Clark
Managing Director
Clark Consulting & Research

If I use write-only mode, I should not have to keep memory use that low, should I? A database or a bear (no idea what Pandas can be...) seems a bit overkill to me; I don't want to get to such sophistication if I can avoid...
I can easily create each entire row, but I still don't know how to merge cells in this mode?
--
Cordialement,

Jean-Jacques Benedetti
Consultant

Description : Description : D:\Documents de jbenedetti\JJ\image001.png

Description : Description : C:\Users\Sylvain Leroux\AppData\Roaming\Microsoft\Signatures\NextIS_fichiers\forme8.png

jean-jacque...@nextis-consulting.com

Description : Description : C:\Users\Sylvain Leroux\AppData\Roaming\Microsoft\Signatures\NextIS_fichiers\forme7.png

+33 (0) 6 17 74 24 30

Description : Description : C:\Users\Sylvain Leroux\AppData\Roaming\Microsoft\Signatures\NextIS_fichiers\forme6.png

+33 (0) 9 82 39 39 38

Description : Description : C:\Users\Sylvain Leroux\AppData\Roaming\Microsoft\Signatures\NextIS_fichiers\forme10.png

18 rue de Mogador 75009 Paris FRANCE

Description : Description : C:\Users\Sylvain Leroux\AppData\Roaming\Microsoft\Signatures\NextIS_fichiers\forme9.png

www.nextis-consulting.com

343.gif

Charlie Clark

unread,
Mar 26, 2015, 4:20:13 AM3/26/15
to openpyx...@googlegroups.com
Am .03.2015, 20:18 Uhr, schrieb Jean-Jacques Benedetti
<jean-jacque...@nextis-consulting.com>:

> If I use write-only mode, I should not have to keep memory use that low,
> should I? A database or a bear (no idea what Pandas can be...[?])

Pandas is an excellent Python library working with large datasets using
the NumPy extensions. You can do all kinds of transformations using Python
and then dump to various file formats including XLSX.

> seems a
> bit overkill to me; I don't want to get to such sophistication if I can
> avoid...

> I can easily create each entire row, but I still don't know how to merge
> cells in this mode?

You'd need to implement it. The method itself really just creates ranges
that themselves just get serialised into the output via
write_merged_cells. So you'd need to call that manually as well.

You might also think about a two-stage approach: create the worksheets in
write-only mode and see if you can then edit them in standard mode.

Charlie
--
Charlie Clark
Managing Director
Clark Consulting & Research

Charlie Clark

unread,
Mar 26, 2015, 4:20:25 AM3/26/15
to openpyx...@googlegroups.com
Am .03.2015, 20:18 Uhr, schrieb Jean-Jacques Benedetti
<jean-jacque...@nextis-consulting.com>:

> If I use write-only mode, I should not have to keep memory use that low,
> should I? A database or a bear (no idea what Pandas can be...[?])

Pandas is an excellent Python library working with large datasets using
the NumPy extensions. You can do all kinds of transformations using Python
and then dump to various file formats including XLSX.

> seems a
> bit overkill to me; I don't want to get to such sophistication if I can
> avoid...

> I can easily create each entire row, but I still don't know how to merge
> cells in this mode?

You'd need to reimplement it. The method itself really just creates ranges
that themselves just get serialised into the output via
write_merged_cells. So you'd need to call that manually as well.

You might also think about a two-stage approach: create the worksheets in
write-only mode and see if you can then edit them in standard mode.

Charlie
--
Charlie Clark
Managing Director
Clark Consulting & Research

Jean-Jacques Benedetti

unread,
Mar 26, 2015, 5:06:45 AM3/26/15
to openpyx...@googlegroups.com
OK, thank you very much again for your explanations. I shall think over the different options; for now I have to work on another subject. So far current version of the script with xlwt is working, when it will try to go over the memory limit and crash I shall spend more time on it...

Best regards,

2015-03-26 9:20 GMT+01:00 Charlie Clark <charli...@clark-consulting.eu>:
Am .03.2015, 20:18 Uhr, schrieb Jean-Jacques Benedetti
--
You received this message because you are subscribed to a topic in the Google Groups "openpyxl-users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/openpyxl-users/ksN63fhrjWc/unsubscribe.
To unsubscribe from this group and all its topics, send an email to openpyxl-users+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Cordialement,

Jean-Jacques Benedetti
Consultant

Description : Description : D:\Documents de jbenedetti\JJ\image001.png

Description : Description : C:\Users\Sylvain Leroux\AppData\Roaming\Microsoft\Signatures\NextIS_fichiers\forme8.png

jean-jacque...@nextis-consulting.com

Description : Description : C:\Users\Sylvain Leroux\AppData\Roaming\Microsoft\Signatures\NextIS_fichiers\forme7.png

+33 (0) 6 17 74 24 30

Description : Description : C:\Users\Sylvain Leroux\AppData\Roaming\Microsoft\Signatures\NextIS_fichiers\forme6.png

+33 (0) 9 82 39 39 38

Description : Description : C:\Users\Sylvain Leroux\AppData\Roaming\Microsoft\Signatures\NextIS_fichiers\forme10.png

18 rue de Mogador 75009 Paris FRANCE

Description : Description : C:\Users\Sylvain Leroux\AppData\Roaming\Microsoft\Signatures\NextIS_fichiers\forme9.png

www.nextis-consulting.com

Reply all
Reply to author
Forward
0 new messages