Merged Cells and Styles (again...)

3,790 views
Skip to first unread message

Ken Seehart

unread,
Oct 24, 2016, 10:58:48 PM10/24/16
to openpyxl-users
Having trouble with borders around merged cells?

I want to revisit this "dead horse" again.

https://bitbucket.org/openpyxl/openpyxl/issues/365/styling-merged-cells-isnt-working

Actually, a better description of the issue can be found in the 9 bugs that are marked as duplicates. Most of these note that styling of merged cells is lost upon loading, and in these cases, the official workaround doesn't help.

Basically, if you load and then save a spreadsheet containing merged cells with borders, the border gets (mostly) destroyed. Ideally, Load+Save should not be lossy.

There's a simple fix for the underlying problem (credit to Sergey Pikhovkin for finding it). I've submitted a pull request for this. However, since the bug is considered resolved, and the conversation has a "dead horse" quality to it, I've decided to wrap the fix as a monkey-patch for my own needs, and having done that, I want to make it available for anyone in the community. Please let me know if this works for you, or if you encounter any problems with it.

Meanwhile I'm working on an improved version that mimics the behavior of Excel in more cases. But ideally the minimum constraint for any solution should be to maintain Load+Save = not lossy (assuming there is an intent to support both reading and writing of the feature). I admit I may be a bit biased here because my application involves modifying an existing spreadsheet (as opposed to creating a new spreadsheet from scratch).

from itertools import product
import types
import openpyxl
from openpyxl import worksheet
from openpyxl.utils import range_boundaries


def patch_worksheet():
    """This monkeypatches Worksheet.merge_cells to remove cell deletion bug
    https://bitbucket.org/openpyxl/openpyxl/issues/365/styling-merged-cells-isnt-working
    Thank you to Sergey Pikhovkin for the fix
    """

    def merge_cells(self, range_string=None, start_row=None, start_column=None, end_row=None, end_column=None):
        """ Set merge on a cell range.  Range is a cell range (e.g. A1:E1)
        This is monkeypatched to remove cell deletion bug
        https://bitbucket.org/openpyxl/openpyxl/issues/365/styling-merged-cells-isnt-working
        """
        if not range_string and not all((start_row, start_column, end_row, end_column)):
            msg = "You have to provide a value either for 'coordinate' or for\
            'start_row', 'start_column', 'end_row' *and* 'end_column'"
            raise ValueError(msg)
        elif not range_string:
            range_string = '%s%s:%s%s' % (get_column_letter(start_column),
                                          start_row,
                                          get_column_letter(end_column),
                                          end_row)
        elif ":" not in range_string:
            if COORD_RE.match(range_string):
                return  # Single cell, do nothing
            raise ValueError("Range must be a cell range (e.g. A1:E1)")
        else:
            range_string = range_string.replace('$', '')

        if range_string not in self._merged_cells:
            self._merged_cells.append(range_string)


        # The following is removed by this monkeypatch:

        # min_col, min_row, max_col, max_row = range_boundaries(range_string)
        # rows = range(min_row, max_row+1)
        # cols = range(min_col, max_col+1)
        # cells = product(rows, cols)

        # all but the top-left cell are removed
        #for c in islice(cells, 1, None):
            #if c in self._cells:
                #del self._cells[c]

    # Apply monkey patch
    m = types.MethodType(merge_cells, None, worksheet.Worksheet)
    worksheet.Worksheet.merge_cells = m



patch_worksheet()

Ken Seehart

unread,
Oct 24, 2016, 11:16:29 PM10/24/16
to openpyxl-users

I should note that Charlie points out another reasonable constraint, which is removal of ambiguity. The model in the specification is ambiguous, and apparently the only way to resolve the ambiguity is to delete the cells. For my situation "load+save=not lossy" trumps the ambiguity concern. If this is not true for you, it might be best not to use my monkey-patch.

Ken Seehart

unread,
Oct 25, 2016, 2:58:48 AM10/25/16
to openpyxl-users
Results of more experiments:

The model implemented MS Excel allows for all of the merged cells to have values. So the notion that the merged cells "don't exist" is not correct. Of course the user interface does not provide any access to these cells, but the cells are present in the model nevertheless.

My experiment was to use openpyxl to set values in the hidden cells (having removed the code that deletes the offending cells, as above). Then I saved the file, opened it in Excel, and unmerged the merged cells. Guess what? The values were right there!

Conclusion: Excel does not delete cells, but merely hides them. I did additional experiments, leading to the following conclusions:

The merge operation does not merge multiple cells into a single cell. Instead, it hides cell content in order to create the illusion that the cells are merged.

When you merge cells, Excel does the following (not necessarily in any particular order):
  • clears the values of all but the upper-left
  • copies non-border formatting to all of the cells
  • copies border formatting according to the exposed sides.
  • clears all hidden borders.
  • merges the cells
When you unmerge cells, Excel does the following:
  • unmerges the cells (and nothing else)
Rules for display:
  • Excel displays the value and non-border style and diagonal borders of the upper left cell.
  • Excel displays whatever non-diagonal border content is present in the exposed part of cells (the edges of cells that coincide with the edges of the merged region).
    • This is true even if the border content is inconsistent, e.g. the top border might be different for each of the top cells, and all of these borders are displayed as is!
  • With the exception of diagonals, the borders in the interior of a merged region are hidden (displayed as no border, even if a border is present in the hidden cells).
All of the above pertains to the MS Excel implementation, and is not necessarily in the specification.

The key insight here is that, whatever else may be true, deleting cells is definitely contrary to the model. To the extent that the specification is ambiguous, I think it's reasonable to match what we know of the model implemented by MS Excel, and let the programmer do what they like outside of the spec, without being overly restrictive.

So I recommend that merge_cells() implement the description above. So my monkey-patch is just an incomplete improvement over the current situation (i.e. deleting cells can't be right), and not really the ultimate solution.


Ken Seehart

unread,
Oct 25, 2016, 3:11:31 AM10/25/16
to openpyxl-users
When loading, I would not suggest doing all that, but rather do what is in the monkey-patch (i.e. the original code minus the cell deletion code).

If the programmer calls merge_cells() it should do the cell clearing logic describe in the above post.

clear parameter, defaulting to True could control this. But the loader would use clear=False, because I think all other things being equal, I want to maintain the principle that load+save = not lossy, whenever possible.


Charlie Clark

unread,
Oct 25, 2016, 10:57:20 AM10/25/16
to openpyx...@googlegroups.com
Am .10.2016, 08:58 Uhr, schrieb Ken Seehart <kense...@gmail.com>:

> All of the above pertains to the MS Excel implementation, and is not
> necessarily in the specification.

And this is a real problem, and why I am still waiting for clarification
from Microsoft and the OOXML Working Group. Depending on what comes out of
that I suspect that a different implementation for merged cells would make
sense. The current code deliberately does only that which is necessary to
match the specification. That some of the nonsense that you discovered is
possible with the Excel hack is indicative of how important it is to
clarify what should happen.

My preference would be for geometric ranges which would allow for
membership and intersection tests and formatting, together perhaps with a
kind of read-only cell for members of a range. It's possible that
Laurent's recent suggestion could be the basis for this. It's not that I
wouldn't like roundtrip support, but the current reverse engineered,
"works for me" approach is insufficient. And the way Excel currently
(could change at any time because not covered by the specification)
implements formatting for individual cells is verbosely insane.

Charlie
--
Charlie Clark
Managing Director
Clark Consulting & Research
German Office
Kronenstr. 27a
Düsseldorf
D- 40217
Tel: +49-211-600-3657
Mobile: +49-178-782-6226

Ken Seehart

unread,
Oct 25, 2016, 6:47:28 PM10/25/16
to openpyxl-users
I'm aiming for more than just "works for me". Round trip support is not just important for it's own sake, it is evidence that the model is being handled consistently (or at least the converse holds: lack of round trip support indicates that the model is being abused). If round trip support fails, you can be sure there will be other problems too.
 
"The current code deliberately does only that which is necessary to match the specification."

I don't see how the cell deletions help to match the specification. On the other hand, they clearly take us away from the reference implementation.

All other things being equal, the reference implementation is worth matching in the absence of a complete specification.

That some of the nonsense that you discovered is   
possible with the Excel hack is indicative of how important it is to   
clarify what should happen. 

I don't see it as nonsense. There are two reasons that something can appear to be nonsense (or verbosely insane):
  1. because it is actually nonsense
  2. because it does not match our current set of assumptions (e.g. quantum mechanics looks a lot like nonsense)
In this case it is the latter. Specifically the incorrect assumption I refer to is this:

openpyxl does not remove cells that have already been merged: they don't exist. 2015-01-23

The evidence says otherwise, and if you let go of this incorrect idea, everything will make a lot more sense. Merging is about hiding cells, not deleting cells. In the reference implementation, the cells exist. I do not think there is any justification for saying that the underlying cells in a merged range don't exist. Can you explain it?

My experiments do not reveal weird hacks, they just reveal a model that differs from the idea that merging is about replacing a group of cells with a single large cell. In fact none of these results were surprising to me at all. The reason they make sense to me is that the model in my head is consistent with the reference implementation.

These experiments reveal empirical evidence that the reference implementation implements a model based on hiding cells rather than deleting them.

I have a reasonable test of desired functionality that fails on the current version and passes with the patch (load+save a simple example of a formatted merged range of cells with borders). Do you have a test that passes on the current version and fails with the patch?

I hold that compatibility with the reference implementation is of more practical relevance than theoretical compatibility with an unspecified portion of an official specification that may never arrive (we've been waiting almost two years).

I will test with Open Office Calc and share the results here. I'm confident that it will pass, and that OOC has similar behavior to XL in terms of the model.

This idea of preventing writing to the underlying merged cells may have some merit, but I don't see it as very important. On the one hand, the idea seems completely justifiable in the context of the incorrect idea that the underlying merged cells "don't exit". However, if we consider the underlying cells to be hidden rather than nonexistent, then access to these cells seems more a matter of business logic than model. If the underlying cells exist in the model, then I think it's okay to expose them, with the caveat that certain access patterns on merged cells have undefined results. This kind of caveat is ubiquitous and mostly harmless. So, I'd say keep is simple, keep the formatting in the underlying cells as in the reference implementation. No additional structures are needed.

Two new enhancements would make this more complete:
  1. Add this to the merge_cells function:
    • clears the values of all but the upper-left
    • copies non-border formatting to all of the cells
    • copies border formatting according to the exposed sides.
    • clears all hidden borders.
    • merges the cells
  1. Logic for intersection (as you mention)
With this approach, using the reference implementation as a guide, I'm pretty sure all of the bugs and their duplicates relating to formatting of merged cells will be actually resolved. I think it's worth doing. I'll create a fork for this, and maybe if you come around to my way of thinking, it will find it's way into the main trunk.

Ken Seehart

unread,
Oct 25, 2016, 7:32:22 PM10/25/16
to openpyxl-users
I think I talk too much....

I am very grateful for this awesome project and I want to give something back.

Do you have any objection to me branching a version (which you can merge if you like the results) with the following properties?
  • Full round-trip support of merged cells (including border styles)
    • If my code doesn't operate on merged cells in the document, they will remain intact when loaded and saved.
  • Revised merge_cells function that mimics cell merge operation in Excel
    • Caveat: operations such as overlapping merges that the Excel UI prevents will have undefined results
    • Caveat: modifying styles in underlying merged cells will have undefined results that are consistent with the reference implementation

Along with that I can contribute some other stuff I've been doing:
  • A more complete Copy Worksheet that does a lot more than just cells
  • Maybe inserting/deleting rows/columns (conserving things like cell references, groups, merges, formulas, etc)
    • This one is in "works for me" state, so it might be a while to get it production ready.

Charlie Clark

unread,
Oct 26, 2016, 5:02:38 AM10/26/16
to openpyx...@googlegroups.com
Am .10.2016, 01:32 Uhr, schrieb Ken Seehart <kense...@gmail.com>:

> I think I talk too much....

Possibly. I think there probably isn't sufficient discussion on features,
functions and implementation on the mailing list. But on this issue, I
think we'll have to agree to disagree.

> I am very grateful for this awesome project and I want to give something
> back.
>
> Do you have any objection to me branching a version (which you can merge
> if you like the results) with the following properties?

The beauty of open source is that you can fork without permission. I do
not anticipate merging this code into the project.

> - *Full round-trip support of merged cells *(including border styles)
> - If my code doesn't operate on merged cells in the document, they
> will remain intact when loaded and saved.
> - *Revised merge_cells function that mimics cell merge operation in
> Excel*
> - Caveat: operations such as overlapping merges that the Excel UI
> prevents will have undefined results
> - Caveat: modifying styles in underlying merged cells will have
> undefined results that are consistent with the reference
> implementation

Please avoid the term "reference implementation". While it's fine to
orient expectations around Excel as by far the most common OOXML client it
is in *no* way a reference implementation.

> Along with that I can contribute some other stuff I've been doing:
>
> - A more complete *Copy Worksheet *that does a lot more than just
> cells
> - Maybe* inserting/deleting rows/columns *(conserving things like cell
> references, groups, merges, formulas, etc)
> - This one is in "works for me" state, so it might be a while to get
> it production ready.

I'd suggest maintaining this kind of stuff in a separate branch and
working feature by feature. I do plan to add aggregate
(insert/delete/move) at some point but only *after* changing the
underlying data model. The tokeniser should be able to handle cell
references but I'd be happy to leave this up to clients (there are lots
and lots things that might be referencing cells). At the moment you need a
lot of code that is tied to internal implementation and this will become
useless.

jacquesp...@gmail.com

unread,
Mar 28, 2017, 10:43:10 AM3/28/17
to openpyxl-users
Good day Ken

Will simply copying and pasting this code into a python script work to correct the openpyxl behavior? I have tried and it results in the following error:

" line 54, in patch_worksheet
    m = types.MethodType(merge_cells, None, worksheet.Worksheet)
TypeError: method expected 2 arguments, got 3
"

I'm new to python and programming in general but I have found openpyxl to be amazingly useful, until I encountered this behavior of corrupting the formatting of merged cells. All apologies if the question is retarded.

Kind Regards
Jacques

henrique.a...@gmail.com

unread,
Jun 5, 2017, 12:50:52 AM6/5/17
to openpyxl-users, jacquesp...@gmail.com
Great! Here works good!
I load my Excel file and write as a new file and the merged cells are ok.

The steps:

1 - Copy the code and save in a new file, like "fix_border.py"
2 - Import this file into your code, like "from fiz_border import patch_worksheet"
3 - Before the line that opens the Excel file, call "patch_worksheet()"

kingwo...@gmail.com

unread,
Jun 12, 2017, 2:33:14 PM6/12/17
to openpyxl-users, jacquesp...@gmail.com
I am also getting the same error when calling the function as described in the message by henrique.

todd.l...@gmail.com

unread,
Jul 21, 2017, 8:50:20 PM7/21/17
to openpyxl-users, jacquesp...@gmail.com, kingwo...@gmail.com
Did you ever find a solution for the error, I'm also new to python. 

m = types.MethodType(merge_cells, None, worksheet.Worksheet)
TypeError: method expected 2 arguments, got 3
"

todd.l...@gmail.com

unread,
Jul 23, 2017, 2:10:49 PM7/23/17
to openpyxl-users, jacquesp...@gmail.com, kingwo...@gmail.com, todd.l...@gmail.com
Solved my issue, If you are running python 3 you need to replace:

m = types.MethodType(merge_cells, worksheet.Worksheet)
worksheet.Worksheet.merge_cells = m

with:

setattr(worksheet.Worksheet, 'merge_cells', merge_cells)

eshter...@gmail.com

unread,
May 7, 2018, 5:31:48 AM5/7/18
to openpyxl-users

I met an error AttributeError: type object 'Worksheet' has no attribute '_merged_cells'  I DONOT KNOW how to solve this
在 2016年10月25日星期二 UTC+8上午10:58:48,Ken Seehart写道:
Message has been deleted

wannim...@gmail.com

unread,
May 9, 2018, 4:19:32 PM5/9/18
to openpyxl-users
This worked for me: 


        if range_string not in self.merged_cells:
self.merged_cells.add(range_string)



# The following is removed by this monkeypatch:

# min_col, min_row, max_col, max_row = range_boundaries(range_string)
# rows = range(min_row, max_row+1)
# cols = range(min_col, max_col+1)
# cells = product(rows, cols)

# all but the top-left cell are removed
#for c in islice(cells, 1, None):
#if c in self._cells:
#del self._cells[c]

# Apply monkey patch
    worksheet.Worksheet.merge_cells = merge_cells

patch_worksheet()

Reply all
Reply to author
Forward
0 new messages