Invalid Excel *.xlsm file when editing a workbook containing an external sheet with comments & formatting

340 views
Skip to first unread message

Philipp Freyer

unread,
Nov 6, 2015, 5:08:50 PM11/6/15
to openpyxl-users
Hi all,
as promised in the previous thread, here the thread for another problem, I stumbled upon:
Scenario:
I need to update information in a lot of Excel worksheets. They are created with Excel 2007.
Editing the data works fine but I have problems if someone using these sheets inserts another worksheet into them by copying them from another workbook with Excel by right-clicking on their tab in Excel and selecting the "move or copy" option for copying the worksheet.

I am using the latest openpyxl library from PyPI (2.3.0) and Python 2.7.8 32 bit on Windows 7 (64 bit).

Here a short example script that already shows the problem:
# coding=utf-8
import openpyxl

update_sheet = u"TESTBOOK_original.xlsm"
wb = openpyxl.load_workbook(update_sheet, keep_vba=True)
wb._external_links = []
wb.worksheets[0]["A1"].comment = Comment("ab", "cd")
wb.save(update_sheet)

The script just opens and saves the file, removing the external links (reason: see previous thread) and including one comment.
When I open the resulting file afterwards, I get the following error:
Excel found unreadable content in 'TESTBOOK_broken.xlsm'. Do you want to recover the contents of this workbook? If you trust the source of this workbook, click Yes.
When I click "Yes", Excel repairs the file, giving me the following error log:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<recoveryLog xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"><logFileName>error127960_01.xml</logFileName><summary>Errors were detected in file 'C:\Users\pfreyer\Documents\TESTBOOK_bad.xlsm'</summary><removedFeatures summary="Following is a list of removed features:"><removedFeature>Removed Feature: Data validation from /xl/worksheets/sheet5.xml part</removedFeature><removedFeature>Removed Feature: Data validation from /xl/worksheets/sheet6.xml part</removedFeature></removedFeatures><removedRecords summary="Following is a list of removed records:"><removedRecord>Removed Records: Comments from /xl/comments1.xml part (Comments)</removedRecord></removedRecords></recoveryLog>

I will happily provide all the information I can, if you need more information. If you classify this as a bug, I will also create an issue in Bitbucket.
What I did not notice before: Not only the comments, but also the data validation that I set are a problem for Excel. I saw in the documentation that it is not possible to read existing data validation, but should openpyxl at least be able to keep it? If not, is there a way to remove existing validation, so that I can just replace it?

Kind regards and thanks so far,
Philipp
TESTBOOKS_Comments.zip

Charlie Clark

unread,
Nov 7, 2015, 9:00:39 AM11/7/15
to openpyx...@googlegroups.com
Am .11.2015, 23:09 Uhr, schrieb Philipp Freyer <pif...@gmail.com>:

> Hi all,
> as promised in the previous thread, here the thread for another problem,
> I stumbled upon:

> Scenario:
> I need to update information in a lot of Excel worksheets. They are
> created with Excel 2007.

FWIW as Excel 2007 itself doesn't conform to the specification it's
possible that files that have been processed by openpyxl will *not* be
readable by by Excel 2007.

> Editing the data works fine but I have problems if someone using these
> sheets inserts another worksheet into them by copying them from another
> workbook with Excel by right-clicking on their tab in Excel and selecting
> the "move or copy" option for copying the worksheet.

That is not relevant in this case.

> The script just opens and saves the file, removing the external links
> (reason: see previous thread) and including one comment.
>
> When I open the resulting file afterwards, I get the following error:
>
> *Excel found unreadable content in 'TESTBOOK_broken.xlsm'. Do you want
> to recover the contents of this workbook? If you trust the source of
> this workbook, click Yes.*
>
> When I click "Yes", Excel repairs the file, giving me the following
> error log:

Wow! A log that isn't total shit for once. But, seeing as the file can be
viewed and validated in the Productivity Tool it's also easier to identify
the culprit.

> I will happily provide all the information I can, if you need more
> information. If you classify this as a bug, I will also create an issue
> in Bitbucket.

Yes, please do.

> What I did not notice before: Not only the comments, but also the data
> validation that I set are a problem for Excel. I saw in the
> documentation that it is not possible to read existing data validation,
> but should openpyxl at least be able to keep it? If not, is there a way
> to remove existing validation, so that I can just replace it?

Actually, that's no longer the case: we do preserve data validations but
it looks like I, er, added a feature in 2.3 in converting to
Serialisable-based code so the the containing XML tag is wrong.

Charlie
--
Charlie Clark
Managing Director
Clark Consulting & Research
German Office
Kronenstr. 27a
Düsseldorf
D- 40217
Tel: +49-211-600-3657
Mobile: +49-178-782-6226

Charlie Clark

unread,
Nov 7, 2015, 9:17:13 AM11/7/15
to openpyx...@googlegroups.com
Am .11.2015, 15:00 Uhr, schrieb Charlie Clark
<charli...@clark-consulting.eu>:

> Actually, that's no longer the case: we do preserve data validations but
> it looks like I, er, added a feature in 2.3 in converting to
> Serialisable-based code so the the containing XML tag is wrong.

Actually, it looks like this is only a problem in 2.4. I have no problem
with the file with the 2.3 branch.

Charlie Clark

unread,
Nov 9, 2015, 8:11:43 AM11/9/15
to openpyx...@googlegroups.com
Am .11.2015, 23:09 Uhr, schrieb Philipp Freyer <pif...@gmail.com>:

> What I did not notice before: Not only the comments, but also the data
> validation that I set are a problem for Excel. I saw in the
> documentation that it is not possible to read existing data validation,
> but should openpyxl at least be able to keep it? If not, is there a way
> to remove existing validation, so that I can just replace it?

Have you had a chance to work with a checkout to see if this does solve
your problem?

Philipp Freyer

unread,
Nov 9, 2015, 9:56:40 AM11/9/15
to openpyxl-users
Not yet, but it is planned to test it in the evening. I will come back to you with results.

Philipp Freyer

unread,
Nov 18, 2015, 5:01:43 AM11/18/15
to openpyxl-users
Hi Charlie,
finally I got to test it.
I tested with the 2.3 and 2.4 branch using the following pip checkouts:
(one at a time, of course).
Both do not work on my system with the provided test files. (I removed the wb.external_links=[] line in the script, though, which results in broken comment, but not broken data validation anymore).

Please tell me, if I can do anything to help you further with this issue, I should now again have more time that I can dedicate to the topic.

Kind regards,
Philipp
TESTBOOK_bad.2.3.xlsm
TESTBOOK_bad.2.4.xlsm

Philipp Freyer

unread,
Nov 18, 2015, 5:09:24 AM11/18/15
to openpyxl-users

Charlie Clark

unread,
Nov 18, 2015, 5:13:56 AM11/18/15
to openpyx...@googlegroups.com
Am .11.2015, 11:01 Uhr, schrieb Philipp Freyer <pif...@gmail.com>:

> Please tell me, if I can do anything to help you further with this
> issue, I
> should now again have more time that I can dedicate to the topic.

What's "not working"? Are you trying to read the files with Excel 2007?

Philipp Freyer

unread,
Nov 18, 2015, 9:21:33 AM11/18/15
to openpyxl-users
Sorry, I could have elaborated further.
I am reading the files with Excel 2007, the file still is broken on import. The import log is:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<recoveryLog xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"><logFileName>error139360_01.xml</logFileName><summary>Errors were detected in file 'C:\Users\pfreyer\Documents\TESTBOOK_bad.xlsm'</summary><removedRecords summary="Following is a list of removed records:"><removedRecord>Removed Records: Comments from /xl/comments1.xml part (Comments)</removedRecord></removedRecords></recoveryLog>
Kind regards,
Philipp

Charlie Clark

unread,
Nov 18, 2015, 11:44:46 AM11/18/15
to openpyx...@googlegroups.com
Am .11.2015, 15:21 Uhr, schrieb Philipp Freyer <pif...@gmail.com>:

> Sorry, I could have elaborated further.
> I am reading the files with Excel 2007, the file still is broken on
> import.

Right, I think I know what's causing the problem: the comment is being
attached to a cell that doesn't get serialised. I thought I'd fixed that
but obviously not. However, a quick test setting a value on that cell
doesn't seem to resolve the problem so I'll need to dig deeper. There are
also some odd differences in the differential styles in sheet6.xml
depending on whether a comment has been added to sheet1.xml. No idea why
that is the case.

Tests with 2.2 suggest that this has never actually worked as intended. Is
this correct?

Charlie Clark

unread,
Nov 19, 2015, 6:34:57 AM11/19/15
to openpyx...@googlegroups.com
Am .11.2015, 15:21 Uhr, schrieb Philipp Freyer <pif...@gmail.com>:

> Sorry, I could have elaborated further.
> I am reading the files with Excel 2007, the file still is broken on
> import.

As per the bug report: there is currently no way to have comments and VB
controls on the same worksheet. We might add this later but it will never
work with Excel 2007.

Charlie Clark

unread,
Nov 26, 2015, 10:13:11 AM11/26/15
to openpyx...@googlegroups.com
Am .11.2015, 12:34 Uhr, schrieb Charlie Clark
<charli...@clark-consulting.eu>:

> As per the bug report: there is currently no way to have comments and VB
> controls on the same worksheet. We might add this later but it will
> never work with Excel 2007.

Update on this: thanks to John Bovey's excellent work it should now
actually be possible to add comments to existing worksheets that already
have them, when preserving macros. It works with the files supplied but as
it depends on an abandoned specification there is no guarantee that it
will always work.

I've also looked at how Excel 2016 deals with comments in so-called strict
mode: it essentially tries three different approaches though none seems
the preferred. :-( NB. you cannot real such files with openpyxl yet.

I've submitted a bug report to the ECMA Working Group in the hope of
getting a normative implementation that would rely less on sticky tape and
string but, given that non even Microsoft seems very clear on the matter,
I'm not very optimistic of ever obtaining a satisfactory answer to this.
But on the good side, it is now (2.4 branch) possible to create text boxes.
Reply all
Reply to author
Forward
0 new messages