Excel file generated by OpenPyXL "needs repair" when opening in Excel

11,471 views
Skip to first unread message

Johan De Taeye

unread,
Nov 16, 2017, 4:43:37 AM11/16/17
to openpyxl-users

Hi,

The excel file we generate with OpenPyXl (currently using 2.4.8) sometimes gives an error when opening it in Excel 2013.
The error message excel prompts us with is:
    "We found a problem with some content in '<filename.xlsx>'. Do you want to recover as much as we can? If you trust the source of this workbook, click Yes"
The log file of excel's repair doesn't provide any insight on what's wrong.

The problem isn't fully reproducible. Sometimes it works, sometimes we get the error.
By comparing a invalid and valid excel file with the same data content I was able to pinpoint the problem to the following difference in the "[Content_Types].xml":
  • Good "[Content_Types].xml" file:
    ...
    <Default ContentType="application/xml" Extension="xml"/>
    ...

  • Buggy "[Content_Types].xml" file:
    ...
    <Default ContentType="application/xml" Extension="xml"/>
    <Default ContentType="text/xml" Extension="xml"/>
    ...
Can anyone shed some light on what could be the root cause of this occasional difference in the file?

Regards,

Johan

Charlie Clark

unread,
Nov 16, 2017, 4:51:11 AM11/16/17
to openpyx...@googlegroups.com
Am .11.2017, 10:43 Uhr, schrieb Johan De Taeye <johan.d...@gmail.com>:

> Can anyone shed some light on what could be the root cause of this
> occasional difference in the file?

Not really without the file. I'd be surprised if it was just the manifest.
openpyxl uses the correct mime type for xml which is application/xml so I
don't know where the text/xml is coming from.

When Excel complains it usually offers to create log with the error
report. These usually aren't that useful but can at least provide an
indication as to which component is causing the problems.

Charlie
--
Charlie Clark
Managing Director
Clark Consulting & Research
German Office
Kronenstr. 27a
Düsseldorf
D- 40217
Tel: +49-211-600-3657
Mobile: +49-178-782-6226

Johan De Taeye

unread,
Nov 16, 2017, 5:33:25 AM11/16/17
to openpyx...@googlegroups.com

Hi Charlie,

Attached are the excel and the (uninteresting) log file excel generates. I also tried Microsoft Open XML SDK validator tool to pinpoint the issue - it's message is as useless as the log file.

The python code generating the excel is open source: see the exportWorkBook function at https://github.com/frePPLe/frepple/blob/9c6710e4405673ce8ff7e3ac286361ae412ebdd5/freppledb/common/report.py#L2206
I have the sneaking suspicion that the named style and/or the autofilter are somehow related to the problem.

Despite this problem, we are long time users of openpyxl and very happy about it.
A big thanks for all the effort you put into it.

Regards,

Johan
--
You received this message because you are subscribed to a topic in the Google Groups "openpyxl-users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/openpyxl-users/2-T90R8ZdVE/unsubscribe.
To unsubscribe from this group and all its topics, send an email to openpyxl-user...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
frepple-34.xlsx
error014640_01.xml

Charlie Clark

unread,
Nov 16, 2017, 6:13:37 AM11/16/17
to openpyx...@googlegroups.com
Am .11.2017, 11:33 Uhr, schrieb Johan De Taeye <johan.d...@gmail.com>:

>
> Hi Charlie,
>
> Attached are the excel and the (uninteresting) log file excel generates.
> I also tried Microsoft Open XML SDK validator tool to pinpoint the issue
> - it's message is as useless as the log file.
>
> The python code generating the excel is open source: see the
> exportWorkBook function at
> https://github.com/frePPLe/frepple/blob/9c6710e4405673ce8ff7e3ac286361ae412ebdd5/freppledb/common/report.py#L2206
> I have the sneaking suspicion that the named style and/or the autofilter
> are somehow related to the problem.
>
> Despite this problem, we are long time users of openpyxl and very happy
> about it.
> A big thanks for all the effort you put into it.

I'll need the original file as well and the any code you run on it.

Johan De Taeye

unread,
Nov 16, 2017, 6:29:29 AM11/16/17
to openpyx...@googlegroups.com

Hi Charlie,

>>I'll need the original file as well and the any code you run on it.

I'm not sure what you refer to. We generate the complete file from scratch with the python code in the function I pointed out.

The difference between a correct and wrong file is really only this content-types.xml file. By editing it in a failing file I can manually repair it and let it open without issues in Excel.
I don't know the openpyxl code in depth to understand what can create the occasional extra extension.

Regards,

Johan

-----Original Message-----
From: openpyx...@googlegroups.com [mailto:openpyx...@googlegroups.com] On Behalf Of Charlie Clark
Sent: Thursday, November 16, 2017 12:14 PM
To: openpyx...@googlegroups.com
Subject: Re: [openpyxl-users] Excel file generated by OpenPyXL "needs repair" when opening in Excel

Charlie Clark

unread,
Nov 16, 2017, 6:33:51 AM11/16/17
to openpyx...@googlegroups.com
Am .11.2017, 12:29 Uhr, schrieb Johan De Taeye <johan.d...@gmail.com>:

> I'm not sure what you refer to. We generate the complete file from
> scratch with the python code in the function I pointed out.

Ah, okay. That wasn't clear. Don't have time to debug this at the moment.

> The difference between a correct and wrong file is really only this
> content-types.xml file. By editing it in a failing file I can manually
> repair it and let it open without issues in Excel.
> I don't know the openpyxl code in depth to understand what can create
> the occasional extra extension.

It's probably the manifest which is handled by the packaging code.
Something is being added with a different mime-type.

Johan De Taeye

unread,
Nov 16, 2017, 7:45:41 AM11/16/17
to openpyx...@googlegroups.com

Hi Charlie,

After a quick look at the code in manifest.py, could it be that the duplicate comes from my Python installation?
In my python setup it associates "text/xml" as mimetype for ".xml" files. I'm running on windows, but also on Ubuntu I'm getting the same result.

>python
Python 3.5.2 (v3.5.2:4def2a2901a5, Jun 25 2016, 22:18:55) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import mimetypes
>>> mimetypes.types_map['.xml']
'text/xml'

This mime-type could conflict with the one you define in line 42:
mimetypes.add_type('application/xml', ".xml")

Still it beats me why the problem isn't reproducible, or why I would be the first to run into this.

Regards,

Johan

-----Original Message-----
From: openpyx...@googlegroups.com [mailto:openpyx...@googlegroups.com] On Behalf Of Charlie Clark
Sent: Thursday, November 16, 2017 12:34 PM
To: openpyx...@googlegroups.com
Subject: Re: [openpyxl-users] Excel file generated by OpenPyXL "needs repair" when opening in Excel

Charlie Clark

unread,
Nov 16, 2017, 9:11:15 AM11/16/17
to openpyx...@googlegroups.com
Am .11.2017, 13:45 Uhr, schrieb Johan De Taeye <johan.d...@gmail.com>:

> In my python setup it associates "text/xml" as mimetype for ".xml"
> files. I'm running on windows, but also on Ubuntu I'm getting the same
> result.

Shouldn't matter but you should see errors if you run the tests.
mime-types is only really relevant for binary types like images.

Johan De Taeye

unread,
Nov 19, 2017, 4:46:36 PM11/19/17
to openpyxl-users

Charlie,

This is how I fixed the problem.


The function openpyxl/packaging looks as follows:

    def _register_mimetypes(self, filenames):
        """
        Make sure that the mime type for all file extensions is registered
        """
        for fn in filenames:
            ext = os.path.splitext(fn)[-1]
            if not ext:
                continue
            mime = mimetypes.types_map[ext]
            fe = FileExtension(ext[1:], mime)
            self.Default.append(fe)

This contains no protection against duplicated extensions in the default list.
In my case the duplicate is created when 'FileExtension("xml", "text/xml")' and 'FileExtension("xml", "application/xml")' are both added to the list.

My modified version of the function avoids such duplicates:

    def _register_mimetypes(self, filenames):
        """
        Make sure that the mime type for all file extensions is registered
        """
        defaults = [t.Extension for t in self.Default]           # <<<  NEW
        for fn in filenames:
            ext = os.path.splitext(fn)[-1]
            if not ext:
                continue
            mime = mimetypes.types_map[ext]                        
            if ext[1:] not in defaults:                                 # <<<  NEW
              fe = FileExtension(ext[1:], mime)                # <<<  NEW
              self.Default.append(fe)                               # <<<  NEW
              defaults.append(ext[1:])                             # <<<  NEW

Is this a fix that makes sense to you and can be added to the standard openpyxl?

Regards,

Johan



Op donderdag 16 november 2017 15:11:15 UTC+1 schreef Charlie Clark:

Charlie Clark

unread,
Nov 20, 2017, 5:17:43 AM11/20/17
to openpyx...@googlegroups.com
Am .11.2017, 22:46 Uhr, schrieb Johan De Taeye <johan.d...@gmail.com>:

> Is this a fix that makes sense to you and can be added to the standard
> openpyxl?

Not really. You need to find out what in your code is causing openpyxl to
add the erroneous mime-type and work from there. Then you can create a
test case, write a fix and submit a pull request.

Johan De Taeye

unread,
Dec 11, 2017, 6:19:46 AM12/11/17
to openpyxl-users

Finally tracked the culprit down.

Still feels a bit tricky and messy to update the mime-map globally.  The standard cpython library does it safer&cleaner&robust by creating a copy of the map:


Johan

Op maandag 20 november 2017 11:17:43 UTC+1 schreef Charlie Clark:
Reply all
Reply to author
Forward
0 new messages