IndexError on load_workbook

1,096 views
Skip to first unread message

matthe...@gmail.com

unread,
Sep 27, 2016, 6:22:29 AM9/27/16
to openpyxl-users
Hi - I am getting sporadic errors when trying to load an existing workbook (my script needs to be able to open an xlsx file, manipulate it then resave it). I've had some success doing this but for some reason, I am now getting in IndexError whenever I try to load the file.

My code:

from openpyxl import load_workbook


wb
= load_workbook('/home/lemon/Documents/test_excel.xlsx')


The error:

/home/lemon/.virtualenvs/bicc_excel/bin/python /home/lemon/code/python/bicc_excel/openpyxl_test.py
Traceback (most recent call last):
 
File "/home/lemon/code/python/bicc_excel/openpyxl_test.py", line 3, in <module>
    wb
= load_workbook('/home/lemon/Documents/test_excel.xlsx')
 
File "/home/lemon/.virtualenvs/bicc_excel/lib/python3.5/site-packages/openpyxl/reader/excel.py", line 201, in load_workbook
    apply_stylesheet
(archive, wb) # bind styles to workbook
 
File "/home/lemon/.virtualenvs/bicc_excel/lib/python3.5/site-packages/openpyxl/styles/stylesheet.py", line 164, in apply_stylesheet
    stylesheet
= Stylesheet.from_tree(node)
 
File "/home/lemon/.virtualenvs/bicc_excel/lib/python3.5/site-packages/openpyxl/styles/stylesheet.py", line 99, in from_tree
   
return  super(Stylesheet, cls).from_tree(node)
 
File "/home/lemon/.virtualenvs/bicc_excel/lib/python3.5/site-packages/openpyxl/descriptors/serialisable.py", line 89, in from_tree
   
return cls(**attrib)
 
File "/home/lemon/.virtualenvs/bicc_excel/lib/python3.5/site-packages/openpyxl/styles/stylesheet.py", line 90, in __init__
   
self.named_styles =  self._merge_named_styles()
 
File "/home/lemon/.virtualenvs/bicc_excel/lib/python3.5/site-packages/openpyxl/styles/stylesheet.py", line 110, in _merge_named_styles
    xf
= self.cellStyleXfs[style.xfId]
 
File "/home/lemon/.virtualenvs/bicc_excel/lib/python3.5/site-packages/openpyxl/styles/cell_style.py", line 182, in __getitem__
   
return self.xf[idx]
IndexError: list index out of range

I'm running this on a Debian 8 machine. I have been using Libreoffice 5.2.1.2 to create the original file as I don't have access to MS Excel. Obviously, the original file is saved as xlsx with Libreoffice. As I said, I have been able to load the file, add data to the sheet and resave it, but more often that not, I am getting this error. Python 3.5.2.

If any one could help, I'd appreciate it. I'm on a bit of a deadline!

Thanks,

Charlie Clark

unread,
Sep 27, 2016, 6:25:07 AM9/27/16
to openpyx...@googlegroups.com
Am .09.2016, 12:22 Uhr, schrieb <matthe...@gmail.com>:

> If any one could help, I'd appreciate it. I'm on a bit of a deadline!

I think this could be related to how LibreOffice handles named styles. See
https://bitbucket.org/openpyxl/openpyxl/issues/647/xfid-mapping-broken-in-cellxfs

But I'd need a file to work with to be sure.

Charlie
--
Charlie Clark
Managing Director
Clark Consulting & Research
German Office
Kronenstr. 27a
Düsseldorf
D- 40217
Tel: +49-211-600-3657
Mobile: +49-178-782-6226

matthe...@gmail.com

unread,
Sep 27, 2016, 6:32:22 AM9/27/16
to openpyxl-users
Crikey - thanks for the quick reply! Test file attached. In the meantime, I'll check the Bitbucket Issue. Thanks.
test_excel.xlsx

Charlie Clark

unread,
Sep 27, 2016, 8:48:24 AM9/27/16
to openpyx...@googlegroups.com
Am .09.2016, 12:32 Uhr, schrieb <matthe...@gmail.com>:

> Crikey - thanks for the quick reply! Test file attached. In the meantime,
> I'll check the Bitbucket Issue. Thanks.

Yes, it is the same problem and as far as I can understand, this is a bug
in LibreOffice. This is from the styles.xml:

<cellStyleXfs count="6">
<xf borderId="0" fillId="0" fontId="0" numFmtId="0"/>
<xf borderId="0" fillId="0" fontId="0" numFmtId="0"/>
<xf borderId="0" fillId="0" fontId="0" numFmtId="0"/>
<xf borderId="0" fillId="0" fontId="0" numFmtId="0"/>
<xf borderId="0" fillId="0" fontId="0" numFmtId="0"/>
<xf borderId="0" fillId="0" fontId="0" numFmtId="0"/>
</cellStyleXfs>
<cellXfs count="3">
<xf applyAlignment="1" borderId="0" fillId="0" fontId="0"
numFmtId="164" pivotButton="0" quotePrefix="0" xfId="0">
<alignment horizontal="general" vertical="bottom"/>
</xf>
<xf applyAlignment="1" borderId="0" fillId="0" fontId="0"
numFmtId="164" pivotButton="0" quotePrefix="0" xfId="0">
<alignment horizontal="general" vertical="bottom"/>
</xf>
<xf borderId="0" fillId="0" fontId="0" numFmtId="0" pivotButton="0"
quotePrefix="0" xfId="0"/>
</cellXfs>
<cellStyles count="6">
<cellStyle builtinId="0" name="Normal" xfId="0"/>
<cellStyle builtinId="3" name="Comma" xfId="15"/>
<cellStyle builtinId="6" name="Comma [0]" xfId="16"/>
<cellStyle builtinId="4" name="Currency" xfId="17"/>
<cellStyle builtinId="7" name="Currency [0]" xfId="18"/>
<cellStyle builtinId="5" name="Percent" xfId="19"/>
</cellStyles>

The cellStyle elements are the names of the cellStyleXfs and the xfId
should be a pointer to the relevant xf element. The specification says:

"""
Zero-based index referencing an xf record in the cellStyleXfs collection.
This is used to determine the formatting defined for this named cell style.
"""

But if we only have 6 cellStyleXfs, how can we have indices from 15:19?
Excel doesn't seem to like the file either. I do have a question about
this open with the OOXML Working Group but it does look increasingly like
a LibreOffice bug, and quite a serious one. However, as the XML is
formally correct it's impossible to detect the problem by trying to
validate the file.

matthe...@gmail.com

unread,
Sep 27, 2016, 9:11:29 AM9/27/16
to openpyxl-users
Thank you for looking into this. I'd be interested in where exactly to post this kind of bug for Libreoffice, or where you have posted your issue - do you have a URL? I'd be more than willing to support the bug report or submit my own.  In the meantime, I presume if I can get hold of an xlsx file born of Excel rather than LibreOffice, I'll be ok - as long as I don't touch it (resave it) with Libreoffice. Keep up the good work with the library - it's a great tool.

Charlie Clark

unread,
Sep 27, 2016, 10:09:35 AM9/27/16
to openpyx...@googlegroups.com
Am .09.2016, 15:11 Uhr, schrieb <matthe...@gmail.com>:

> Thank you for looking into this. I'd be interested in where exactly to
> post this kind of bug for Libreoffice, or where you have posted your
> issue - do you have a URL?

The OOXML issues are hosted as Word documents one live.com because I think
this is possibly the hardest way to deal with bugs!

https://onedrive.live.com/?id=C8BA0861DC5E4ADC%21105&cid=C8BA0861DC5E4ADC

This particular one relates DR-14-0009 filed in 2014! It's all a bit
sisyphian but it looks like some of my reports are going to lead to
changes in the documentation and possibly even to the specification. I've
just been running round the office whooping and fist-pumping!

> I'd be more than willing to support the bug report or
> submit my own. In the meantime, I presume if I can get hold of an xlsx
> file born of Excel rather than LibreOffice, I'll be ok - as long as I
> don't touch it (resave it) with Libreoffice.

Well it should be possible to manually cut the crap and zip the file up
again. Please file a bug with LibreOffice. And update the relevant issue
on Bitbucket which we can then close.

> Keep up the good work with the library - it's a great tool.

Donations are always gratefully accepted! A couple of people actually
bought me beer at PyCon UK :-)
fixed.xlsx

matthe...@gmail.com

unread,
Sep 27, 2016, 11:12:20 AM9/27/16
to openpyxl-users
The fact that they collect bug reports in Word documents on live.com reinforces why use Libreoffice rather than MS Office - even if Libre is a minefield of XML bugs, I will use it. Unbelievable.

I wouldn't know what I was cutting out of the zip file to clean it up and as I don't have much time to lose myself in XML - I'll just have to find a way of handling MS xlsx files only, which will be a pain but probably doable.

I'll post the same query (more or less) on your BB issue and look into a small donation - purely on the basis of you interfacing with live.com so I don't have to ;-)

Matt

Charlie Clark

unread,
Sep 27, 2016, 11:20:53 AM9/27/16
to openpyx...@googlegroups.com
Am .09.2016, 17:12 Uhr, schrieb <matthe...@gmail.com>:

> The fact that they collect bug reports in Word documents on live.com
> reinforces why use Libreoffice rather than MS Office - even if Libre is a
> minefield of XML bugs, I will use it. Unbelievable.

Yes, but it's a nice junket for those on the WG. To be honest, I have to
admit that everyone has been more than polite and the people working at
Microsoft often seem to struggle as much with the format as anyone else.
If MS were smart they'd drop OOXML and use ODF and concentrate on
producing the best tools – on MacOS at least MS Office is streets ahead of
LibreOffice in terms of usability, because companies would happily
continue to pay them. I'm currently struggling to buy a licence for the
2016 version.

> I wouldn't know what I was cutting out of the zip file to clean it up and
> as I don't have much time to lose myself in XML - I'll just have to find
> a way of handling MS xlsx files only, which will be a pain but probably
> doable.

The XML in the styles can be edited in memory. If I get bored I could
write the code. Though you can also use the methods in
styles/stylesheet.py to check. For added fun: duplicate names/styles are
common in which case we dedupe using the highest index.

> I'll post the same query (more or less) on your BB issue and look into a
> small donation - purely on the basis of you interfacing with live.com so
> I don't have to ;-)

It is a nightmare. Normally, however, I can get away with posting my
remarks by e-mail and someone else putting them into word.

matthe...@gmail.com

unread,
Sep 27, 2016, 11:23:31 AM9/27/16
to openpyxl-users
I wish you luck. Thanks again for your help.
Matt
Reply all
Reply to author
Forward
0 new messages