Hi,
I've been trying to use openpyxl to insert data into an XLSX file containing images and drawings. I'm hoping to be able to open the file, modify the cell data and save it again, preserving as much of the original file as possible. I understand that the DrawingML support in openpyxl is still under development, and so losing the drawings is not a problem, but I would like the images to be preserved.
If I create the file using LibreOffice, this is indeed what happens: the images are copied across, but the drawings are lost.
However, if I create the file using Excel, then neither the images nor the drawings are preserved. If I remove the drawings from the file, the images are preserved.
Extracting the file created by Excel, I found the following lines in the xl/drawings/drawing1.xml:
<a:custGeom>
  <a:avLst/>
  <a:gdLst/>
  <a:ahLst/>
  <a:rect l="l" t="t" r="r" b="b"/>
  <a:pathLst>
    <a:path w="21600" h="21600">
      <a:lnTo>
        <a:pt x="21600" y="21600"/>
      </a:lnTo>
    </a:path>
  </a:pathLst>
</a:custGeom>
An exception was raised when using openpyxl to parse the coordinates for the a:rect element. It seems that the attributes l, r, t and b are expected to be integers:
Traceback (most recent call last):
  File "/home/pbanks/src/openpyxl-testing/venv/lib/python3.7/site-packages/openpyxl/descriptors/base.py", line 57, in _convert
    value = expected_type(value)
ValueError: invalid literal for int() with base 10: 'l'
Removing the a:rect line from the XML file allows openpyxl to parse the file correctly.
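For anyone who wants to repeat that step without editing the XML by hand, here is a minimal sketch of how the a:rect element could be stripped from the drawing part. The function name strip_custgeom_rect and the SAMPLE fragment are made up for illustration; it only uses the standard library, and it assumes the drawing XML has already been extracted from the XLSX archive. It is a diagnostic workaround, not a fix, since it discards the shape's text rectangle.

```python
import xml.etree.ElementTree as ET

# DrawingML namespaces, normally bound to the "a" and "xdr" prefixes.
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
XDR_NS = "http://schemas.openxmlformats.org/drawingml/2006/spreadsheetDrawing"

# Keep the usual prefixes when the tree is serialised again.
ET.register_namespace("a", A_NS)
ET.register_namespace("xdr", XDR_NS)


def strip_custgeom_rect(xml_text):
    """Return xml_text with every a:rect child of an a:custGeom removed.

    Hypothetical helper for illustration; not part of openpyxl.
    """
    root = ET.fromstring(xml_text)
    # Materialise the iterator first so the tree is not modified mid-walk.
    for geom in list(root.iter(f"{{{A_NS}}}custGeom")):
        for rect in geom.findall(f"{{{A_NS}}}rect"):
            geom.remove(rect)
    return ET.tostring(root, encoding="unicode")


# Reduced fragment modelled on the XML quoted above.
SAMPLE = (
    f'<a:custGeom xmlns:a="{A_NS}">'
    '<a:avLst/><a:gdLst/><a:ahLst/>'
    '<a:rect l="l" t="t" r="r" b="b"/>'
    '<a:pathLst><a:path w="21600" h="21600"/></a:pathLst>'
    '</a:custGeom>'
)

if __name__ == "__main__":
    print(strip_custgeom_rect(SAMPLE))
```

The cleaned XML would then have to be written back into the archive under xl/drawings/drawing1.xml before openpyxl can load the workbook.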
I looked into the Office Open XML definition of the a:rect element [1, p. 2922, section 20.1.9.22]. Each of these attributes is expected to be of type ST_AdjCoordinate (defined in [1, p. 2924, section 20.1.10.2]), which is the union of the types ST_Coordinate and ST_GeomGuideName. I don't fully understand the definitions in the document, but the schema says that the latter can be any token, so values such as "l", "t", "r" and "b" would be guide names rather than numbers. It therefore appears that this XML file is in fact valid, and that this is a bug in openpyxl's parsing code.
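To make the union reading concrete, here is a rough sketch of how such an attribute value might be converted if both branches were accepted. The name parse_adj_coordinate is hypothetical and this is not openpyxl's code; it also assumes that the ST_Coordinate branch is a plain integer, which is my reading of the schema rather than something I have verified.

```python
def parse_adj_coordinate(value):
    """Convert an ST_AdjCoordinate-style attribute value.

    Hypothetical helper: returns an int when the value is a numeric
    coordinate, and otherwise returns the string unchanged, treating it
    as a geometry guide name (ST_GeomGuideName).
    """
    try:
        return int(value)
    except ValueError:
        # Not a number, so assume it names a guide such as "l" or "r".
        return value


if __name__ == "__main__":
    # Attributes of the a:rect element from the Excel-generated file.
    rect_attrs = {"l": "l", "t": "t", "r": "r", "b": "b"}
    print({name: parse_adj_coordinate(v) for name, v in rect_attrs.items()})
    # A numeric value still comes back as an integer.
    print(parse_adj_coordinate("21600"))
```

With a conversion along these lines, the a:rect line above would be read as four guide names instead of raising the ValueError shown in the traceback.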
My knowledge of DrawingML is limited, so please excuse me if my reading of the ECMA documentation is wrong. If someone can confirm that this is an issue, I'm happy to file a bug report about it.
Let me know if you need any more information.
Best wishes,
Peter
PS: attached are XLSX files with the drawing which causes the issues, and also a copy of the file with the offending a:rect line removed, along with a short program that demonstrates the issue. Also attached is a traceback generated by adding a call to traceback.print_exc in openpyxl/reader/drawings.py on the line before the warning about DrawingML support.