What are Dynamic Arrays?
Dynamic Arrays is a rather new feature in Excel where a function in one cell can output results into multiple cells. If those cells are already occupied the original cell is filled with a #Spill error.
How do the work with OpenPyXl?
Short answer is they don’t. They require new functionality not yet present in the library. If a dynamic array is present in a workbook that is loaded and saved it will be converted to a legacy array formula. I have seen multiple questions on StackOverflow and in this forum of people that do not understand why they would not work. And ask why the formula they entered is now prefixed with @ or surrounded by {} and no longer work unless manually triggered.
Why am I writing this?
I am sharing my findings in case someone else want to investigate this and want to get a head start. Perhaps if I get overwhelming support for this, I might try to contribute to the project to add some rudimentary Dynamic Array support.
What is needed for OpenPyXL to support Dynamic Arrays?
I took it upon myself to investigate the difference between an Excel-file where dynamic array formulas work, and one read and written by OpenPyXL here are my findings:
1:
In the _rels/workbook.xml.rel a relationship must be added that references the metadata specification:
<Relationship Id="rId4" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/sheetMetadata" Target="metadata.xml"/>
See here for further specs:
2:
The file metadata.xml must be added with futureMetadata XLDAPR property.
3:
The relationships namespace must be added to the file workbook.xml
4:
The relationships namespace must be added to each sheet with dynamic arrays.
5:
Each cell containing a dynamic array function must have the “cm” property set to “1” on its column node: <c r="B1" cm="1"> <f t="array" ref="B1:B2">_xlfn.UNIQUE(A1:A3)</f> <v>1</v> </c>
I have successfully “restored” several worksheets with the steps above. Since my use case for OpenPyXL is basically to write data into tables and named cells, I am planning to use the metadata of my original file, combine it with the worksheet XMLs and used an XML-parser to add 4 and 5 to each sheet.
My understanding is that 1-4 are basically static files / edits just telling excel this feature is enabled, while the last one is a flag on each affected cell.
Note that the above only applies to dynamic arrays that populate downwards! Some dynamic array formulas can expand in both directions, and then and additional relationship called calcChain needs to be added. This is not a static file, and probably will be VERY hard to implement.
With all this said, I fully understand the goal of this project is to be compatible with Excel 2010 and this feature is introduced in later versions. But I though I would share my findings anyway, they might help or interest someone out there.
Best regards
Harald