access calculated value without resorting to data_only

25 views
Skip to first unread message

Kyler Laird

unread,
May 22, 2025, 10:10:37 AMMay 22
to openpyxl-users
My work has evolved into mostly using OpenPyxl (and Pyodide) to read and write Excel files.  I write wrapper classes to add common functionality (like handling named columns).  I'm now writing some functionality that seems ridiculously inefficient.

If I read a file with data_only=True, I'll destroy all of the formulas that were there when I write it.  That's not an option.

If I read a file with data_only=False, I don't have access to the calculated value of a cell.  If I need those values, I must open another copy of the workbook just to access that data.

I poked around parse_cell() and found that it's fairly straightforward to always parse formulas and always handle the calculated value, making both available for every cell.  This is so much cleaner than having two copies of the workbook.

Has storing the calculated value for data_only=False been considered?  Ideally, I'd like to have something like cell.calculated_value (in addition to cell.value).

--kyler

Charlie Clark

unread,
May 22, 2025, 2:10:44 PMMay 22
to openpyxl-users

On 22 May 2025, at 16:10, Kyler Laird wrote:

Hi Kyler,

My work has evolved into mostly using OpenPyxl (and Pyodide) to read and
write Excel files. I write wrapper classes to add common functionality
(like handling named columns). I'm now writing some functionality that
seems ridiculously inefficient.

I think the emphasis is on seems.

If I read a file with data_only=True, I'll destroy all of the formulas that
were there when I write it. That's not an option.

If I read a file with data_only=False, I don't have access to the
calculated value of a cell. If I need those values, I must open another
copy of the workbook just to access that data.

I poked around parse_cell() and found that it's fairly straightforward to
always parse formulas and always handle the calculated value, making both
available for every cell. This is so much cleaner than having two copies
of the workbook.

Has storing the calculated value for data_only=False been considered?
Ideally, I'd like to have something like cell.calculated_value (in addition
to cell.value).

Yes, this has been considered. There are two reasons why it hasn't been, and won't be added. Firstly, the implementation avoids any ambiguity about what the "value" of a cell is, and forces users to consider this only when it matters, which it usually doesn't. This means avoiding code that checks for formulae, but still has to cope with the fact that no value may ever have been calculated. And, of course, you have the problem of cache invalidation.

Secondly, most cells will never have a cached value, but adding the attribute to all cells will increase memory consumption for all cells, which is possibly the biggest performance challenge in Openpyxl.

In practice, I suggest running one or even both workbooks in read-only mode, depending upon your needs. This is very efficient and has always been sufficient where both values are required.

Charlie

--
Charlie Clark
Managing Director
Clark Consulting & Research
German Office
Sengelsweg 34
Düsseldorf
D- 40489
Tel: +49-203-3925-0390
Mobile: +49-178-782-6226

Reply all
Reply to author
Forward
0 new messages