Many cells fail to read in read-only mode only.

566 views
Skip to first unread message

kyler...@gmail.com

unread,
Jun 22, 2018, 2:50:20 PM6/22/18
to openpyxl-users
I'm extracting data from a file that's fairly large (22 MB) and needs to be processed quickly.  Read-only mode has been great for this.

Unfortunately, reading many (but not all) of the cells with formulas fails in read-only mode.  It works well in read-write mode.  Also, if I read the file and then write it, I can read the resulting file in read-only mode without problems.

It would be great if I could just re-write it once and be done but this file is generated and maintained by others.  It's also confidential data so I can't share the file.

I'd appreciate advice on where to look to debug the code.

Thank you.

--kyler

Here's my code:

import openpyxl

master_wb = openpyxl.load_workbook(filename='soeasa.xlsx', read_only=True)
#master_wb = openpyxl.load_workbook(filename='fixed.xlsx', read_only=True)
master_ws = master_wb['Master1']

for row in master_ws:
    print(row[6], row[6].value, row[6].data_type)


This is the tail end of the output in read-only mode:
(<ReadOnlyCell 'Master1'.G1020>, '=', 'f')
(<ReadOnlyCell 'Master1'.G1021>, '=', 'f')
(<ReadOnlyCell 'Master1'.G1022>, '=', 'f')
(<ReadOnlyCell 'Master1'.G1023>, '=', 'f')
(<ReadOnlyCell 'Master1'.G1024>, '=', 'f')
(<ReadOnlyCell 'Master1'.G1025>, '=', 'f')
(<ReadOnlyCell 'Master1'.G1026>, '=', 'f')
(<ReadOnlyCell 'Master1'.G1027>, '=', 'f')
(<ReadOnlyCell 'Master1'.G1028>, '=', 'f')
(<ReadOnlyCell 'Master1'.G1029>, '=PROPER(TRIM(E1029))&","&PROPER(TRIM(F1029))', 'f')
(<ReadOnlyCell 'Master1'.G1030>, '=PROPER(TRIM(E1030))&","&PROPER(TRIM(F1030))', 'f')
(<ReadOnlyCell 'Master1'.G1031>, '=', 'f')
(<ReadOnlyCell 'Master1'.G1032>, '=', 'f')
(<ReadOnlyCell 'Master1'.G1033>, '=', 'f')
(<ReadOnlyCell 'Master1'.G1034>, '=', 'f')
(<ReadOnlyCell 'Master1'.G1035>, '=', 'f')
(<ReadOnlyCell 'Master1'.G1036>, '=', 'f')
(<ReadOnlyCell 'Master1'.G1037>, '=', 'f')
(<ReadOnlyCell 'Master1'.G1038>, '=', 'f')
(<ReadOnlyCell 'Master1'.G1039>, '=', 'f')
(<ReadOnlyCell 'Master1'.G1040>, '=', 'f')
(<ReadOnlyCell 'Master1'.G1041>, None, 'n')
(<ReadOnlyCell 'Master1'.G1042>, '=PROPER(TRIM(E1042))&","&PROPER(TRIM(F1042))', 'f')
(<ReadOnlyCell 'Master1'.G1043>, '=PROPER(TRIM(E1043))&","&PROPER(TRIM(F1043))', 'f')


Here's the tail in read-write mode (after a long delay...):

(<Cell u'Master1'.G1020>, '=PROPER(TRIM(E1020))&","&PROPER(TRIM(F1020))', 'f')
(<Cell u'Master1'.G1021>, '=PROPER(TRIM(E1021))&","&PROPER(TRIM(F1021))', 'f')
(<Cell u'Master1'.G1022>, '=PROPER(TRIM(E1022))&","&PROPER(TRIM(F1022))', 'f')
(<Cell u'Master1'.G1023>, '=PROPER(TRIM(E1023))&","&PROPER(TRIM(F1023))', 'f')
(<Cell u'Master1'.G1024>, '=PROPER(TRIM(E1024))&","&PROPER(TRIM(F1024))', 'f')
(<Cell u'Master1'.G1025>, '=PROPER(TRIM(E1025))&","&PROPER(TRIM(F1025))', 'f')
(<Cell u'Master1'.G1026>, '=PROPER(TRIM(E1026))&","&PROPER(TRIM(F1026))', 'f')
(<Cell u'Master1'.G1027>, '=PROPER(TRIM(E1027))&","&PROPER(TRIM(F1027))', 'f')
(<Cell u'Master1'.G1028>, '=PROPER(TRIM(E1028))&","&PROPER(TRIM(F1028))', 'f')
(<Cell u'Master1'.G1029>, '=PROPER(TRIM(E1029))&","&PROPER(TRIM(F1029))', 'f')
(<Cell u'Master1'.G1030>, '=PROPER(TRIM(E1030))&","&PROPER(TRIM(F1030))', 'f')
(<Cell u'Master1'.G1031>, '=PROPER(TRIM(E1031))&","&PROPER(TRIM(F1031))', 'f')
(<Cell u'Master1'.G1032>, '=PROPER(TRIM(E1032))&","&PROPER(TRIM(F1032))', 'f')
(<Cell u'Master1'.G1033>, '=PROPER(TRIM(E1033))&","&PROPER(TRIM(F1033))', 'f')
(<Cell u'Master1'.G1034>, '=PROPER(TRIM(E1034))&","&PROPER(TRIM(F1034))', 'f')
(<Cell u'Master1'.G1035>, '=PROPER(TRIM(E1035))&","&PROPER(TRIM(F1035))', 'f')
(<Cell u'Master1'.G1036>, '=PROPER(TRIM(E1036))&","&PROPER(TRIM(F1036))', 'f')
(<Cell u'Master1'.G1037>, '=PROPER(TRIM(E1037))&","&PROPER(TRIM(F1037))', 'f')
(<Cell u'Master1'.G1038>, '=PROPER(TRIM(E1038))&","&PROPER(TRIM(F1038))', 'f')
(<Cell u'Master1'.G1039>, '=PROPER(TRIM(E1039))&","&PROPER(TRIM(F1039))', 'f')
(<Cell u'Master1'.G1040>, '=PROPER(TRIM(E1040))&","&PROPER(TRIM(F1040))', 'f')
(<Cell u'Master1'.G1041>, None, 'n')
(<Cell u'Master1'.G1042>, '=PROPER(TRIM(E1042))&","&PROPER(TRIM(F1042))', 'f')
(<Cell u'Master1'.G1043>, '=PROPER(TRIM(E1043))&","&PROPER(TRIM(F1043))', 'f')


Kyler Laird

unread,
Jun 22, 2018, 5:55:21 PM6/22/18
to openpyx...@googlegroups.com
Here's a more succinct demonstration:

$ cat -n test.py && python test.py
     1    import openpyxl
     2   
     3    print(openpyxl.__version__)
     4   
     5    for ro in [True, False]:
     6        wb = openpyxl.load_workbook(filename='soeasa.xlsx', read_only=ro)
     7        ws = wb['Master1']
     8   
     9        print(ro, ws.cell(row=99, column=7).value)
2.5.4
(True, '=')
/tmp/openpyxl/worksheet/header_footer.py:49: UserWarning: Cannot parse header or footer so it will be ignored
  warn("""Cannot parse header or footer so it will be ignored""")
(False, '=PROPER(TRIM(E99))&","&PROPER(TRIM(F99))')


--
You received this message because you are subscribed to a topic in the Google Groups "openpyxl-users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/openpyxl-users/gNKlaAPKQS0/unsubscribe.
To unsubscribe from this group and all its topics, send an email to openpyxl-users+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Kyler Laird

unread,
Jun 24, 2018, 12:43:25 PM6/24/18
to openpyx...@googlegroups.com
I worked around this problem by loading with data_only=True.  It's not ideal but it will suffice for now.

Charlie Clark

unread,
Jul 2, 2018, 5:31:19 AM7/2/18
to openpyx...@googlegroups.com
Am .06.2018, 18:43 Uhr, schrieb Kyler Laird <kyler...@gmail.com>:

> I worked around this problem by loading with data_only=True. It's not
> ideal but it will suffice for now.

It looks like the problem is with Excel's shared formulae which means that
one cell actually has a formula and other cells refer to it. In
standard-mode these get unpacked and the references are corrected; in
read-only mode this isn't possible because you might be only reading a
small section of a worksheet and one that doesn't have the base formula.

If you know what you're doing you could write your own formula translator
for your worksheets based on the standard-mode one. It shouldn't be too
hard and could then be included in a future version via a PR. I guess
allowing ad-hoc access to continue to return empty formula if the base
formula isn't in the selected range. Improvements to docs are always
welcome.

Charlie
--
Charlie Clark
Managing Director
Clark Consulting & Research
German Office
Kronenstr. 27a
Düsseldorf
D- 40217
Tel: +49-211-600-3657
Mobile: +49-178-782-6226
Reply all
Reply to author
Forward
0 new messages