load_workbook with read_only=True doesn't behave the same after 2.4.0 release

1,802 views
Skip to first unread message

tim.d...@gmail.com

unread,
Jan 1, 2017, 6:53:05 PM1/1/17
to openpyxl-users
Hi,

I'm not sure if this is a bug or the documentation is out of date. I'm in the process of up'ing a codebase from using version 2.3.5 to 2.4.1 and came across this differing behaviour. Consider the following trivial sample code:

(ve)$ cat /tmp/test.py
fro
m openpyxl import Workbook
from openpyxl.reader.excel import load_workbook

wb = Workbook(write_only=True)
ws = wb.create_sheet()
ws.append(('LAST NAME', 'FIRST NAME', 'Email address'))
ws.append(('John', 'Smith', 'jo...@gmail.com'))
ws.append(('Jane', 'Doe', 'ja...@gmail.com'))
ws.append(('Harry', 'Potter', 'ha...@gmail.com'))
wb.save('/tmp/test.xlsx')

wb = load_workbook(filename='/tmp/test.xlsx', read_only=True)
assert wb.get_sheet_names() == ['Sheet'], wb.get_sheet_names()
ws = wb['Sheet']
print (ws.max_column, ws.max_row, ws.min_column, ws.min_row)
ncells = 0
for row in ws:
    cells = tuple(row)
    ncells += len(cells)
    print cells
assert ncells == 12, ncells

 When run with v2.3.5, this code behaves as I'd expect it to:

(ve)$ pip install openpyxl==2.3.5
...
(ve)$ python /tmp/test.py
(None, None, 1, 1)
(<openpyxl.cell.read_only.ReadOnlyCell object at 0x7f2f48b496b0>, <openpyxl.cell.read_only.ReadOnlyCell object at 0x7f2f48b49710>, <openpyxl.cell.read_only.ReadOnlyCell object at 0x7f2f48b49770>)
(<openpyxl.cell.read_only.ReadOnlyCell object at 0x7f2f48b497d0>, <openpyxl.cell.read_only.ReadOnlyCell object at 0x7f2f48b49830>, <openpyxl.cell.read_only.ReadOnlyCell object at 0x7f2f48b49890>)
(<openpyxl.cell.read_only.ReadOnlyCell object at 0x7f2f48b496b0>, <openpyxl.cell.read_only.ReadOnlyCell object at 0x7f2f48b49710>, <openpyxl.cell.read_only.ReadOnlyCell object at 0x7f2f48b49770>)
(<openpyxl.cell.read_only.ReadOnlyCell object at 0x7f2f48b497d0>, <openpyxl.cell.read_only.ReadOnlyCell object at 0x7f2f48b49830>, <openpyxl.cell.read_only.ReadOnlyCell object at 0x7f2f48b49890>)

When run with v2.4.1 (or v2.4.0), this code fails to yield any cells:

(ve)$ pip install openpyxl==2.4.1
...
(ve)$ ./run.sh /tmp/test.py
(None, None, 1, 1)
Traceback (most recent call last):
  File "/tmp/test.py", line 21, in <module>
    assert ncells == 12, ncells
AssertionError: 0

Has some part of this API changed that I can't seem to find the documentation for, or is this a bug? If it's a fail on my behalf, can someone please point me at the documentation for the change in behaviour here?

Cheers,
Tim

tim.d...@gmail.com

unread,
Jan 1, 2017, 7:06:53 PM1/1/17
to openpyxl-users, tim.d...@gmail.com
I forgot to mention, removing read_only=True when loading the workbook (wb = load_workbook(filename='/tmp/test.xlsx')) causes the correct behaviour on both v2.3.5 and v2.4.1:

(ve)$ pip install openpyxl==2.3.5
...
(ve)$ python /tmp/test.py
(3, 4, 1, 1)
(<Cell Sheet.A1>, <Cell Sheet.B1>, <Cell Sheet.C1>)
(<Cell Sheet.A2>, <Cell Sheet.B2>, <Cell Sheet.C2>)
(<Cell Sheet.A3>, <Cell Sheet.B3>, <Cell Sheet.C3>)
(<Cell Sheet.A4>, <Cell Sheet.B4>, <Cell Sheet.C4>)
(ve)$ pip install openpyxl==2.4.1
...
(ve)$ python /tmp/test.py
(3, 4, 1, 1)
(<Cell Sheet.A1>, <Cell Sheet.B1>, <Cell Sheet.C1>)
(<Cell Sheet.A2>, <Cell Sheet.B2>, <Cell Sheet.C2>)
(<Cell Sheet.A3>, <Cell Sheet.B3>, <Cell Sheet.C3>)
(<Cell Sheet.A4>, <Cell Sheet.B4>, <Cell Sheet.C4>)
(ve)$ 

Cheers,
Tim

Charlie Clark

unread,
Jan 2, 2017, 10:09:24 AM1/2/17
to openpyx...@googlegroups.com
Am .01.2017, 00:53 Uhr, schrieb <tim.d...@gmail.com>:

> Has some part of this API changed that I can't seem to find the
> documentation for, or is this a bug? If it's a fail on my behalf, can
> someone please point me at the documentation for the change in behaviour
> here?

The API hasn't really changed but we removed an optional optimisation when
writing worksheets. The specification allows writers to store the
dimensions of a worksheet at the start of the file. Unfortunately, this
makes no sense when streaming to a file as in write-only mode. This is
obviously a design flaw but there it is.

When you read a file in standard mode it reads all cells into memory and
so it always knows how big the worksheet is. In read-only mode, however,
cells are read in on-demand and if the dimensions are not available we
have no idea how big a worksheet is without reading it all at least once.

Charlie
--
Charlie Clark
Managing Director
Clark Consulting & Research
German Office
Kronenstr. 27a
Düsseldorf
D- 40217
Tel: +49-211-600-3657
Mobile: +49-178-782-6226

Tim Dawborn

unread,
Jan 2, 2017, 5:15:05 PM1/2/17
to openpyx...@googlegroups.com
So to clarify: you're stating that it's expected behaviour that a Workbook(write_only=True) when saved and then loaded by load_workbook(read_only=True) will not work? That seems like a pretty nasty gotcha to me. If that's the case, perhaps it's worth adding something to the docs?

Given that, I have altered my example program to remove write_only=True yet it still does not work in v2.4.x but does work in v2.3.5. Note that no cells are yielded via iteration when reading:

(ve)$ cat /tmp/test.py
from openpyxl import Workbook
from openpyxl.reader.excel import load_workbook

wb = Workbook()
ws = wb.active
ws.append(('LAST NAME', 'FIRST NAME', 'Email address'))
ws.append(('John', 'Smith', 'jo...@gmail.com'))
ws.append(('Jane', 'Doe', 'ja...@gmail.com'))
ws.append(('Harry', 'Potter', 'ha...@gmail.com'))
wb.save('/tmp/test.xlsx')

wb = load_workbook(filename='/tmp/test.xlsx', read_only=True)
assert wb.get_sheet_names() == ['Sheet'], wb.get_sheet_names()
ws = wb['Sheet']
print(ws.max_column, ws.max_row, ws.min_column, ws.min_row)
ncells = 0
for row in ws:
    cells = tuple(row)
    ncells += len(cells)
    print(cells)
assert ncells == 12, ncells
(ve)$ pip install openpyxl==2.3.5
...
(ve)$ python /tmp/test.py
(3, 4, 1, 1)
(<openpyxl.cell.read_only.ReadOnlyCell object at 0x7f8ef2ae66b0>, <openpyxl.cell.read_only.ReadOnlyCell object at 0x7f8ef2ae6710>, <openpyxl.cell.read_only.ReadOnlyCell object at 0x7f8ef2ae6770>)
(<openpyxl.cell.read_only.ReadOnlyCell object at 0x7f8ef2ae67d0>, <openpyxl.cell.read_only.ReadOnlyCell object at 0x7f8ef2ae6830>, <openpyxl.cell.read_only.ReadOnlyCell object at 0x7f8ef2ae6890>)
(<openpyxl.cell.read_only.ReadOnlyCell object at 0x7f8ef2ae66b0>, <openpyxl.cell.read_only.ReadOnlyCell object at 0x7f8ef2ae6710>, <openpyxl.cell.read_only.ReadOnlyCell object at 0x7f8ef2ae6770>)
(<openpyxl.cell.read_only.ReadOnlyCell object at 0x7f8ef2ae67d0>, <openpyxl.cell.read_only.ReadOnlyCell object at 0x7f8ef2ae6830>, <openpyxl.cell.read_only.ReadOnlyCell object at 0x7f8ef2ae6890>)
(ve)$ pip install openpyxl==2.4.1
...
(ve)$ python /tmp/test.py
(3, 4, 1, 1)
Traceback (most recent call last):
  File "/tmp/test.py", line 21, in <module>
    assert ncells == 12, ncells
AssertionError: 0
(ve)$

Can you shed any further light as to why this is the case?

Thanks!




--
You received this message because you are subscribed to a topic in the Google Groups "openpyxl-users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/openpyxl-users/GN1OKVHW8ec/unsubscribe.
To unsubscribe from this group and all its topics, send an email to openpyxl-users+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Charlie Clark

unread,
Jan 4, 2017, 4:19:33 AM1/4/17
to openpyx...@googlegroups.com
Am .01.2017, 23:14 Uhr, schrieb Tim Dawborn <tim.d...@gmail.com>:

> So to clarify: you're stating that it's expected behaviour that a
> Workbook(write_only=True) when saved and then loaded by
> load_workbook(read_only=True) will not work?

No, not at all. Just that worksheets created in write-only mode are
"unsized" because they do not contain a dimensions element so you cannot
use max_row and max_column with them.

> That seems like a pretty nasty
> gotcha to me. If that's the case, perhaps it's worth adding something to
> the docs?
>
> Given that, I have altered my example program to remove write_only=True
> yet
> it still does not work in v2.4.x but does work in v2.3.5. Note that *no
> cells* are yielded via iteration when reading:
Sorry, I don't think I really understand the problem. But this may simply
be because you're extrapolating from a false premise.

Tim Dawborn

unread,
Jan 14, 2017, 12:03:30 AM1/14/17
to openpyx...@googlegroups.com
On 4 January 2017 at 20:19, Charlie Clark <charli...@clark-consulting.eu> wrote:
Am .01.2017, 23:14 Uhr, schrieb Tim Dawborn <tim.d...@gmail.com>:

So to clarify: you're stating that it's expected behaviour that a
Workbook(write_only=True) when saved and then loaded by
load_workbook(read_only=True) will not work?

No, not at all. Just that worksheets created in write-only mode are "unsized" because they do not contain a dimensions element so you cannot use max_row and max_column with them.

Okay, in that case, can you please provide me some working example code that writes a workbook to a file in write_only mode and then loads up that same workbook in read_only mode, and prints out the contents of each cell in the workbook? That is what I believed my original sample code was doing, but as my example outputs show, does not actually happen in release >= 2.4.0.

goo...@wkinet.de

unread,
Jan 31, 2017, 12:47:32 PM1/31/17
to openpyxl-users, tim.d...@gmail.com
Am Samstag, 14. Januar 2017 06:03:30 UTC+1 schrieb Tim Dawborn:

Okay, in that case, can you please provide me some working example code that writes a workbook to a file in write_only mode and then loads up that same workbook in read_only mode, and prints out the contents of each cell in the workbook? That is what I believed my original sample code was doing, but as my example outputs show, does not actually happen in release >= 2.4.0.

Hi Tim,
the Class Inheritance has changed from 2.3x to 2.4.x
 -class ReadOnlyWorksheet(Worksheet):
 
+class ReadOnlyWorksheet(object):
I assume, this will change iteration behavior.

Change the following to get a working example:
  - for row in ws:
 
+ for row in ws.rows:


If you need
  ws.max_column, ws.max_row
you can force calculation with:
  ws.calculate_dimension(force=True)

Cheers.


Tim Dawborn

unread,
Feb 1, 2017, 1:46:41 AM2/1/17
to openpyx...@googlegroups.com
On 1 February 2017 at 04:47, <goo...@wkinet.de> wrote:
Am Samstag, 14. Januar 2017 06:03:30 UTC+1 schrieb Tim Dawborn:

Okay, in that case, can you please provide me some working example code that writes a workbook to a file in write_only mode and then loads up that same workbook in read_only mode, and prints out the contents of each cell in the workbook? That is what I believed my original sample code was doing, but as my example outputs show, does not actually happen in release >= 2.4.0.
the Class Inheritance has changed from 2.3x to 2.4.x
 -class ReadOnlyWorksheet(Worksheet):
 
+class ReadOnlyWorksheet(object):
I assume, this will change iteration behavior.

Change the following to get a working example:
  - for row in ws:
 
+ for row in ws.rows:



Ah ha. That was exactly the regression in openpyxl that occurred between 2.3.x and 2.4.x that I couldn't find. Thank you! 

Reply all
Reply to author
Forward
0 new messages