load_workbook with read_only=True doesn't behave the same after 2.4.0 release

tim.d...@gmail.com

unread,

Jan 1, 2017, 6:53:05 PM1/1/17

to openpyxl-users

Hi,

I'm not sure if this is a bug or the documentation is out of date. I'm in the process of up'ing a codebase from using version 2.3.5 to 2.4.1 and came across this differing behaviour. Consider the following trivial sample code:

(ve)$ cat /tmp/test.py

from openpyxl import Workbook
from openpyxl.reader.excel import load_workbook

wb = Workbook(write_only=True)
ws = wb.create_sheet()
ws.append(('LAST NAME', 'FIRST NAME', 'Email address'))
ws.append(('John', 'Smith', 'jo...@gmail.com'))
ws.append(('Jane', 'Doe', 'ja...@gmail.com'))
ws.append(('Harry', 'Potter', 'ha...@gmail.com'))
wb.save('/tmp/test.xlsx')

wb = load_workbook(filename='/tmp/test.xlsx', read_only=True)
assert wb.get_sheet_names() == ['Sheet'], wb.get_sheet_names()
ws = wb['Sheet']
print (ws.max_column, ws.max_row, ws.min_column, ws.min_row)
ncells = 0
for row in ws:
cells = tuple(row)
ncells += len(cells)
print cells

assert ncells == 12, ncells

When run with v2.3.5, this code behaves as I'd expect it to:

(ve)$ pip install openpyxl==2.3.5

...

(ve)$ python /tmp/test.py

(None, None, 1, 1)

(<openpyxl.cell.read_only.ReadOnlyCell object at 0x7f2f48b496b0>, <openpyxl.cell.read_only.ReadOnlyCell object at 0x7f2f48b49710>, <openpyxl.cell.read_only.ReadOnlyCell object at 0x7f2f48b49770>)

(<openpyxl.cell.read_only.ReadOnlyCell object at 0x7f2f48b497d0>, <openpyxl.cell.read_only.ReadOnlyCell object at 0x7f2f48b49830>, <openpyxl.cell.read_only.ReadOnlyCell object at 0x7f2f48b49890>)

(<openpyxl.cell.read_only.ReadOnlyCell object at 0x7f2f48b496b0>, <openpyxl.cell.read_only.ReadOnlyCell object at 0x7f2f48b49710>, <openpyxl.cell.read_only.ReadOnlyCell object at 0x7f2f48b49770>)

(<openpyxl.cell.read_only.ReadOnlyCell object at 0x7f2f48b497d0>, <openpyxl.cell.read_only.ReadOnlyCell object at 0x7f2f48b49830>, <openpyxl.cell.read_only.ReadOnlyCell object at 0x7f2f48b49890>)

When run with v2.4.1 (or v2.4.0), this code fails to yield any cells:

(ve)$ pip install openpyxl==2.4.1

...

(ve)$ ./run.sh /tmp/test.py

(None, None, 1, 1)

Traceback (most recent call last):

File "/tmp/test.py", line 21, in <module>

assert ncells == 12, ncells

AssertionError: 0

Has some part of this API changed that I can't seem to find the documentation for, or is this a bug? If it's a fail on my behalf, can someone please point me at the documentation for the change in behaviour here?

Cheers,

Tim

tim.d...@gmail.com

unread,

Jan 1, 2017, 7:06:53 PM1/1/17

to openpyxl-users, tim.d...@gmail.com

I forgot to mention, removing read_only=True when loading the workbook (wb = load_workbook(filename='/tmp/test.xlsx')) causes the correct behaviour on both v2.3.5 and v2.4.1:

(ve)$ pip install openpyxl==2.3.5

...

(ve)$ python /tmp/test.py

(3, 4, 1, 1)

(<Cell Sheet.A1>, <Cell Sheet.B1>, <Cell Sheet.C1>)

(<Cell Sheet.A2>, <Cell Sheet.B2>, <Cell Sheet.C2>)

(<Cell Sheet.A3>, <Cell Sheet.B3>, <Cell Sheet.C3>)

(<Cell Sheet.A4>, <Cell Sheet.B4>, <Cell Sheet.C4>)

(ve)$ pip install openpyxl==2.4.1

...

(ve)$ python /tmp/test.py

(3, 4, 1, 1)

(<Cell Sheet.A1>, <Cell Sheet.B1>, <Cell Sheet.C1>)

(<Cell Sheet.A2>, <Cell Sheet.B2>, <Cell Sheet.C2>)

(<Cell Sheet.A3>, <Cell Sheet.B3>, <Cell Sheet.C3>)

(<Cell Sheet.A4>, <Cell Sheet.B4>, <Cell Sheet.C4>)

(ve)$

Cheers,

Tim

Charlie Clark

unread,

Jan 2, 2017, 10:09:24 AM1/2/17

to openpyx...@googlegroups.com

Am .01.2017, 00:53 Uhr, schrieb <tim.d...@gmail.com>:

> Has some part of this API changed that I can't seem to find the
> documentation for, or is this a bug? If it's a fail on my behalf, can
> someone please point me at the documentation for the change in behaviour
> here?

The API hasn't really changed but we removed an optional optimisation when
writing worksheets. The specification allows writers to store the
dimensions of a worksheet at the start of the file. Unfortunately, this
makes no sense when streaming to a file as in write-only mode. This is
obviously a design flaw but there it is.

When you read a file in standard mode it reads all cells into memory and
so it always knows how big the worksheet is. In read-only mode, however,
cells are read in on-demand and if the dimensions are not available we
have no idea how big a worksheet is without reading it all at least once.

Charlie
--
Charlie Clark
Managing Director
Clark Consulting & Research
German Office
Kronenstr. 27a
Düsseldorf
D- 40217
Tel: +49-211-600-3657
Mobile: +49-178-782-6226

Tim Dawborn

unread,

Jan 2, 2017, 5:15:05 PM1/2/17

to openpyx...@googlegroups.com

So to clarify: you're stating that it's expected behaviour that a Workbook(write_only=True) when saved and then loaded by load_workbook(read_only=True) will not work? That seems like a pretty nasty gotcha to me. If that's the case, perhaps it's worth adding something to the docs?

Given that, I have altered my example program to remove write_only=True yet it still does not work in v2.4.x but does work in v2.3.5. Note that no cells are yielded via iteration when reading:

(ve)$ cat /tmp/test.py

from openpyxl import Workbook

from openpyxl.reader.excel import load_workbook

wb = Workbook()

ws = wb.active

ws.append(('LAST NAME', 'FIRST NAME', 'Email address'))

ws.append(('John', 'Smith', 'jo...@gmail.com'))

ws.append(('Jane', 'Doe', 'ja...@gmail.com'))

ws.append(('Harry', 'Potter', 'ha...@gmail.com'))

wb.save('/tmp/test.xlsx')

wb = load_workbook(filename='/tmp/test.xlsx', read_only=True)

assert wb.get_sheet_names() == ['Sheet'], wb.get_sheet_names()

ws = wb['Sheet']

print(ws.max_column, ws.max_row, ws.min_column, ws.min_row)

ncells = 0

for row in ws:

cells = tuple(row)

ncells += len(cells)

print(cells)

assert ncells == 12, ncells

(ve)$ pip install openpyxl==2.3.5

...

(ve)$ python /tmp/test.py

(3, 4, 1, 1)

(<openpyxl.cell.read_only.ReadOnlyCell object at 0x7f8ef2ae66b0>, <openpyxl.cell.read_only.ReadOnlyCell object at 0x7f8ef2ae6710>, <openpyxl.cell.read_only.ReadOnlyCell object at 0x7f8ef2ae6770>)

(<openpyxl.cell.read_only.ReadOnlyCell object at 0x7f8ef2ae67d0>, <openpyxl.cell.read_only.ReadOnlyCell object at 0x7f8ef2ae6830>, <openpyxl.cell.read_only.ReadOnlyCell object at 0x7f8ef2ae6890>)

(<openpyxl.cell.read_only.ReadOnlyCell object at 0x7f8ef2ae66b0>, <openpyxl.cell.read_only.ReadOnlyCell object at 0x7f8ef2ae6710>, <openpyxl.cell.read_only.ReadOnlyCell object at 0x7f8ef2ae6770>)

(<openpyxl.cell.read_only.ReadOnlyCell object at 0x7f8ef2ae67d0>, <openpyxl.cell.read_only.ReadOnlyCell object at 0x7f8ef2ae6830>, <openpyxl.cell.read_only.ReadOnlyCell object at 0x7f8ef2ae6890>)

(ve)$ pip install openpyxl==2.4.1

...

(ve)$ python /tmp/test.py

(3, 4, 1, 1)

Traceback (most recent call last):

File "/tmp/test.py", line 21, in <module>

assert ncells == 12, ncells

AssertionError: 0

(ve)$

Can you shed any further light as to why this is the case?

Thanks!

--
You received this message because you are subscribed to a topic in the Google Groups "openpyxl-users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/openpyxl-users/GN1OKVHW8ec/unsubscribe.
To unsubscribe from this group and all its topics, send an email to openpyxl-users+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Charlie Clark

unread,

Jan 4, 2017, 4:19:33 AM1/4/17

to openpyx...@googlegroups.com

Am .01.2017, 23:14 Uhr, schrieb Tim Dawborn <tim.d...@gmail.com>:

> So to clarify: you're stating that it's expected behaviour that a
> Workbook(write_only=True) when saved and then loaded by
> load_workbook(read_only=True) will not work?

No, not at all. Just that worksheets created in write-only mode are
"unsized" because they do not contain a dimensions element so you cannot
use max_row and max_column with them.

> That seems like a pretty nasty
> gotcha to me. If that's the case, perhaps it's worth adding something to
> the docs?
>
> Given that, I have altered my example program to remove write_only=True
> yet

> it still does not work in v2.4.x but does work in v2.3.5. Note that *no
> cells* are yielded via iteration when reading:

Sorry, I don't think I really understand the problem. But this may simply
be because you're extrapolating from a false premise.

Tim Dawborn

unread,

Jan 14, 2017, 12:03:30 AM1/14/17

to openpyx...@googlegroups.com

On 4 January 2017 at 20:19, Charlie Clark <charli...@clark-consulting.eu> wrote:

Am .01.2017, 23:14 Uhr, schrieb Tim Dawborn <tim.d...@gmail.com>:

So to clarify: you're stating that it's expected behaviour that a
Workbook(write_only=True) when saved and then loaded by
load_workbook(read_only=True) will not work?

No, not at all. Just that worksheets created in write-only mode are "unsized" because they do not contain a dimensions element so you cannot use max_row and max_column with them.

Okay, in that case, can you please provide me some working example code that writes a workbook to a file in write_only mode and then loads up that same workbook in read_only mode, and prints out the contents of each cell in the workbook? That is what I believed my original sample code was doing, but as my example outputs show, does not actually happen in release >= 2.4.0.

goo...@wkinet.de

unread,

Jan 31, 2017, 12:47:32 PM1/31/17

to openpyxl-users, tim.d...@gmail.com

Am Samstag, 14. Januar 2017 06:03:30 UTC+1 schrieb Tim Dawborn:

Okay, in that case, can you please provide me some working example code that writes a workbook to a file in write_only mode and then loads up that same workbook in read_only mode, and prints out the contents of each cell in the workbook? That is what I believed my original sample code was doing, but as my example outputs show, does not actually happen in release >= 2.4.0.

Hi Tim,
the Class Inheritance has changed from 2.3x to 2.4.x

 -class ReadOnlyWorksheet(Worksheet): 
 +class ReadOnlyWorksheet(object):

I assume, this will change iteration behavior.

Change the following to get a working example:

  - for row in ws:
  + for row in ws.rows:

If you need

ws.max_column, ws.max_row

you can force calculation with:

ws.calculate_dimension(force=True)

Cheers.

Tim Dawborn

unread,

Feb 1, 2017, 1:46:41 AM2/1/17

to openpyx...@googlegroups.com

On 1 February 2017 at 04:47, <goo...@wkinet.de> wrote:

Am Samstag, 14. Januar 2017 06:03:30 UTC+1 schrieb Tim Dawborn:

Okay, in that case, can you please provide me some working example code that writes a workbook to a file in write_only mode and then loads up that same workbook in read_only mode, and prints out the contents of each cell in the workbook? That is what I believed my original sample code was doing, but as my example outputs show, does not actually happen in release >= 2.4.0.

the Class Inheritance has changed from 2.3x to 2.4.x
-class ReadOnlyWorksheet(Worksheet): +class ReadOnlyWorksheet(object):
I assume, this will change iteration behavior.
Change the following to get a working example:
- for row in ws: + for row in ws.rows:

Ah ha. That was exactly the regression in openpyxl that occurred between 2.3.x and 2.4.x that I couldn't find. Thank you!

Reply all

Reply to author

Forward