iter_rows range_string?

1,373 views
Skip to first unread message

Anthony DeBarros

unread,
Nov 28, 2012, 10:27:27 AM11/28/12
to openpyx...@googlegroups.com
I have a spreadsheet with two columns, A and B.

I'm iterating through it like this:

wb = load_workbook(filename = 'Counties.xlsx', use_iterators=True)
ws = wb.get_sheet_by_name(name = 'COUNTIES')

for row in ws.iter_rows(range_string='A2:C3'):
    print row

My question is that if I use a range_string value of 'A2:B3', the object returned only includes column A. I have to expand the range to include C if I want to get the B column returned. Is that normal behavior or am I missing something?

Thanks.

Adam Morris

unread,
Nov 29, 2012, 6:22:44 AM11/29/12
to openpyx...@googlegroups.com
I haven't used the iter_worksheet - but I believe it's still quite new, sounds like a bug.

Are you working with a large excel file, where using a normal workbook is too slow?  If you're using openpyxl without the use_iterators(), then it opens the normal worksheet, and you can loop through a range correctly as:

for row in ws.range('A2:C3'):
   print row

Range isn't implemented yet for iter_worksheet.

--Adam

Anthony DeBarros

unread,
Nov 29, 2012, 8:34:16 AM11/29/12
to openpyx...@googlegroups.com
Right, if I use:

for row in ws.range('A2:B3'):
    print row

then the range matches up to the two columns -- A and B -- in the worksheet. But if I use ws.iter_rows, then I have to expand the range_string to include column C even though there are no data there.

Charlie Clark

unread,
Nov 29, 2012, 3:05:57 PM11/29/12
to openpyx...@googlegroups.com
Am 29.11.2012, 14:34 Uhr, schrieb Anthony DeBarros <adeb...@gmail.com>:

> Right, if I use:
> for row in ws.range('A2:B3'):
> print row
> then the range matches up to the two columns -- A and B -- in the
> worksheet. But if I use ws.iter_rows, then I have to expand the
> range_string to include column C even though there are no data there.

You could quite easily have hit a bug. To be fair "iter_worksheet.py"
notes that it's still a bit rough.

Looking at the internals there are two completely different
implementations of "range". Worksheets have a built-in method with a
fairly long way of calculation the boundaries and iter_worksheet comes
with a much more compact function. It's a bit difficult to compare the two
directly - you certainly can't just map the test - but look at the
following for when you pass a range in:

standard implementation

if ':' in range_string:
# R1C1 range
result = []
min_range, max_range = range_string.split(':')
min_col, min_row = coordinate_from_string(min_range)
max_col, max_row = coordinate_from_string(max_range)
if column:
min_col = get_column_letter(
column_index_from_string(min_col) + column)
max_col = get_column_letter(
column_index_from_string(max_col) + column)
min_col = column_index_from_string(min_col)
max_col = column_index_from_string(max_col)
cache_cols = {}
for col in xrange(min_col, max_col + 1):
cache_cols[col] = get_column_letter(col)
rows = xrange(min_row + row, max_row + row + 1)
cols = xrange(min_col, max_col + 1)
for row in rows:
new_row = []
for col in cols:
new_row.append(self.cell('%s%s' % (cache_cols[col],
row)))
result.append(tuple(new_row))
return tuple(result)

And iter_worksheet

if ':' in range_string:
min_range, max_range = range_string.split(':')
min_col, min_row = coordinate_from_string(min_range)
max_col, max_row = coordinate_from_string(max_range)

min_col = column_index_from_string(min_col) + column
max_col = column_index_from_string(max_col) + column
min_row += row
max_row += row


I think the addition of +1 to the rows and cols is responsible for the
discrepancy in which case this should be filed as a bug.

Charlie
--
Charlie Clark
Managing Director
Clark Consulting & Research
German Office
Kronenstr. 27a
Düsseldorf
D- 40217
Tel: +49-211-600-3657
Mobile: +49-178-782-6226
Reply all
Reply to author
Forward
0 new messages