trouble with sheet.columns index

1,277 views
Skip to first unread message

msch...@mrn.org

unread,
May 4, 2016, 2:27:01 PM5/4/16
to openpyxl-users
Hi all,
I'm new to both python and openpyxl.  I'm not sure what version of openpyxl I'm working with, but I get "Requirement already up-to-date" when I pip install --upgrade openpyxl.

I suspect I'm running into a difference of indices for sheet.cell(row = x, column = y) and sheet.columns[y]...?  I can step through the cells, row by row, column by column, and query the values looking for the column header of the columns I want.  When I try to get the 2nd column, which is also the max column, I get IndexError: tuple index out of range.  Work shown below.

Thanks for any suggestions!
Megan

## Successful column grab
In [7]: sheet.cell(row=1, column = 20)
Out[7]: <Cell Sheet1.T1>

In [8]: sheet.cell(row=1, column = 20).value
Out[8]: 'MEG Events Formated'

In [9]: sheet.columns[20]
Out[9]:
(<Cell Sheet1.U1>,
 <Cell Sheet1.U2>,
 <Cell Sheet1.U3>,
...


## Unsuccessful column grab, I get the same result if I use sheet.max_column or 21, which is the sheet.max_column for this spreadsheet.
In [3]: sheet.cell(row = 2, column = sheet.max_column)
Out[3]: <Cell Sheet1.U2>

In [4]: sheet.cell(row = 2, column = sheet.max_column).value
Out[4]: 'Time Offset From EEG to MEG (In S)'

In [5]: sheet.columns[sheet.max_column]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-5-d833ed7620d4> in <module>()
----> 1 sheet.columns[sheet.max_column]

IndexError: tuple index out of range




Charlie Clark

unread,
May 4, 2016, 2:40:38 PM5/4/16
to openpyx...@googlegroups.com
Am .05.2016, 20:27 Uhr, schrieb <msch...@mrn.org>:

> In [5]: sheet.columns[sheet.max_column]

In openpyxl we used 1-based indexing for cell addresses so that the row
number is constant: ws.cell(row=1, column=1) == ws['A1'] but only in this
situations. ws.columns and ws.rows are standard Python tuples and are thus
0-indexed so ws.columns[max_column] must always be an index error.

In general, however, you should avoid using ws.max_row and ws.max_column
in your own code. In your case, ws.columns[-1] is what you want.

Charlie
--
Charlie Clark
Managing Director
Clark Consulting & Research
German Office
Kronenstr. 27a
Düsseldorf
D- 40217
Tel: +49-211-600-3657
Mobile: +49-178-782-6226

msch...@mrn.org

unread,
May 4, 2016, 3:12:12 PM5/4/16
to openpyxl-users
Hi Charlie, Thanks for your help!

It turns out that my "successful" column grab also fails, as I was just investigating its contents and finding it also had the wrong index the way I was doing it.   I understand now the difference between cell indexing and the other tuples' indices.

How would you suggest searching over the whole spreadsheet, as I look for my needed columns?  Here is what I have, and I'm not sure how to use the sheet.columns[-1] syntax to build a range:

for column in range(1, sheet.max_column+1 ):  
     for row in range(1,  5):  # headers are only in the first few rows
          matchObj = re.search('target header', str(sheet.cell(row= row, column= column).value))
          if matchObj :
               target_column = column
#  I would change sheet.columns[target_column] to
useful_column = sheet.columns[target_column - 1]

With much appreciation,
Megan

Charlie Clark

unread,
May 4, 2016, 3:31:35 PM5/4/16
to openpyx...@googlegroups.com
Am .05.2016, 21:12 Uhr, schrieb <msch...@mrn.org>:

Hiya Megan,

> Hi Charlie, Thanks for your help!
>
> It turns out that my "successful" column grab also fails, as I was just
> investigating its contents and finding it also had the wrong index the
> way I was doing it. I understand now the difference between cell
> indexing and the other tuples' indices.
>
> How would you suggest searching over the whole spreadsheet, as I look for
> my needed columns? Here is what I have, and I'm not sure how to use the
> sheet.columns[-1] syntax to build a range:

Is this a continuation of your Stackoverflow question?

The "best", or at least the most flexible way to search an Excel sheet is
probably to build a dictionary of the values, though this may be
unsuitable for large files.

from collections import defaultdict

words = defaultdict(list)

for row in ws.iter_rows():
for cell in row:
if cell.data_type == 's':
words[cell.value].append((cell.row, cell.column)) # the numerical
col_idx is also available or you could just enumerate()

So, say you have "cat" in B2 and E45:

words["cat"] == [(2, B), (45, E)] # I hope

> for column in range(1, sheet.max_column+1 ):
> for row in range(1, 5): # headers are only in the first few rows
> matchObj = re.search('target header', str(sheet.cell(row= row,
> column= column).value))
> if matchObj :
> target_column = column
> # I would change sheet.columns[target_column] to
> useful_column = sheet.columns[target_column - 1]

You should try and avoid creating your own ranges and use properties
provided as it's much easier to read and your using tested™ code.

I'm not quite sure what you're looking for but the following should help
you get along

for row in ws.iter_rows('A1', row_offset=4, column_offset=ws.max_column-1):
for cell in row:
m = re.search(regex, cell.value)
if m:
return ws.columns[cell.col_idx - 1]

Version 2.4 improves the handling here a bit by letting you pass in
min_row, min_column, max_row and max_column and also gives you
ws.iter_cols() to work with so that if you are working with columns you
can break out of the inner loop.

Hope that helps a bit.

msch...@mrn.org

unread,
May 5, 2016, 1:13:04 PM5/5/16
to openpyxl-users
Hi again,
Thanks for your help, once again!  Yes, this is all a continuation of the same question about indexing.  I am working through the google openpyxl-users group, which shows it as all one thread.  I will take another look at your suggestion for using a dictionary.

I am having trouble with your suggested use of ws.iter_rows.  It seems it is only stepping over one cell?  Can you make a suggestions of where I'm going wrong?

# If I do...
sheet.dimensions
# Output is:
'A1:U424'

# If I do...
for row in sheet.iter_rows('A1', row_offset =4, column_offset = sheet.max_column - 1):
        print('we are working in row:', row)
        for cell in row:
                print('we are working in cell:', cell)
                print(cell.value)
                print(cell.coordinate)
# The output is
we are working in row: (<Cell Sheet1.U5>,)
we are working in cell: <Cell Sheet1.U5>
27.950000000000003
U5

Sincere thanks,
Megan

Charlie Clark

unread,
May 5, 2016, 1:28:03 PM5/5/16
to openpyx...@googlegroups.com
Am .05.2016, 19:13 Uhr, schrieb <msch...@mrn.org>:

> Hi again,
> Thanks for your help, once again! Yes, this is all a continuation of the
> same question about indexing. I am working through the google
> openpyxl-users group, which shows it as all one thread. I will take
> another look at your suggestion for using a dictionary.

That's the generic approach if you want to do several searches.

> I am having trouble with your suggested use of ws.iter_rows. It seems it
> is only stepping over one cell? Can you make a suggestions of where I'm
> going wrong?
>
> # If I do...
> sheet.dimensions
> # Output is:
> 'A1:U424'
>
> # If I do...
> for row in sheet.iter_rows('A1', row_offset =4, column_offset =
> sheet.max_column - 1):
> print('we are working in row:', row)
> for cell in row:
> print('we are working in cell:', cell)
> print(cell.value)
> print(cell.coordinate)
> # The output is
> we are working in row: (<Cell Sheet1.U5>,)
> we are working in cell: <Cell Sheet1.U5>
> 27.950000000000003
> U5

That doesn't look right. You might actually be best using
ws.get_squared_range() which is what iter_rows() calls internally and only
works with numerical values:

rows = ws.get_squared_range(min_col=1, min_row=1, max_col=ws.max_column-1,
max_row=5)

I keep forgetting this method exists which is why iter_rows() and
iter_cols() get the same signature in 2.4 and I can deprecate it later.

2.4 is available via pip install -U --pre openpyxl
Reply all
Reply to author
Forward
0 new messages